diff --git a/146m14b100mdedup/3328773.err b/146m14b100mdedup/3328773.err
new file mode 100644
index 0000000000000000000000000000000000000000..6357e2110457146ee0ad1d8181c01824fbd1af3d
--- /dev/null
+++ b/146m14b100mdedup/3328773.err
@@ -0,0 +1,1120 @@
+4: 2023-03-17 10:43:11.161443: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+4: 2023-03-17 10:43:11.161455: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+4: 2023-03-17 10:43:11.161501: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2: 2023-03-17 10:43:11.161843: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2: 2023-03-17 10:43:11.161856: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2: 2023-03-17 10:43:11.161869: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-17 10:43:11.161850: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.161839: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.161845: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-17 10:43:11.161520: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:43:11.161541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 10:43:11.161549: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:43:11.161559: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162038: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162066: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:43:11.161579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 10:43:11.161957: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.161964: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.161979: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.162040: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:43:11.162043: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 10:43:11.162083: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162115: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162122: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162139: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:43:11.162214: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 10:43:11.162217: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:43:11.161917: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:43:11.161939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162068: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 10:43:11.162088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: 2023-03-17 10:43:11.161974: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.161992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.162074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-17 10:43:11.162456: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:43:11.162460: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 10:43:11.162472: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-17 10:43:11.162362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162384: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162398: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: 2023-03-17 10:43:11.161955: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162115: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 10:43:11.162119: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.162083: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.162084: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:43:11.162481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:43:11.162487: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162425: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162418: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:43:11.162023: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162132: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:43:11.162502: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 10:43:11.162440: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162442: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:43:11.162021: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162570: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:43:11.162572: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 10:43:11.162519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:43:11.162471: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:43:11.162534: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.162540: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:11.162561: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 10:43:11.162564: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:43:27.382465: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.382493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-17 10:43:27.383017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.382949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 10:43:27.382970: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.382928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:43:27.382960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382861: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:43:27.382506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-17 10:43:27.383044: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:43:27.382981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.383184: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.382956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:43:27.382996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382899: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:43:27.382548: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-17 10:43:27.383060: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:43:27.383517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.382975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:43:27.383206: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:43:27.383081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383066: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.382547: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.382987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.383224: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:43:27.383097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383063: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.382559: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.382996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382951: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382954: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.383246: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:43:27.383123: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383082: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.382595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.383008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383070: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.383268: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:43:27.383118: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383093: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.382573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.383012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383073: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.382965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:43:27.383282: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:43:27.383290: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 10:43:27.383300: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.383129: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.383075: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:43:27.383030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:43:27.383115: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:43:27.383055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:43:27.382971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:43:27.383011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-17 10:43:27.384198: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:43:27.383575: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383580: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383603: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:43:27.384109: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:43:27.384217: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384227: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384236: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384245: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384244: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384237: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383614: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384226: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384155: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384127: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 10:43:27.384233: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384262: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:43:27.384262: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384255: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383622: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384244: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384187: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384137: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384148: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384155: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384157: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 10:43:27.384265: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384272: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384304: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384326: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384266: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384275: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384283: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:43:27.383635: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384258: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384265: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 10:43:27.384281: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384286: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384185: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384168: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384333: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384292: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:43:27.384291: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384301: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384204: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:43:27.384178: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:43:27.384354: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-17 10:43:27.384304: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:43:27.384309: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384208: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384212: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384226: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:43:27.384260: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 10:44:02.624059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.624247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:44:02.624075: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.624266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:44:02.624088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:44:02.624093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624379: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 10:44:02.624466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:44:02.624102: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624286: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624406: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:44:02.624108: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624291: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.624483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624400: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:44:02.624111: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624298: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.624504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624409: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624575: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:44:02.624118: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.624507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624411: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624600: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:44:02.624317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.624516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.624423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.624462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 10:44:02.624512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624711: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 10:44:02.624520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624735: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 10:44:02.624531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624733: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624631: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.624634: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:44:02.624749: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 10:44:02.624535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:44:02.624755: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:44:02.624765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:44:02.626414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626532: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:44:02.626432: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:44:02.626535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626433: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626437: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626440: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626440: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626439: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:44:02.626545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:44:02.626549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:44:02.626552: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 10:44:02.626551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 10:44:02.626551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-17 10:44:02.626490: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:44:02.626498: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:44:02.626647: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:44:02.626651: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:44:02.626767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626776: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626909: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626778: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626784: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:44:02.626785: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626786: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:44:02.626788: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:44:02.626788: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:44:02.626913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 10:44:02.626791: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 10:44:02.626794: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.626914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627111: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:44:02.626863: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:44:02.627025: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.626926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.626925: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:44:02.626927: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:44:02.626935: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 10:44:02.627025: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 10:44:02.627154: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626935: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:44:02.626936: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:44:02.626975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627029: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.627155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 
10:44:02.626980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627029: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:44:02.627126: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.627155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:44:02.626991: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:44:02.626993: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 10:44:02.627119: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.627155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627123: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.627039: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 10:44:02.627157: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627037: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.627043: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:44:02.627043: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 10:44:02.627160: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627124: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 10:44:02.627047: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:44:02.627047: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:44:02.627047: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:44:02.627053: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:44:02.627055: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627161: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627122: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:44:02.627139: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:44:02.627140: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 10:44:02.627163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:44:02.627144: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:44:02.627144: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:44:02.627145: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:44:02.627173: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627173: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:44:02.627146: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:44:02.627147: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627174: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627176: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 10:44:02.627177: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627178: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627180: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:44:02.627181: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 10:44:02.656174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656206: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656218: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656241: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.656294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658834: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658829: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658834: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658837: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658851: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 10:44:02.658842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658860: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658859: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658862: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658899: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:44:02.658916: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:44:02.658920: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. 
+5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: +5: +5: +5: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +2: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... 
+3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... 
+6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: +0: Loading extension module utils... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils...Loading extension module utils... +7: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: Loading extension module utils... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +2: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils...Loading extension module utils... +2: +3: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... 
+4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m14b100mdedup/3328773.out b/146m14b100mdedup/3328773.out new file mode 100644 index 0000000000000000000000000000000000000000..7ab92e32f32d520ab5bb391220461b93f07335ee --- /dev/null +++ b/146m14b100mdedup/3328773.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m14b100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m14b100mdedupval --tensorboard-queue-size 5 
--log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m14b100mdedup --load checkpoints_146m14b100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3328773.json --zero-stage 0 +START 3328773: Fri 17 Mar 2023 10:42:03 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 43.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI 
Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 45.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 49.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 48.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 47.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +5: 2 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 53.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 48.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 
================================================================================ +3: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +2: Launching on nid006719 (2/8), master nid006717 port 9999, GPUs 8, CUDA: True +3: Launching on nid006720 (3/8), master nid006717 port 9999, GPUs 8, CUDA: True +7: Launching on nid006724 (7/8), master nid006717 port 9999, GPUs 8, CUDA: True +0: Launching on nid006717 (0/8), master nid006717 port 9999, GPUs 8, CUDA: True +4: Launching on nid006721 (4/8), master nid006717 port 9999, GPUs 8, CUDA: True +5: Launching on nid006722 (5/8), master nid006717 port 9999, GPUs 8, CUDA: True +1: Launching on nid006718 (1/8), master nid006717 port 9999, GPUs 8, CUDA: True +6: Launching on nid006723 (6/8), master nid006717 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... 
False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. 
single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3328773.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ 
None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m14b100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m14b100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. 
cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... 
torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m14b100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. 
tensorboard_146m14b100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... 
[['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 10:45:38,405] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... 
+0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.117 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ 
scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 27.985 seconds +0: time to initialize megatron (seconds): 81.449 +0: [after megatron is initialized] datetime: 2023-03-17 10:46:09 +0: building GPT model ... +0: [2023-03-17 10:46:09,585] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 10:46:09,586] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 10:46:09,586] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, 
ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 10:46:11,588] [INFO] [module.py:366:_partition_layers] Partitioning pipeline 
stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 10:46:12,041] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 10:46:12,042] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 10:46:12,042] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.48 GB, percent = 6.3% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-17 10:46:12,043] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 10:46:24,535] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 10:46:24,536] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 10:46:24,536] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 10:46:24,540] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 10:46:24,541] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 10:46:24,658] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 10:46:24,658] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 10:46:24,659] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.16 GB, percent = 6.4% +0: ninja: no work to do. 
+1: Time to load utils op: 0.280193567276001 secondsTime to load utils op: 0.2801947593688965 seconds +1: +1: Time to load utils op: 0.2802271842956543 seconds +1: Time to load utils op: 0.280245304107666 seconds +1: Time to load utils op: 0.2802724838256836 secondsTime to load utils op: 0.2802741527557373 seconds +1: +1: Time to load utils op: 0.28027772903442383 seconds +1: Time to load utils op: 0.2802858352661133 seconds +7: Time to load utils op: 0.27638864517211914 seconds +7: Time to load utils op: 0.276430606842041 seconds +7: Time to load utils op: 0.2763218879699707 seconds +7: Time to load utils op: 0.2769312858581543 seconds +7: Time to load utils op: 0.27632570266723633 seconds +7: Time to load utils op: 0.27652549743652344 seconds +7: Time to load utils op: 0.2767918109893799 seconds +7: Time to load utils op: 0.27652835845947266 seconds +5: Time to load utils op: 0.27465248107910156 seconds +5: Time to load utils op: 0.274674654006958 secondsTime to load utils op: 0.2746708393096924 seconds +5: +5: Time to load utils op: 0.2747061252593994 seconds +5: Time to load utils op: 0.2747173309326172 seconds +5: Time to load utils op: 0.2747337818145752 secondsTime to load utils op: 0.2747328281402588 secondsTime to load utils op: 0.27473902702331543 seconds +5: +5: +0: Time to load utils op: 0.1974802017211914 seconds +0: Time to load utils op: 0.28565382957458496 secondsTime to load utils op: 0.2857041358947754 seconds +0: Time to load utils op: 0.2856757640838623 seconds +0: +0: Time to load utils op: 0.2858283519744873 secondsTime to load utils op: 0.2857787609100342 seconds +0: +0: Time to load utils op: 0.2857091426849365 seconds +0: Time to load utils op: 0.3165767192840576 seconds +6: Time to load utils op: 0.27919864654541016 seconds +3: Time to load utils op: 0.2751801013946533 seconds +6: Time to load utils op: 0.27968311309814453 seconds +3: Time to load utils op: 0.27518463134765625 seconds +6: Time to load utils op: 0.27860212326049805 seconds 
+3: Time to load utils op: 0.2752115726470947 seconds +6: Time to load utils op: 0.2793252468109131 seconds +6: Time to load utils op: 0.2793247699737549 secondsTime to load utils op: 0.2795412540435791 seconds +6: +6: Time to load utils op: 0.2789039611816406 seconds +6: Time to load utils op: 0.27858662605285645 seconds +3: Time to load utils op: 0.2752492427825928 seconds +3: Time to load utils op: 0.27518749237060547 seconds +3: Time to load utils op: 0.2752516269683838 seconds +3: Time to load utils op: 0.27527832984924316 secondsTime to load utils op: 0.27527952194213867 seconds +3: +4: Time to load utils op: 0.2761807441711426 secondsTime to load utils op: 0.2760756015777588 seconds +4: +4: Time to load utils op: 0.27619194984436035 seconds +4: Time to load utils op: 0.2761876583099365 seconds +4: Time to load utils op: 0.2764599323272705 seconds +4: Time to load utils op: 0.2765324115753174 seconds +4: Time to load utils op: 0.2765471935272217 secondsTime to load utils op: 0.2765517234802246 seconds +4: +2: Time to load utils op: 0.2766585350036621 seconds +2: Time to load utils op: 0.2765812873840332 secondsTime to load utils op: 0.2766902446746826 seconds +2: +2: Time to load utils op: 0.2767183780670166 secondsTime to load utils op: 0.2767298221588135 seconds +2: +2: Time to load utils op: 0.2767329216003418 secondsTime to load utils op: 0.2767322063446045 seconds +2: Time to load utils op: 0.27675843238830566 seconds +2: +0: [2023-03-17 10:46:24,960] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 10:46:24,961] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 10:46:24,961] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.16 GB, percent = 6.4% +6: Time to load utils op: 0.0007929801940917969 seconds +6: Time to load utils op: 0.0008471012115478516 secondsTime to load utils op: 0.0008435249328613281 secondsTime to load utils op: 
0.0008642673492431641 seconds +6: +6: +6: Time to load utils op: 0.0008573532104492188 seconds +6: Time to load utils op: 0.0008444786071777344 seconds +6: Time to load utils op: 0.0008635520935058594 seconds +6: Time to load utils op: 0.0009591579437255859 seconds +0: Time to load utils op: 0.0006492137908935547 secondsTime to load utils op: 0.0006811618804931641 secondsTime to load utils op: 0.0006339550018310547 seconds +0: +0: +0: Time to load utils op: 0.0006783008575439453 seconds +0: Time to load utils op: 0.0006856918334960938 secondsTime to load utils op: 0.0007107257843017578 seconds +0: +0: Time to load utils op: 0.0005218982696533203 seconds +7: Time to load utils op: 0.0009908676147460938 secondsTime to load utils op: 0.0010221004486083984 seconds +7: +7: Time to load utils op: 0.001129150390625 seconds +7: Time to load utils op: 0.0013153553009033203 seconds +7: Time to load utils op: 0.001453399658203125 seconds +7: Time to load utils op: 0.0014789104461669922 seconds +7: Time to load utils op: 0.0014178752899169922 seconds +7: Time to load utils op: 0.0014491081237792969 seconds +3: Time to load utils op: 0.0008158683776855469 seconds +2: Time to load utils op: 0.0008292198181152344 seconds +5: Time to load utils op: 0.000774383544921875 seconds +3: Time to load utils op: 0.0011365413665771484 seconds +5: Time to load utils op: 0.0008642673492431641 secondsTime to load utils op: 0.0009136199951171875 seconds +5: +5: Time to load utils op: 0.0009057521820068359 seconds +3: Time to load utils op: 0.0011088848114013672 seconds +2: Time to load utils op: 0.0010182857513427734 seconds +3: Time to load utils op: 0.0012083053588867188 seconds +5: Time to load utils op: 0.0010237693786621094 seconds +5: Time to load utils op: 0.0010764598846435547 secondsTime to load utils op: 0.0011224746704101562 seconds +2: Time to load utils op: 0.001074075698852539 seconds +5: +5: Time to load utils op: 0.0011811256408691406 seconds +3: Time to load utils op: 
0.0013086795806884766 seconds +3: Time to load utils op: 0.0013725757598876953 seconds +1: Time to load utils op: 0.0008616447448730469 seconds +3: Time to load utils op: 0.0013544559478759766 seconds +2: Time to load utils op: 0.001329183578491211 seconds +3: Time to load utils op: 0.00135040283203125 seconds +2: Time to load utils op: 0.001169443130493164 seconds +1: Time to load utils op: 0.0009176731109619141 seconds +2: Time to load utils op: 0.0011947154998779297 seconds +2: Time to load utils op: 0.0011944770812988281 seconds +2: Time to load utils op: 0.0012314319610595703 seconds +1: Time to load utils op: 0.0011506080627441406 seconds +1: Time to load utils op: 0.0012047290802001953 seconds +1: Time to load utils op: 0.0012273788452148438 seconds +1: Time to load utils op: 0.0011959075927734375 secondsTime to load utils op: 0.0012881755828857422 seconds +1: +1: Time to load utils op: 0.0012810230255126953 seconds +4: Time to load utils op: 0.0007898807525634766 seconds +4: Time to load utils op: 0.0011146068572998047 seconds +4: Time to load utils op: 0.0011315345764160156 seconds +4: Time to load utils op: 0.0011439323425292969 seconds +4: Time to load utils op: 0.0011365413665771484 seconds +4: Time to load utils op: 0.0012030601501464844 secondsTime to load utils op: 0.0011773109436035156 seconds +4: +4: Time to load utils op: 0.0011496543884277344 seconds +0: [2023-03-17 10:46:25,117] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 10:46:25,117] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,118] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,221] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 10:46:25,222] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,222] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,327] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 10:46:25,328] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,328] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,431] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 10:46:25,432] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,432] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,536] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 10:46:25,537] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,537] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,639] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 10:46:25,640] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,640] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,748] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 10:46:25,749] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,749] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,851] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 10:46:25,852] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:46:25,852] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.31 GB, percent = 6.4% +0: [2023-03-17 10:46:25,852] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 10:46:25,852] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 10:46:25,852] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 10:46:25,852] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 10:46:25,853] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 10:46:25,854] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 10:46:25,855] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 10:46:25,855] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004222393035888672 seconds +0: [2023-03-17 10:46:25,855] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 10:46:25,865] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +7: [2023-03-17 10:46:25,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+6: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+1: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-17 10:46:25,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:25,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-17 10:46:25,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:25,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+4: [2023-03-17 10:46:25,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... 
+4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. 
+4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:25,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:26,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 10:46:26,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 10:46:26,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 10:46:26,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 10:46:26,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:46:26,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:26,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:46:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:46:26,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 10:46:26,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:46:26,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:46:26,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 10:46:26,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 10:46:26,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 10:46:26,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 10:46:26,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:46:26,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:46:26,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 10:46:26,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 10:46:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 10:46:26,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 10:46:26,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:46:26,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:46:26,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 10:46:26,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:46:26,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:46:26,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 10:46:26,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:26,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 10:46:26,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:26,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:46:26,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 10:46:26,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:26,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:26,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:46:26,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 10:46:26,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:46:26,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:26,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 10:46:26,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:46:26,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:26,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 10:46:26,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 10:46:28,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:46:28,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:46:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:46:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:28,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:28,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:28,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:28,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 10:46:28,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:28,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 10:46:28,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:46:28,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 10:46:28,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:46:28,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:46:30,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:46:30,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:46:30,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:46:30,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:46:30,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:46:30,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:46:30,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 10:46:30,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:46:30,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:46:30,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:46:30,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:46:30,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:46:30,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:46:30,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:46:30,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 10:46:30,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:46:30,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:46:30,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:46:30,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:46:30,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:46:30,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:30,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:46:30,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 10:46:30,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:30,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 10:46:30,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:46:30,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:30,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:46:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 10:46:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:46:33,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:46:33,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:46:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:46:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:46:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:46:34,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:46:34,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:46:34,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:46:34,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:46:34,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:46:34,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:46:34,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:46:34,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:46:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:46:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:46:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:46:34,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:46:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:46:34,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:46:34,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:46:34,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:46:34,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:46:34,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:46:34,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:46:34,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:46:34,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:46:34,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:46:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:46:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:46:34,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:46:34,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:46:34,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:46:34,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:46:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:46:34,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 10:46:34,943] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +2: [2023-03-17 10:46:34,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:46:34,946] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +2: [2023-03-17 10:46:34,946] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +4: [2023-03-17 10:46:34,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:34,947] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +2: [2023-03-17 10:46:34,948] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +4: [2023-03-17 10:46:34,949] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +7: [2023-03-17 10:46:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:34,951] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +7: [2023-03-17 10:46:34,953] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +0: [2023-03-17 10:46:34,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 10:46:34,953] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-17 10:46:34,955] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +3: [2023-03-17 10:46:34,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:34,955] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +6: [2023-03-17 10:46:34,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:34,956] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +5: [2023-03-17 10:46:34,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:34,956] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +3: [2023-03-17 10:46:34,957] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +5: [2023-03-17 10:46:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 10:46:34,957] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +6: [2023-03-17 10:46:34,958] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +5: [2023-03-17 10:46:34,958] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +5: [2023-03-17 10:46:34,959] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +1: [2023-03-17 10:46:34,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:46:34,963] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-17 10:46:34,965] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +6: [2023-03-17 10:46:34,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:34,965] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-03-17 10:46:34,967] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +3: [2023-03-17 10:46:34,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 10:46:34,971] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-17 10:46:34,972] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +0: [2023-03-17 10:46:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:46:34,974] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +0: [2023-03-17 10:46:34,976] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +3: [2023-03-17 10:46:34,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:34,981] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +0: [2023-03-17 10:46:34,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:46:34,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 10:46:34,982] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-17 10:46:34,982] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +3: [2023-03-17 10:46:34,983] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +5: [2023-03-17 10:46:34,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:34,984] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +0: [2023-03-17 10:46:34,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +0: [2023-03-17 10:46:34,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +4: [2023-03-17 10:46:34,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:34,986] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +5: [2023-03-17 10:46:34,986] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +4: [2023-03-17 10:46:34,988] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +4: [2023-03-17 10:46:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 10:46:34,991] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +4: [2023-03-17 10:46:34,992] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +2: [2023-03-17 10:46:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:46:34,992] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-17 10:46:34,994] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +7: [2023-03-17 10:46:34,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:34,995] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +1: [2023-03-17 10:46:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:46:34,997] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +7: [2023-03-17 10:46:34,997] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +2: [2023-03-17 10:46:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 10:46:34,998] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +1: [2023-03-17 10:46:34,998] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +2: [2023-03-17 10:46:35,000] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +7: [2023-03-17 10:46:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:35,000] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +7: [2023-03-17 10:46:35,001] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +6: [2023-03-17 10:46:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:35,002] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +6: [2023-03-17 10:46:35,004] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +1: [2023-03-17 10:46:35,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:35,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:46:35,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +6: [2023-03-17 10:46:35,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:35,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +5: [2023-03-17 10:46:35,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +3: [2023-03-17 10:46:35,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:35,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +1: [2023-03-17 10:46:35,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +6: [2023-03-17 10:46:35,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +5: [2023-03-17 10:46:35,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +3: [2023-03-17 10:46:35,008] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +0: [2023-03-17 10:46:35,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 10:46:35,008] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +5: [2023-03-17 10:46:35,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:35,009] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +0: [2023-03-17 10:46:35,011] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +5: [2023-03-17 10:46:35,011] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +0: [2023-03-17 10:46:35,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:46:35,013] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +0: [2023-03-17 10:46:35,015] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +0: [2023-03-17 10:46:35,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:46:35,016] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-17 10:46:35,018] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +4: [2023-03-17 10:46:35,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 10:46:35,020] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-17 10:46:35,022] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +6: [2023-03-17 10:46:35,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:35,027] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +3: [2023-03-17 10:46:35,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:35,027] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +6: [2023-03-17 10:46:35,028] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +3: [2023-03-17 10:46:35,029] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +2: [2023-03-17 10:46:35,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:46:35,029] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-17 10:46:35,030] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-03-17 10:46:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 10:46:35,034] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-17 10:46:35,036] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +3: [2023-03-17 10:46:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:35,038] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +2: [2023-03-17 10:46:35,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:46:35,038] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +5: [2023-03-17 10:46:35,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:35,038] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +3: [2023-03-17 10:46:35,039] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +2: [2023-03-17 10:46:35,040] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +5: [2023-03-17 10:46:35,040] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +7: [2023-03-17 10:46:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 10:46:35,040] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-17 10:46:35,042] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +5: [2023-03-17 10:46:35,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:35,049] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +5: [2023-03-17 10:46:35,051] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +7: [2023-03-17 10:46:35,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:35,054] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-17 10:46:35,056] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +1: [2023-03-17 10:46:35,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:46:35,104] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-17 10:46:35,106] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +1: [2023-03-17 10:46:35,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:46:35,173] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +1: [2023-03-17 10:46:35,174] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +3: [2023-03-17 10:46:35,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:35,296] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-17 10:46:35,298] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +6: [2023-03-17 10:46:35,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:35,358] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +7: [2023-03-17 10:46:35,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:35,359] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +6: [2023-03-17 10:46:35,360] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +7: [2023-03-17 10:46:35,360] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +6: [2023-03-17 10:46:35,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 10:46:35,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +4: [2023-03-17 10:46:35,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:35,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +6: [2023-03-17 10:46:35,372] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +4: [2023-03-17 10:46:35,373] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +2: [2023-03-17 10:46:35,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:46:35,419] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +4: [2023-03-17 10:46:35,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:35,420] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +2: [2023-03-17 10:46:35,420] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +4: [2023-03-17 10:46:35,421] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +2: [2023-03-17 10:46:35,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 10:46:35,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-17 10:46:35,447] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +2: [2023-03-17 10:46:35,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:46:35,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-03-17 10:46:35,457] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +4: [2023-03-17 10:46:35,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:35,481] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +4: [2023-03-17 10:46:35,482] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +5: [2023-03-17 10:46:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:46:35,544] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +5: [2023-03-17 10:46:35,545] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +1: [2023-03-17 10:46:35,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:46:35,591] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +1: [2023-03-17 10:46:35,592] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +7: [2023-03-17 10:46:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:46:36,064] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-03-17 10:46:36,066] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +4: [2023-03-17 10:46:36,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:46:36,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-17 10:46:36,106] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +1: [2023-03-17 10:46:36,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:46:36,545] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-17 10:46:36,547] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +1: [2023-03-17 10:46:36,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:46:36,551] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +1: [2023-03-17 10:46:36,552] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +3: [2023-03-17 10:46:38,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:46:38,153] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +3: [2023-03-17 10:46:38,155] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +6: [2023-03-17 10:46:38,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b100mdedup/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:46:38,168] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +6: [2023-03-17 10:46:38,170] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +0: successfully loaded checkpoint from checkpoints_146m14b100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 12307.85 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 10:46:38 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.033347 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.077 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.038172 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.082 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 10:46:53 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 29420.13 | train/valid/test-data-iterators-setup: 14149.02 +0: [after training is done] datetime: 2023-03-17 10:46:53 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.963449E+00 | lm loss PPL: 5.263855E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3328773: Fri 17 Mar 2023 10:47:24 AM EET diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..71fc8734447a368ff846dc8c2b3585e260785417 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:68ac659c986f3b1e0801e342d61e2349cefed5775dc33f9faac0c4b13cc53ccd +size 27478295 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3522c9907ea846e26726f728b3418ee14ea961c --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c127d288c9d4811c6000a92a3de168109ead07a400c3f46ec19c37ae349e901 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..868f38f59d3ff81b3abb2cea0668b388068a10f5 --- /dev/null +++ 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9baf5ab81251829c5573b2e6f2552dd00068aa8dc20c66a0b042776c6da2daa +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..71ec117bf3da4fcc958673fb429c47ff8ace1e56 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c713314afa47f164298709df7b2af4b5698b2eff0c26d9d12f1fb244c2ceb37 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aafc1aec980b49e5376b8755e10f8dc4b7d80a89 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:949d3bcce13b07e86bc8259221701b7713537f262cc3060650c72dd21ca47c4b +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ac68c8bc78b3405b4c9965a184c1dcc014ff658 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43c2d3750e3cb2e5681c3c70f35f64c4b06a915fb6aa8505250ef0cb746425a4 +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..697c52c461bd9a3146385b7a037fff10a7c5ea01 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af8061d7092a354f7ebf4c4217e0479a5165fe55d18a79a0e417c88351bd489e +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49760d83f33897267a8b8136fb0b1f9927218779 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b5bb2403d09105485415b2f21496cf9f10349fc1a11d940e2fc12584238a733 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..588ecf2f7fb3d734f735fc828b4159af9dc5ea62 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26ce47e04765f45530fd00e5d2681b35115f0cc6361db1fbd83dbd33a439718b +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..734e362fefa37d5550412e978469b292871de913 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:32eaa3b40cc66227d83c1c98e77d733468fb580602995e6f15c070c167df4896 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0af706baa0b35a3d671f311965361e1a477c6d7c --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2df52688202d209d234b2c5ef5e5c1201f4b75f9a30bcd30a90c2a4720e38ad1 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e32628591f3665695aec8c4cea6a73776806cf2a --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:31d14a3d2f4f08568c162ecca875962d2fa7a6ad8b9bfddc426d644a341bc4d3 +size 27478231 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c25e72d24626a3500f2080db6764851411f51e51 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a61afede6c7cda27c6037cc8499812e25c448a3039da7470b593165b9c0bd8e +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..2b95b7435c7b248aa0b296658d0ca7061c71de1c --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d51fe8f21565f9748ec1f04c043d686c32fd7203b6b4c3ee9ebc288c88ff3447 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db20e8bfb7f15f33c75cb62835003ed87aafbc2e --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6095864392a7ffdcc57464dac012739dab504c531056daf07ce50034f68539f2 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..18760249c167eed7050b7d233436422f0ffe8f68 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85f5899830bb29bbbb64c256690697e7eaec890b875bc88448942a3c46f551ce +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57cf62afa2771c1a9fdf6304bc9cbf66cee9e462 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bcdc6698a68b90454b0c4221f049fd9a38319106270f57bf82d5866316f1bb9 +size 27478178 diff --git 
a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..914d03cb3b9d539d8ceb1349b1632ced55103b38 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7dffe3ff5cbe8718e4a49dd733c9dea52199a16d5e89e3e049297625a78ff704 +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..160428676a334798a5d7a459b0f1630d95fa9c02 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4bc3e90676ac03a7e2a0d61835fb4356c3e821fe89cc0145729b5d43fcf8c1eb +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5128935a2e6a67295f1fb29f8b7d89b225aa622a --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17fb8ef847da55fc765cd3045ea44bee76f369e46231d7cf22e86fbad72fd66d +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..042238fdb5156733302f1b8b4948aa2316ea6c0d --- /dev/null +++ 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f3df11831036cc83adafe7eb9406a744990a872f918ba20ec42bc80da1b99578 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5567fc0162c7f1ec5162ed9976acefab6b651d40 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:394b044786cd410250e94c87649b1c3e652a2c54dd90992fceeb1021d6e7aecc +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc493783922ecb5d36e44af66bba8d1f0f8d1718 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc8ddee90ec319fff7e291057763cda21eb5b993821d330626dcfd7b30d350d1 +size 27478231 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9fb3857cb77eaa334f394f7664f6330bb87f55b --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db256f451c322f7e1c11d4d8f307c623b6d20336a6607830fb78def2e02aac45 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dacd6689c58eac9dd525bda4ca3d5bc5e2194657 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1944d3eb05ccd18846dd17cd7e02acba1077936db0022f264bf8f6828806ef6c +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b15bf7001d528c56a49b268244aaa3bfb5b8b77 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7dcafbcc957ebb20ce6d25ec00623e5ae18427c552e2d71cce2918503acaff32 +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..92ac628a81d4dda0f907e51a69c948ccf38b8106 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd698b45d813b99e6e8a58ec0a596ef6bb4475397ac158f8f92dae0e3c6155c1 +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..797b7749e6b785cc43996f872f9104c3044ae1fa --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:a381ac0f87c4aab5a38df73cf979801c6a8fe47ecbe6ddc6dae2a66b311a163d +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dab6bf27ece8cdd10286ee04d1e9c5807340f5b9 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0fa086c62438251a80f448a0376e2a12176f6eb81e2f71a9469c09100546315d +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a7f7cb007207d1fc942f7c80c7637a363ed63dc --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab2334e4759f995369ba3bc6a8c16334b931fa874b041e9e7cedaf6f81fb2eea +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..352a8541eae060799e9045610dee2d143f543660 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e058e58002b99efa7be04855eac4116eb1faabdbc9603d01e70925cae5b0dfe +size 27478114 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..dfd4e9368f214888a7a38b71c321517459286fba --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9baaf643ad3cd95950144bbdf4757c966c2108d068629b0a247f82a71b86e3b4 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5ce7aa56b871bdb3069a4189bfcd065b26f5d13 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9aa5a5ae125edb819a2151a603cd86884ee7684bc63d13e2f55728f454105737 +size 27478434 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cce2d56348582a511abee934c119c29a5ef80a79 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:094110a49e732f33d7e087ac4a9c0c386ad5d685f9288d2e6b66fd302fe8087e +size 27478167 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc7201b3b778465a13c493e0983f247ff5e22985 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e7cdfde524888fa1aea87b5538dde468fad1c8c02ac9ba8750551452e36063c5 +size 27478242 diff --git 
a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b47999bd775ca652c5f06112cef47e9cd69febf3 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1163597d0b3c731179c0ea1889f26c10b32a32499c0aeabbbef9ff4cb265f0cd +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16221f1022ce8895fe390f4461d11f0095a485ce --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:470f907d90c897e4caa806b1d9dfe0a1259da51604ad32e5924b3d06aede3916 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..72605df3586b20ffbf0d89eeddff6920477fe5fb --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4f90b9209944f2b2e594fbd05db0df03f08597a61213a0957645df8fc78d3a1 +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68d39e932fa9e95cea8360fcd573c69c9989e6ad --- /dev/null +++ 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f4c20ca43d4455fd3ce25e666610ba57d7010d896befe9a21f240b637367d41 +size 27478434 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..11ef1dbda0a4a732e0a93ce9bd8a69565e24aea2 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:364a67b3ba0b287a2decb60ffec9dc3a574cb41b280a2cfd561b3b323234adeb +size 27478050 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4fb18af27197dd5e188ec7fc68bdbeb14f6c4b41 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:981927d6ccc5f904e207e4c46f5399154dff95151b3d6867db5a6bc456056eee +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16e973327f76f33c37c81e4bffee65de80834ca5 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f3611755f88836cac1a0e11c524ad1cc7e43fdd18618d46ad53734f333848203 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..385746834dae504aa66981a646ca80ebbfc136a3 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b92479ed11bb328947af8a38efb558bb9dcc2e1ebc514554c2d17271fe82076c +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb3e9de6fc5ecf8898f67ea03ac9f56f97b8c7ac --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:788540ff78a9c18794f7b3e54b51c3e05e47c24d4a1a56295d5633c85b4d28e5 +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5d903d754335d5002f23c697698c44a480633de --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e047b8c6f014781725db1761f12f4d5f5df43459d6157414f0da5e8fc4d45cb2 +size 27478231 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85584affa61c77430341da47178ec7a146c135fa --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:2e36dd66c1f927ccaf804bd1a0a4bc2f519bc3ee846c0bd9b94aa41231cb3d21 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6868284d89e28a1b96c617b88e44172fe5c909f --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e534c0b361af16dffa98f04b948e36f78ca9eeba2cdeeda00c4e9d2b1be1535d +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..99e69283c8eb781639a6b92158bd44e8b0ba9ecf --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2f297adec5543fa3f40f3d962980e61349e41a5f5849bc9244f14aa241e59a3 +size 27478434 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6912b1a5c658016df090c1196112f2e181b302d4 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0ea067ffd0c8145758184898682feb5a9e5b53b3f19cf4acac0de82cfd31dcb +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..2c8ec04b986d7767eb9c91aa0925bcdfde35286e --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f2aefc2e84f7b8ee06b5eb2e91a6f5b95dd6ead4a8126c048eb9ac1da0a9193 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aabdcb694e10176308aa9742e70b4c621347b937 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e01da450fc13da7c9922935e15dc90e637dd3c7a52f4838e53cea64ba295b0a +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..edc1f8ee93e9c7176a2998b7ab9a529f7c39a155 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a7d9bbc968cc86cbb2c8628f3ef33ef7cd14b8a8494f0f9bf0483404b4f3e75 +size 27478306 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb8bd2d2255771c1b861c6ce9ef8ea95b0ac6872 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fdaad1efae58ec859155755d5084b56131bf8c687b56f657a09bee04d9f7bbdd +size 27478370 diff --git 
a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e97e28290009a26987c01f1e7c4d846c7b471063 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aaf318c216e5a358e7dc7501b7ce80768a78fc0ef04d1bb0234adc46bffc4a18 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..89472085181da28da2d55e62d7c3359f3cb83848 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4475a096c3a90ddef62de918e76456cdd68c3169c0ca6578a03f5a51e455854b +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..946993aa1a5dbb5996cebdcac15cb544de245919 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec298a5d127d0975cf62dd98f79f7724eb38d873a1acb046ba7412a38a029de8 +size 27478167 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3e47cb19727e57a842ac0668ed463e3df4635a0 --- /dev/null +++ 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:81a50f67fc4fb7f59ae58f4cf2628922da582b6e1e31dd726a840b7306f13673 +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b677a42be55416d7a9ce738653714551273f0018 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a1e4a5cff3945a5e9dbaffb4278422db0ab6af9355d5a2371c4d7df1b60c616 +size 27478370 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7fd7dbeec84466a111b4cdf02a6b49edd21e9f15 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7753875b1d37e49cb18741ffc821b8a2f464778301367cdefa1a2fd14c135fbc +size 27478178 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ada0eb8033f3a6e6bd254e6858051dd3a0f346c --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b581a882a85187bf6da09c30b55bf2b2ccd1da6545912e74cab3d4f4e1412852 +size 27478242 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 
b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca4f3f78a35b97571d6dfddfda6c4b1a6af28cc6 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d01070d8047236aabc36bedfd2ad808564191105e73e9e27c95f0590f6dd3806 +size 27478359 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c17a4f9baeea0268247b1b47e02f1d59e8c1177 --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34b353b36da21dad2b42526d9514a67af6b88efb7f5adc8390553f303e6a40ac +size 27478103 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab9528119d26b88dd801bd2e2be94af4d8456d6c --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b9324e4f0a112fd4c74db17e3caae0bf2d7457e6f74de1b6acfdf4d91f63465 +size 27478359 diff --git a/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0865c16ae495de7e242588cbb67da7623594b9b --- /dev/null +++ b/146m14b100mdedup/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:1b1509cd1841a04eb9b14c0d8cf9d319987f8b87f1d793ada9cdd707b0b014bb +size 27478167 diff --git a/146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7907733c0817b60270cb2fa3673bdcf4e531423c --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:68fae34a4c6a4e84479415bff8f2b379d0a8b8cdde7ae322d3ca814addfec9ee +size 80413955 diff --git a/146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd114b59fded0516f1a14f45a4038d2eada48d0f --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bff9dd56fd53d5252ff547571535eb00f95b55e9f401a95212c936d1de73bb9 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1586c019dd9471882a9abd1417800b61c64c94c --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1b0e1a0928e2b953eef816e84f2a7d6ba7aa8c31a0a2541283c474c0b0b7f56 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7aff0829d0f53eaafbc1264bd4947b11d0ee851a --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:166b8a7a9a873fe9752173a18d371d2bc680c464c753dc304dc74fd6a040c90e +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0011df8aaecec3242a5566bd41393cef352bea6b --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79bbed445b5a3b4cbc1745778df21e799fb1ed7a5669f8dc96b47499209ad119 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dad6b55814bc239c6e0f291935b493f87d412eb1 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b6a5c49a884cc85a01e89ff7d1c8b84a5ece2631e5a29e0722846b13c057cd8 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67df9618dd5f0de3f69827e15c31d9193d66093a --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce7f311a2b8362e60633f70603c736f4d7aeb4f81c28270f17f2052d7577997a +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07bf2fe714838e7a9593abb4c1b96b695f74a682 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:e8121c89d645ce238dad755dce53c574daae6648bf0c69cb9bc8b65eb221485c +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..308a6e851138aa881c3d21cae25389663b5e76ef --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e8e668618b336b55a72d4a53b8cf9a1b810f3c6e3028805e80a28b6a1eb0d7c +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f70c842c8921ddab878bf440b6320d45867f989 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dec51afcf1a013e95d5fbe2034a716efda4048c31bfa363495a28dd836b1f057 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d74fb34bec17fc40c7e13268f0664ae07ad2b2ac --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab6f36d553a5940017915d630510142ddf23cdfbc31c4f2ed546f3c402657b70 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a16a3f414a23440228c78a4a39dd1bdab5c21289 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:cd58d68af60b2346ee3b36da7ca1cd47d08c54da1875d85b92179da07ba6870f +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..04390a13a21b4eb01e4f38aa98639e0add4e7cc7 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2465abfb2e5f9189c20c7999b0050750f4fedac3265cc863d0c19c859b44abed +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..065545dc2e57cf3e6cee2ad51a9f59ddb3e44e6e --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:289f3a71f0ccdb5af1e03f7a4b96a59db97c97eb27904af5761294b59301abee +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..78ad16f62bca7b1bcf9c1d1ac15ab7120e9163f0 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7eca18af2002b89186ef8f8951a42134c743f21f70dbadb465b443e99accd66 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb9ea3773d5957a2a27362ac31d7c2a5a0dc3fb9 --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_17-model_00-model_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6f537bdf38583caf698004324eea9d239ba047dd47bda192f8ca3f99185e852 +size 14180099 diff --git a/146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt b/146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0069f56ed57b76a033de92c0e4d95a14f9fd259f --- /dev/null +++ b/146m14b100mdedup/global_step21553/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:592a1688c946cd32f57d2f7fcf1ca71207ce1cf934c6a078065fb6d9cfa8143f +size 4291 diff --git a/146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt b/146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bba5954a9ac091faac149ba713f260da625cf7b7 --- /dev/null +++ b/146m14b100mdedup/global_step21553/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a50e0b92ebe78e69924faa340e7b06bf2b85a7f29228a8d2df448006a8b84a0 +size 35443 diff --git a/146m14b100mdedup/sbatch_146m14b100mdedup.sh b/146m14b100mdedup/sbatch_146m14b100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..c3e93a43e7e4131e854a9ee41f20d07e2e883dc1 --- /dev/null +++ b/146m14b100mdedup/sbatch_146m14b100mdedup.sh @@ -0,0 +1,162 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m14b100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln 
-f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=5_517_578 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 55_176 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path 
$KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m14b100mdedup/sbatch_146m14b100mdedupval.sh b/146m14b100mdedup/sbatch_146m14b100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..1fde1fee3ea6d423ff09f8f32c4e3a6719252633 --- /dev/null +++ b/146m14b100mdedup/sbatch_146m14b100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out 
+#SBATCH -e logs/%j.err + +VARIANT=146m14b100mdedupval +VARIANT_CKPT=146m14b100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + 
--num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m14b100mdedup/tensorboard_146m14b100mdedup/events.out.tfevents.1679003674.nid005161.71567.0 b/146m14b100mdedup/tensorboard_146m14b100mdedup/events.out.tfevents.1679003674.nid005161.71567.0 new 
file mode 100644 index 0000000000000000000000000000000000000000..c0a29572940ab8ab72e2c1b9f354a9eec1c720ed --- /dev/null +++ b/146m14b100mdedup/tensorboard_146m14b100mdedup/events.out.tfevents.1679003674.nid005161.71567.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2898958ab9289030db63e2a079b33b3ece481e66d8a200964ff48d77dabecab9 +size 38441548 diff --git a/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679041633.nid005365.78898.0 b/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679041633.nid005365.78898.0 new file mode 100644 index 0000000000000000000000000000000000000000..601777db759a3ba0dfd86942fef8c1eb075585e2 --- /dev/null +++ b/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679041633.nid005365.78898.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21e4372a3356221ad855064c3b515cb9961068023004be429d7a4565bf61949f +size 40 diff --git a/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679042738.nid006724.87772.0 b/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679042738.nid006724.87772.0 new file mode 100644 index 0000000000000000000000000000000000000000..84be1a1653f1e30050651b828ddd0463d387d31a --- /dev/null +++ b/146m14b100mdedup/tensorboard_146m14b100mdedupval/events.out.tfevents.1679042738.nid006724.87772.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:186ac7765675929c0ad68e2ff5adabde571e3f73c5dc4f748fbd2e92bf80e2de +size 980 diff --git a/146m14b400m/3319354.err b/146m14b400m/3319354.err new file mode 100644 index 0000000000000000000000000000000000000000..0fc02f7a89a4195543c7f9d6d84fa2b2e621475e --- /dev/null +++ b/146m14b400m/3319354.err @@ -0,0 +1,1100 @@ +6: 2023-03-16 09:01:53.490178: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in 
performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490180: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490198: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-16 09:01:53.490062: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490067: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-16 09:01:53.490090: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490105: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490103: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 09:01:53.490157: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.490166: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490222: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-16 09:01:53.490106: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490115: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490114: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: 2023-03-16 09:01:53.490664: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490684: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490146: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.490188: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 09:01:53.490191: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.490183: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490249: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490722: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490154: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 09:01:53.490167: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490177: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.490257: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490755: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490180: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 09:01:53.490212: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490718: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.490252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.490257: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 09:01:53.490775: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490778: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.490790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491297: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491318: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 09:01:53.491328: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491342: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491356: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491357: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.491370: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 09:01:53.491389: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491333: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491286: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491351: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 09:01:53.491364: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491378: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491375: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 09:01:53.491379: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491332: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491381: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491384: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 09:01:53.491402: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.491431: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:02:08.598741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.598770: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.599305: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 09:02:08.598802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 09:02:08.599003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.598813: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.599321: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 09:02:08.598836: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.599334: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.598849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.598856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 09:02:08.599031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.599341: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.598858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 09:02:08.599477: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 09:02:08.599351: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.599362: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.599370: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.599376: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599500: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 09:02:08.599097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599087: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599526: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.599531: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 09:02:08.599112: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599116: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.599574: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.599585: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.599602: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.599607: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 09:02:08.599822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599892: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599936: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599939: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.600706: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600705: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600749: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600750: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600759: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 09:02:08.600779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600787: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600793: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.600889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 09:02:08.601108: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600917: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 09:02:08.601156: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601501: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.600942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 09:02:08.601525: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.600963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.600977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.600980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601208: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.600997: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601206: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.601004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.601323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601577: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.601582: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601144: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.601629: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your 
machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.601600: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601660: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.601606: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.601609: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.601517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 09:02:08.601682: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601695: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601703: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.601626: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601718: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601730: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601782: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.601353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.601398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.601424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.601418: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.601427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601956: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.601982: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 09:02:08.601589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601584: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.601448: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.602030: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.601449: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.601582: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.602225: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602260: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602320: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602318: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.602039: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 09:02:08.602343: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.602055: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.602064: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.602084: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 09:02:08.602097: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602351: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602372: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.602407: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 09:02:08.601935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.601969: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.601986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602034: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602050: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.602689: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602722: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602744: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602758: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602750: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 09:02:08.602767: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602792: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.602840: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:40.340645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340708: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340748: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340752: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.340772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343840: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343843: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.343856: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343856: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343861: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343867: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.343869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 09:02:40.366721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.366746: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.366769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366903: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.366782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.366991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.366789: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.366929: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 09:02:40.366799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.367023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 09:02:40.366814: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366964: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.367039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.366957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 09:02:40.366836: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367221: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.367077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.366976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.367082: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.366973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367255: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.367006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.367090: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.367032: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.367108: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.367118: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367532: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367392: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367410: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.367455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.367943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.367961: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.367972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.368008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.368008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.368011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.368023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.368032: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.369980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 09:02:40.369853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369857: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.369982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.369931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 09:02:40.369856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.369984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.369986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.369985: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 09:02:40.369934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.369986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 09:02:40.369938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369870: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369875: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369877: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369878: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 09:02:40.369989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 09:02:40.369934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 09:02:40.369877: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369881: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369882: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.369991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 09:02:40.369934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.369997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.369997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.369997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.370008: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.370007: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 09:02:40.370006: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 09:02:40.370010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.370010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.369948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 09:02:40.369941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.369945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.369957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369962: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369961: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369962: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369967: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.369968: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370316: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370316: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370481: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370603: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370324: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370619: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 09:02:40.370613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370485: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370335: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370336: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370336: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370337: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370338: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 09:02:40.370339: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370340: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370342: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370641: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 09:02:40.370684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370501: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370506: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370507: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370508: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 09:02:40.370510: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370509: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.370704: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.370708: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 09:02:40.370046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370062: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 09:02:40.370055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370069: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 09:02:40.370061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370060: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370079: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370082: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370083: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370085: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Loading extension module scaled_masked_softmax_cuda... +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: +1: +1: +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: +3: +3: +3: +3: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: +5: +5: +5: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +7: Building extension module utils... +7: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +7: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... 
+1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +7: Building extension module utils... 
+7: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +7: Loading extension module utils... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: +7: Loading extension module utils...Loading extension module utils...Loading extension module utils... +7: +7: +6: No modifications detected for re-loaded extension module utils, skipping build step... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils...Loading extension module utils... +3: +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+4: +4: Loading extension module utils...Loading extension module utils... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m14b400m/3319354.out b/146m14b400m/3319354.out new file mode 100644 index 0000000000000000000000000000000000000000..3a8160a52c7575928a7c172fbacce0685d27c108 --- /dev/null +++ b/146m14b400m/3319354.out @@ -0,0 +1,5629 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m14b400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m14b400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m14b400m --load checkpoints_146m14b400m --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3319354.json --zero-stage 0 +START 3319354: Thu 16 Mar 2023 09:01:25 AM EET +0: +0: +0: 
======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 46.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 47.0c 80.0W 800Mhz 
1600Mhz 0% auto 560.0W 0% 0% +1: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 44.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 39.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 46.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 
================================================================================ +4: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 50.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 42.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 42.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 40.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +5: Launching on nid006615 (5/8), master nid006610 port 9999, GPUs 8, CUDA: True +7: Launching on nid006617 (7/8), master nid006610 port 9999, GPUs 8, CUDA: True +6: Launching on nid006616 (6/8), master nid006610 port 9999, GPUs 8, CUDA: True +2: Launching on nid006612 (2/8), master nid006610 port 9999, GPUs 8, CUDA: True +4: Launching on nid006614 (4/8), master nid006610 port 9999, GPUs 8, CUDA: True +3: Launching on nid006613 (3/8), master nid006610 port 9999, GPUs 8, CUDA: True +1: Launching on nid006611 (1/8), master nid006610 port 9999, GPUs 8, CUDA: True +0: Launching on nid006610 (0/8), master nid006610 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 
1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. 
False +0: deepspeed_config ................................ ds_configs/3319354.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... 
False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m14b400mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m14b400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... 
True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 
0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m14b400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m14b400mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... 
None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 
0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 09:03:56,717] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. 
+0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.097 seconds +0: > compiling and loading fused kernels ... +0: >>> done with compiling and loading fused kernels. Compilation time: 27.810 seconds +0: time to initialize megatron (seconds): -4.511 +0: [after megatron is initialized] datetime: 2023-03-16 09:04:27 +0: building GPT model ... +0: [2023-03-16 09:04:27,626] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 09:04:27,627] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 09:04:27,627] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.88 GB, percent = 6.1% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, 
model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 09:04:29,606] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe 
+0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 09:04:29,952] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 09:04:29,952] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-16 09:04:29,952] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.9 GB, percent = 6.1% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-16 09:04:29,954] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 09:04:42,478] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 09:04:42,479] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 09:04:42,479] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 09:04:42,483] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 09:04:42,483] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 09:04:42,604] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 09:04:42,605] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 09:04:42,605] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.58 GB, 
percent = 6.3% +7: ninja: no work to do. +0: Time to load utils op: 0.30802059173583984 seconds +0: Time to load utils op: 0.3081176280975342 seconds +0: Time to load utils op: 0.3089179992675781 seconds +0: Time to load utils op: 0.30828237533569336 secondsTime to load utils op: 0.3085598945617676 seconds +0: +0: Time to load utils op: 0.30814051628112793 seconds +0: Time to load utils op: 0.308086633682251 seconds +2: Time to load utils op: 0.3036074638366699 seconds +2: Time to load utils op: 0.3036081790924072 seconds +2: Time to load utils op: 0.3036470413208008 seconds +2: Time to load utils op: 0.30365824699401855 seconds +2: Time to load utils op: 0.3036634922027588 seconds +2: Time to load utils op: 0.3036801815032959 secondsTime to load utils op: 0.3036689758300781 seconds +2: +2: Time to load utils op: 0.3036842346191406 seconds +3: Time to load utils op: 0.3033747673034668 seconds +3: Time to load utils op: 0.3033885955810547 secondsTime to load utils op: 0.30330801010131836 seconds +3: +3: Time to load utils op: 0.30341506004333496 seconds +3: Time to load utils op: 0.3034226894378662 seconds +3: Time to load utils op: 0.3034486770629883 secondsTime to load utils op: 0.3034532070159912 secondsTime to load utils op: 0.30346035957336426 seconds +3: +3: +7: Time to load utils op: 0.30356860160827637 seconds +7: Time to load utils op: 0.3035738468170166 seconds +7: Time to load utils op: 0.3035733699798584 seconds +7: Time to load utils op: 0.303635835647583 seconds +7: Time to load utils op: 0.3299846649169922 seconds +6: Time to load utils op: 0.30176424980163574 seconds +6: Time to load utils op: 0.3017618656158447 seconds +6: Time to load utils op: 0.3018069267272949 seconds +6: Time to load utils op: 0.301800012588501 seconds +6: Time to load utils op: 0.30182313919067383 secondsTime to load utils op: 0.30182480812072754 seconds +6: +6: Time to load utils op: 0.30182600021362305 secondsTime to load utils op: 0.301830530166626 seconds +6: +4: Time to 
load utils op: 0.30338263511657715 secondsTime to load utils op: 0.30338072776794434 seconds +4: +4: Time to load utils op: 0.3034048080444336 seconds +4: Time to load utils op: 0.303419828414917 secondsTime to load utils op: 0.30342864990234375 secondsTime to load utils op: 0.30342984199523926 seconds +4: +4: +4: Time to load utils op: 0.30343031883239746 seconds +4: Time to load utils op: 0.30342793464660645 seconds +1: Time to load utils op: 0.30741357803344727 secondsTime to load utils op: 0.3074195384979248 seconds +1: Time to load utils op: 0.3074209690093994 seconds +1: Time to load utils op: 0.30742335319519043 seconds +1: +1: Time to load utils op: 0.30742716789245605 secondsTime to load utils op: 0.30742764472961426 seconds +1: +1: Time to load utils op: 0.30743861198425293 seconds +1: Time to load utils op: 0.30745840072631836 seconds +5: Time to load utils op: 0.3039872646331787 secondsTime to load utils op: 0.30398011207580566 seconds +5: +5: Time to load utils op: 0.3040013313293457 seconds +5: Time to load utils op: 0.304027795791626 seconds +5: Time to load utils op: 0.3040320873260498 secondsTime to load utils op: 0.30403995513916016 seconds +5: +5: Time to load utils op: 0.3040473461151123 seconds +5: Time to load utils op: 0.3040628433227539 seconds +7: ninja: no work to do. 
+7: Time to load utils op: 0.14210247993469238 seconds +0: Time to load utils op: 0.302992582321167 seconds +0: Time to load utils op: 0.000522613525390625 seconds +0: Time to load utils op: 0.0006225109100341797 seconds +0: Time to load utils op: 0.0006394386291503906 seconds +0: Time to load utils op: 0.0006079673767089844 seconds +0: Time to load utils op: 0.0004546642303466797 seconds +0: Time to load utils op: 0.0006277561187744141 seconds +0: Time to load utils op: 0.0006072521209716797 seconds +1: Time to load utils op: 0.0008332729339599609 seconds +1: Time to load utils op: 0.0009572505950927734 seconds +6: Time to load utils op: 0.0008471012115478516 seconds +1: Time to load utils op: 0.0010542869567871094 seconds +1: Time to load utils op: 0.0011491775512695312 seconds +1: Time to load utils op: 0.00118255615234375 secondsTime to load utils op: 0.0011227130889892578 seconds +1: +1: Time to load utils op: 0.001130819320678711 seconds +1: Time to load utils op: 0.0012867450714111328 seconds +6: Time to load utils op: 0.0012409687042236328 seconds +6: Time to load utils op: 0.0014278888702392578 seconds +7: Time to load utils op: 0.0004291534423828125 seconds +6: Time to load utils op: 0.0015244483947753906 seconds +6: Time to load utils op: 0.0012862682342529297 seconds +6: Time to load utils op: 0.0012538433074951172 seconds +7: Time to load utils op: 0.0005431175231933594 secondsTime to load utils op: 0.0004520416259765625 seconds +7: +7: Time to load utils op: 0.00042319297790527344 seconds +6: Time to load utils op: 0.001432657241821289 seconds +6: Time to load utils op: 0.001577138900756836 seconds +7: Time to load utils op: 0.0005168914794921875 seconds +7: Time to load utils op: 0.0005064010620117188 seconds +2: Time to load utils op: 0.000982522964477539 seconds +3: Time to load utils op: 0.0006117820739746094 seconds +2: Time to load utils op: 0.0011518001556396484 seconds +2: Time to load utils op: 0.0013737678527832031 seconds +2: Time to load 
utils op: 0.001386404037475586 seconds +2: Time to load utils op: 0.0013396739959716797 seconds +2: Time to load utils op: 0.0014042854309082031 seconds +2: Time to load utils op: 0.0013642311096191406 seconds +3: Time to load utils op: 0.0009622573852539062 seconds +2: Time to load utils op: 0.001447916030883789 seconds +3: Time to load utils op: 0.0012173652648925781 seconds +3: Time to load utils op: 0.0012669563293457031 seconds +3: Time to load utils op: 0.0013380050659179688 seconds +3: Time to load utils op: 0.0012657642364501953 secondsTime to load utils op: 0.001226186752319336 seconds +3: +3: Time to load utils op: 0.0012691020965576172 seconds +5: Time to load utils op: 0.0011363029479980469 seconds +5: Time to load utils op: 0.0010638236999511719 seconds +5: Time to load utils op: 0.001239776611328125 seconds +5: Time to load utils op: 0.0012023448944091797 seconds +5: Time to load utils op: 0.0011963844299316406 seconds +5: Time to load utils op: 0.0012655258178710938 seconds +5: Time to load utils op: 0.0012989044189453125 seconds +4: Time to load utils op: 0.0009095668792724609 seconds +5: Time to load utils op: 0.0012733936309814453 seconds +4: Time to load utils op: 0.0009241104125976562 seconds +4: Time to load utils op: 0.0006704330444335938 seconds +4: Time to load utils op: 0.0010542869567871094 seconds +4: Time to load utils op: 0.0010192394256591797 seconds +4: Time to load utils op: 0.0007421970367431641 seconds +4: Time to load utils op: 0.0008008480072021484 seconds +4: Time to load utils op: 0.0012049674987792969 seconds +7: Time to load utils op: 0.20247888565063477 seconds +7: Time to load utils op: 0.20233464241027832 seconds +7: Time to load utils op: 0.00040650367736816406 seconds +7: Time to load utils op: 0.0004048347473144531 seconds +0: [2023-03-16 09:04:43,030] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 09:04:43,031] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 
GB Max_CA 0 GB +0: [2023-03-16 09:04:43,031] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,145] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 09:04:43,146] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,146] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,247] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 09:04:43,248] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,248] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,351] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 09:04:43,352] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,352] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,452] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 09:04:43,453] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,453] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,555] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 09:04:43,556] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,556] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,656] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 09:04:43,657] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 
0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,657] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,762] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 09:04:43,763] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,763] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,862] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 09:04:43,863] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,863] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.73 GB, percent = 6.3% +0: [2023-03-16 09:04:43,863] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 09:04:43,863] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 09:04:43,863] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 09:04:43,863] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 09:04:43,863] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] amp_enabled .................. 
False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] autotuning_config ............ { +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 09:04:43,864] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 09:04:43,865] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 09:04:43,866] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 09:04:43,866] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 09:04:43,866] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 09:04:43,866] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 09:04:43,866] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0006439685821533203 seconds +0: [2023-03-16 09:04:43,866] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 09:04:43,918] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+3: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+1: [2023-03-16 09:04:43,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+5: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. 
+4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:43,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:43,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:44,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:44,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:44,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:44,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:44,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:44,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:44,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:44,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:44,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:44,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:44,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:44,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:44,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:44,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:44,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:44,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:44,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:44,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:44,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:44,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:44,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:44,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:44,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:44,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:44,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:45,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:45,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:45,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:45,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:45,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:45,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:45,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:45,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:45,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:45,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:45,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:45,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:45,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:45,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:45,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:45,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +3: [2023-03-16 09:04:45,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:45,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:45,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
+7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
+2: [2023-03-16 09:04:45,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:45,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,657] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-16 09:04:45,658] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +2: [2023-03-16 09:04:45,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,662] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +7: [2023-03-16 09:04:45,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 09:04:45,663] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +1: [2023-03-16 09:04:45,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,664] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +1: [2023-03-16 09:04:45,664] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +7: [2023-03-16 09:04:45,664] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +1: [2023-03-16 09:04:45,666] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +0: [2023-03-16 09:04:45,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:45,672] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +3: [2023-03-16 09:04:45,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:45,673] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +0: [2023-03-16 09:04:45,673] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +3: [2023-03-16 09:04:45,675] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +6: [2023-03-16 09:04:45,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 09:04:45,679] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-03-16 09:04:45,680] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +4: [2023-03-16 09:04:45,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,681] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +5: [2023-03-16 09:04:45,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,681] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +5: [2023-03-16 09:04:45,683] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +4: [2023-03-16 09:04:45,683] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +1: [2023-03-16 09:04:45,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,687] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-16 09:04:45,689] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +5: [2023-03-16 09:04:45,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 09:04:45,697] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-03-16 09:04:45,699] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +6: [2023-03-16 09:04:45,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,700] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +6: [2023-03-16 09:04:45,701] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +7: [2023-03-16 09:04:45,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:45,704] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-16 09:04:45,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:45,705] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-16 09:04:45,705] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +7: [2023-03-16 09:04:45,706] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +0: [2023-03-16 09:04:45,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:45,707] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +2: [2023-03-16 09:04:45,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,709] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +0: [2023-03-16 09:04:45,709] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +2: [2023-03-16 09:04:45,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,709] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-16 09:04:45,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,710] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +4: [2023-03-16 09:04:45,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,710] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +4: [2023-03-16 09:04:45,711] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +7: [2023-03-16 09:04:45,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 09:04:45,711] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +2: [2023-03-16 09:04:45,711] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +2: [2023-03-16 09:04:45,712] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +4: [2023-03-16 09:04:45,712] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +7: [2023-03-16 09:04:45,713] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +1: [2023-03-16 09:04:45,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,714] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +1: [2023-03-16 09:04:45,715] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +0: [2023-03-16 09:04:45,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:45,715] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-16 09:04:45,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:45,716] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +6: [2023-03-16 09:04:45,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 09:04:45,716] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +3: [2023-03-16 09:04:45,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:45,717] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +0: [2023-03-16 09:04:45,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +0: [2023-03-16 09:04:45,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +3: [2023-03-16 09:04:45,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +3: [2023-03-16 09:04:45,718] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-16 09:04:45,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +3: [2023-03-16 09:04:45,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 09:04:45,719] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +3: [2023-03-16 09:04:45,720] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +3: [2023-03-16 09:04:45,720] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +5: [2023-03-16 09:04:45,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,721] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +5: [2023-03-16 09:04:45,722] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +6: [2023-03-16 09:04:45,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,723] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-16 09:04:45,724] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +4: [2023-03-16 09:04:45,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,726] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-16 09:04:45,728] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +7: [2023-03-16 09:04:45,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 09:04:45,735] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +7: [2023-03-16 09:04:45,736] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +5: [2023-03-16 09:04:45,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,737] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +4: [2023-03-16 09:04:45,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,737] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +6: [2023-03-16 09:04:45,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,737] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +5: [2023-03-16 09:04:45,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,738] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-03-16 09:04:45,738] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +2: [2023-03-16 09:04:45,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 09:04:45,739] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +6: [2023-03-16 09:04:45,739] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +4: [2023-03-16 09:04:45,740] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +5: [2023-03-16 09:04:45,740] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +2: [2023-03-16 09:04:45,740] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +7: [2023-03-16 09:04:45,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:45,741] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +1: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,742] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +7: [2023-03-16 09:04:45,743] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +1: [2023-03-16 09:04:45,743] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +4: [2023-03-16 09:04:45,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 09:04:45,745] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +1: [2023-03-16 09:04:45,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,745] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +4: [2023-03-16 09:04:45,746] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +1: [2023-03-16 09:04:45,747] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +1: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,747] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:45,747] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +1: [2023-03-16 09:04:45,749] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +7: [2023-03-16 09:04:45,749] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +0: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:45,750] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +4: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,752] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +4: [2023-03-16 09:04:45,752] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +0: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:45,752] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +6: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:45,752] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +6: [2023-03-16 09:04:45,752] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +4: [2023-03-16 09:04:45,753] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +4: [2023-03-16 09:04:45,753] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +6: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:45,753] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +0: [2023-03-16 09:04:45,754] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +6: [2023-03-16 09:04:45,754] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +6: [2023-03-16 09:04:45,755] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +2: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,756] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +2: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 09:04:45,757] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +4: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:45,757] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +5: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,757] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +2: [2023-03-16 09:04:45,758] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +2: [2023-03-16 09:04:45,759] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +4: [2023-03-16 09:04:45,759] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +0: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,759] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +0: [2023-03-16 09:04:45,759] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-16 09:04:45,761] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +5: [2023-03-16 09:04:45,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 09:04:45,760] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +5: [2023-03-16 09:04:45,762] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +1: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,762] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-16 09:04:45,764] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +2: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:45,765] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-16 09:04:45,767] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-03-16 09:04:45,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:45,768] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +3: [2023-03-16 09:04:45,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 09:04:45,768] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +5: [2023-03-16 09:04:45,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:45,768] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +7: [2023-03-16 09:04:45,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +5: [2023-03-16 09:04:45,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +0: [2023-03-16 09:04:45,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:45,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +0: [2023-03-16 09:04:45,770] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-16 09:04:45,771] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +3: [2023-03-16 09:04:45,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:45,777] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-16 09:04:45,778] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +1: [2023-03-16 09:04:45,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 09:04:45,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:45,779] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +3: [2023-03-16 09:04:45,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m14b400m/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:45,779] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-16 09:04:45,779] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +1: [2023-03-16 09:04:45,780] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +3: [2023-03-16 09:04:45,781] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +3: [2023-03-16 09:04:45,781] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +0: successfully loaded checkpoint from checkpoints_146m14b400m at iteration 0 +7: time (ms) | load-checkpoint: 1872.69 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 09:04:46 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.032060 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.089 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.036994 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.010 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 09:05:02 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 19048.47 | train/valid/test-data-iterators-setup: 15564.23 +0: [after training is done] datetime: 2023-03-16 09:05:02 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.390927E+00 | lm loss PPL: 2.969347E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3319354: Thu 16 Mar 2023 09:05:21 AM EET diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..518274d53822cf5e7149d3465d3f5fced8c4d3b8 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98a0a3f2a5784cccc55f470f97f7986f1ec9ded9709ea6bba259c8a0ea9a3600 +size 27478295 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ddc9462500dd67d557bda1c61f515bcc9c2d784 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7770c7ce3c251c0a02164bc6c7828cd7e12c6cf6ed4c88d837b6e7e1f899ea92 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d51fa5437f1b269db16a4fd3dff0d928d5878d1 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6854a7b8e8587e414d09554cb42b815b2c3c44260f9adfb4ff25f4d1d33c942a +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c11f5a693b1ad23e4067cddb4f25101d40c2ec4 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6edc3c7fbe1f4081820310beba4409b0bc40cb85ef7bf47603fc74399335a494 +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f9dc1b59f714afd55a8bce6bb9b65294ea921f1b --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:088faa9e83642d57e403a1e2cb91514c53ec52602ba14d1d6d746ee910a16891 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dd0e465bbc86df1f28fb37fa6a684a0508b66c44 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c129a134abe7f0f9cf62f6d799b103a5448d2cdbc7f550e6667af78462c6d8c9 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4d659218d220dc853a16ff0ec10d3839f4c932c7 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:afe0627022ebf7183ebccbc617d5620b1ffc47d8b5091b145623fa82d8b388aa +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..831a37c76d13d6007163630e140e0ce722e5741e --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa2aa929821452db5ba51247b1748c5df2d9cc047fd16227a20904440973b5ab +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6d5bc538a8c5c7997887028bd688ea11644ac7f --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56564e2a500cb1ee24b810b4abdf797c480c0b6d8ab441075de66dc008fc4749 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b23432a28937c87255d48b9c953faf91c5ad1e4 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9028e4577cd836f2cbe0a186f0c7203b5e76b8d8189dd402f7a642d1b30368b +size 27478242 diff --git 
a/146m14b400m/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1973feb630b2bfefd1769f8463a352b12196c10 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8dd8c3830c45c6d9643de80cd3a704d8eaae2b46b70e4e549f5838e513f7f4e9 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..792bc0beccf42ee39e78c62829413251e3e85ff1 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc66b9ebebd846f951160cae61832a6ab599d278322c44757095d36233b3c4e6 +size 27478231 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5fe4c74909364abcc287bc65f3941ef641f11260 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7550e8d88aa39426c2cb6f300b80b731930bc7eaa10fccb534898cc07232c29b +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..480c5a7cd341fa5cac513ce11abe471147ebabbb --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a8ec73a601a8cbcb9fc3e812cd48ee3f960466b8fd9ac009f639aec5933cea5d +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f39b5246e9b8e20759ddefeb63cb9c09d0c8d90 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17f1756e2294175eca5029b3bb0a84a7729a804b489b6e4bb85de73c10a8b69c +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4201ff697784e855d2afcc1cbb7886e4d492f671 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb575ebebb296c56af7fe7c27efc04f26ab1f8d6b73f7253311d2aae6a2c92e1 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7057294b7ecc7dd41d6518c08b708ee86c6c8e2b --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5e45069d5da7c26e46214e38bbc303e871e7f57b8b901aaff6aaaca9b4d9889 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7a7ec37412b432b3d417190274bb05a6fd26571 --- 
/dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b68f7d8590b229752f8d9faa4e6777e457f2721a3a5c3016d0270bb6c543a08 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..39419d40badb025ece077070d9c393a0fcb6d5a3 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:121b7d67f59867813fc4f47c64066b8004606fd1cd57e1e618924c0ed648c354 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a62e368d48dc855c06e1efa5686d48bc1ce1cd2 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9eaa427fffbafdac31387cd94b904f535d1242bf77fd42f4e5e4cf7413e207e7 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc9ecd370f2bfca5f97c13066905d22f7ed0c894 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:930f6d72bde0427c36598d23c6867805e48ab5acd0f9ff2daa7e03cc058296ad +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..afc50f424dfa186c9aa31899042d1acf18c5369b --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02c803c11d5cdf815ea61758a8db0616a8089f2ba45cd877c45c35b057a6b165 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e290684f29924df3488be10addf582c540fbfa79 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47403e90cb0976cc70a27b23acb10f0f75a46ec6afd4cea75bd3616a79182ec4 +size 27478231 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d8c4480f5088bf32d96af2711790f120c7bac78 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b70e430b4a742919aef7c930da5f0c79f76fcd9d8a78a368354ed079a2e240f0 +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be7a4a69f8b01bab2d799829b5849efdb7de4259 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94622e45eba29a3046496f57c12a4e0f6e6d908e6ce5777606c84b1193e85f13 +size 27478370 diff --git 
a/146m14b400m/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfe8a34ed6c3436b9f9f0df29fd593fa07cdbdeb --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c3dcc574d351bfc2164a29aff87d4ae38e806133b6259618a24440895ef419a +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..10eb06eb1c4c3b77e7276937aa3754253dad6a3b --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:328911eca75a3c1f9c1d5f2189db471dc34b4770c5b1e196883cf250de653531 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9df7860a7b8189389b65ed46a1209048145b4402 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b6db1ba5b364eb7dc6be74db482c7293fa5b35b5c4ec53ebc252dcb29e5835b +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f25626891f739eb14f9f311d265155c2344a906 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:5938fe6f7cb4e00f4f557c7530593772bf786523819fe3e3e5b778f54f1e9cdd +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80c4b83cd38ab13815725a398cda467ec0aa66fd --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3489b3a5d1ac34e404b9fdd7df94111b937a9f4336b9103a38e1dba705b84827 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d91b5bf6d4ed05bf87a80b92045b8bc4ac4e7c79 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1785b6f2e6f5e3ec1fc21b903e4b726775b180bf54fa19ad410d45ac3b5dda03 +size 27478114 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48558b58fb8af472167b6bec31eb46f40209de31 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9cbe36f16610340cc14e7834af6b1f1cbd9f9e4a886a749e6d86144d36d388e0 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3dc6bf8c013c0c56259d80f465773bdd9483971d --- 
/dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c928701c184afe02775ec79bb1429477b4d03f048776db2ed287f6f1bcc658ed +size 27478434 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..14f2388776270a37184c3b6469ace6a498a1915c --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1595e80cb8f41c39075ad92124d698a296623086572d442ffd939703135875d2 +size 27478167 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e330957e74655a8c4d8159944ffe6028512337d5 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cbd17610cd0cae17a817cd77a48841ff653269472673304e46d601c09ec8bac +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3181b489631fbef2346b9b1efe6032be161febd5 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16b4bfcb4992939275310fb74576e5dc89966b84630b7d12c8c380fb9b190e91 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..67202ca192592300ac6ebfba48251a3a01a5b1c2 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:202ad1b66b5d2297332970106a8b568dcad95e3ab69db5897a2e8ad78c2c028a +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7bf849501dd8f528beaacd23503b5c49a3165257 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92dc4a828be7b73642c1a6e80dee0db7e818d11d12f5f8368ad4c4c35aa4aa43 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ab40e9f34b385825631668e857573afc47c2667 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ee6aacf1f1991672bd034b07dc94e11d6ea3ead5cd75052c5fecb826f603d8f +size 27478434 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b8890425583db65285527da30d33d98f05ace52d --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d285f7c96c6395189048f5bb78145405549b296773fb81369f6e4892db2eb850 +size 27478050 diff --git 
a/146m14b400m/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67644f828b79ce98bf739b1fd951f60e6d08e8a5 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:079b4d9edb3c8717be201cf9d4e913049a25761e4e662c7475e3d7ada757fd18 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..46b5ef230921248e5363f23f0f5aa8a3b0371b89 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f085a8e3351b03610dcf1912196c71502d91388eca75ea749621e668509ee94c +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee82c60a248aa9e1fb16e774e4b568ac63744124 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e6ce50f0f3762aeb392f60d124601e28466d8a5a7deb7b6ec33c0dc202ea3ef1 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..306d8d99701e9826d8305753df36e4834cfd93da --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:8764e828b7efb630c749a2547b5a505fb75581b65f2888fc7d6057b3dd449648 +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6b13f4a758abb39706df838e6ada5227dee28d0 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d381106c135eda089f65cacbf2f3d373df3234ff02106e3444e487d876b4281a +size 27478231 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..673855764f9cb0f6dcebbbc936fea5dcd012126a --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9ad8dbfe5f41b512737c3fb6810f8ac63fee5031e4b2179e7b602b3b5305a73 +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0207278debf81bea76b13874cbb8c6c16a24d8c --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1338e5df2ab351aee472afa094760ccefa9aed668e211aa7fdfcfd31557e6e40 +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ebbfb64b7c48d79deaf847ed07ab1b415d86e28 --- 
/dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:29051b97b8855beaf269ec469bcafc31d1d19688e893884701620a7dcf22091d +size 27478434 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d09ed98ae11b071f337026c6906328aa9dc2196 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ec89f5df27eeabc1c890c3f7618c3210fa6c1e6a2ebf675dc9f02f7abe3bff7 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..144265af07c6dc186806126557e0b4520797fb8b --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed7cb9c7a96d48eec70f66001a11e1dd0770166897c5a0af5708af763867caf4 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..abd5184754fc65bffd4149bcdfc97fb0c87165df --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8470a66889b0a228cffcf9295cffb86a392df02ba9e0a26842a807bd09a36402 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..7e08e0088e07ebd0e700603a5d99521d34ce4453 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab72dffda700ba7b4e4bcdd71ccfe4ef51fb7dca7e51c3388fde75c619453288 +size 27478306 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5a2c363870426bdbb47b9287ef0b7049df39f57 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14584b9a41389b0ee89fbea5168109175db50dcca0d197a84207b5d634226d6d +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d4dd4c43b93c3b8bef1972600d08db84738ee864 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:151654398cc550cd69aaf710be2be6a777f0dd66032a11706d94ec49de22af67 +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c7ea00fa73d7592f14a751753d8fa1d4feff5e8 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7fc3c882e8aaaf7999342f0ed7cfcf75242c26f41fd5d6f63be912071f6a9bbc +size 27478370 diff --git 
a/146m14b400m/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dd677c2e0a5dc2ec51e22fa06b627ffa33475fae --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ab2f96b3dc0c215a90ef23b6575ea56dd76eafa20a71b0cac2e5e3a97d26b42 +size 27478167 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7f345c48a96a05af9724a31ec8257d34d64b2e3 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3b6138000f1548dadec5db6dfcc4c9045780ef0f679bdb5eaa217a7b9dbff6d +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..12369c3736a2cb515fabca69aada6202ae1668c3 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7353ef6fc61b8d76f81bb268c07ff306dc6f09d1c4801a6c6132bc64d399faee +size 27478370 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e3757daa2fa7404a821224cbd5c77eebad2ec2f --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:14ed48da12a559b87a37bbbbd6470c124474f7cb3d6bf73800059d99ce5d5927 +size 27478178 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68a002315a31d887f64d47c2f5590fa73b553d48 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:131ad1b7badec27e10d15f2ba28b944d706efeb3c2a9a2a8e2f47873e5f6510d +size 27478242 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..531ebb7b283fe8e97ec2329d743da6201103f575 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b959dd724df71d69abe74b83f01c6f241374b33e6c911efb2b6615e2c50e139c +size 27478359 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b79f540a35af77a7612765d49755c9311daeed20 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9d9783f003e5feee154e7b94fb5034971d3395ac59a8daffa850ab653c3f59e +size 27478103 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48258c076bdf179ccbbf4e0e4b83c767b46655e1 --- /dev/null +++ 
b/146m14b400m/global_step21553/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21684cf00dcba680ed041f26f7af731396111ed3b64a22f55040f329a9de2e77 +size 27478359 diff --git a/146m14b400m/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m14b400m/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bae96eff46cf28c6e8f0e1be96114f7ca75af515 --- /dev/null +++ b/146m14b400m/global_step21553/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:804fab1690fbaddcccc399a5c4aa9acfc3e7e7ab6362e8e795acb0e454533430 +size 27478167 diff --git a/146m14b400m/global_step21553/layer_01-model_00-model_states.pt b/146m14b400m/global_step21553/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8cf9dfb77b0ae43aa2a9aac1ff2200b9d5899841 --- /dev/null +++ b/146m14b400m/global_step21553/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1dbbefaa099ffff65f58c7b44afb78677015250d24bd7bcacfb255ae8d835a71 +size 80413955 diff --git a/146m14b400m/global_step21553/layer_03-model_00-model_states.pt b/146m14b400m/global_step21553/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dfd8f55be5a824e9ca1342a738b54cad6897610a --- /dev/null +++ b/146m14b400m/global_step21553/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b6a7c5f18957ca420e42f499632abe84e4d474aa20212a57f0bd9b3ef2853f4 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_04-model_00-model_states.pt b/146m14b400m/global_step21553/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7fcb272212f71684b137b47b2e4cb08d1245e56a --- /dev/null +++ 
b/146m14b400m/global_step21553/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba0faa3fe8cabe79bd43dab3789b4bfad8a4d743b81b3424d518efdc305eaa8f +size 14180099 diff --git a/146m14b400m/global_step21553/layer_05-model_00-model_states.pt b/146m14b400m/global_step21553/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7770f3e6904bfe46af788af93c559acd59dca7ed --- /dev/null +++ b/146m14b400m/global_step21553/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f528bd9a44408248225523229cdd1368d2bc8e313273fe49a57f4d7f3fb647e0 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_06-model_00-model_states.pt b/146m14b400m/global_step21553/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d2335768a1cf1cfa7d973c9eb326d46d60a9d6e --- /dev/null +++ b/146m14b400m/global_step21553/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a07aaa48fb06e6bf281518ec76fe341c3a7fda32f279195f4e28645a688ac1a +size 14180099 diff --git a/146m14b400m/global_step21553/layer_07-model_00-model_states.pt b/146m14b400m/global_step21553/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..32d98b5cb5c2389c41bc9cb6fdd23b6bb344b38e --- /dev/null +++ b/146m14b400m/global_step21553/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dff4f74266007a0469268b79e551ade79af3e4f22669b1f06859d5e2399f00d7 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_08-model_00-model_states.pt b/146m14b400m/global_step21553/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94c9babe052537bd6f731f193635898280392b27 --- /dev/null +++ 
b/146m14b400m/global_step21553/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a553dd10fde368f3ea526e2c61ccdd21a518588f5b096498dbfd6bf4d6e6a308 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_09-model_00-model_states.pt b/146m14b400m/global_step21553/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4a1575d8b44404e545516a951d69aba34acd84a8 --- /dev/null +++ b/146m14b400m/global_step21553/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82124d7a75719c04f460967bc1c84df8ecc575e76641f029628a166a4902d2b1 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_10-model_00-model_states.pt b/146m14b400m/global_step21553/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e785e374f21c0d60fe05099b9cf1cbba7276fee4 --- /dev/null +++ b/146m14b400m/global_step21553/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8de373534ce39e8111b8ecf3d18df16e4a841f0328dc46958b6096d5619f4b0e +size 14180099 diff --git a/146m14b400m/global_step21553/layer_11-model_00-model_states.pt b/146m14b400m/global_step21553/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..546760b4f86b110f5b47dea044db4a7e8c1277e2 --- /dev/null +++ b/146m14b400m/global_step21553/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de12e6fd0f44fdf375be852f13eaf1a5306b28dc4d9fbe1d59facc7b3c4a6f68 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_12-model_00-model_states.pt b/146m14b400m/global_step21553/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d91a6a289a112bbdab39bf186cd47ef015ea882 --- /dev/null +++ 
b/146m14b400m/global_step21553/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c9665fd6e94832be1e0d2f32d7ba0ce3d7107a1fa92fc6754ee180d314159b6 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_13-model_00-model_states.pt b/146m14b400m/global_step21553/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fc5538c946e3adf34aea947380ece387e76b9e3 --- /dev/null +++ b/146m14b400m/global_step21553/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:809675e739cc681540269eb540c1bfa19012d82d075f8cceba2136a55217595d +size 14180099 diff --git a/146m14b400m/global_step21553/layer_14-model_00-model_states.pt b/146m14b400m/global_step21553/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1a5ab09e427e4f45672a5bd0e1eb60d77fe28dc --- /dev/null +++ b/146m14b400m/global_step21553/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d45c36f5865eb3cbb4703aac15262f95813a3efc94aa3fa830b0361104070f1b +size 14180099 diff --git a/146m14b400m/global_step21553/layer_15-model_00-model_states.pt b/146m14b400m/global_step21553/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..842fa64e1378d79c4b1898bde14e535e184b8c34 --- /dev/null +++ b/146m14b400m/global_step21553/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:562c09bc0b216ff7884c7a2ec20033a8dde123fc5edb13d3786092a8265e0dac +size 14180099 diff --git a/146m14b400m/global_step21553/layer_16-model_00-model_states.pt b/146m14b400m/global_step21553/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9107b1437e7e3d3d7d2e5382e229db3211689596 --- /dev/null +++ 
b/146m14b400m/global_step21553/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc1254c045c7322abf0a289509637d56394e93552da1769beb2248902a57f084 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_17-model_00-model_states.pt b/146m14b400m/global_step21553/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..11bd363dd90aed8a1d0e61e39faad097a9224dc1 --- /dev/null +++ b/146m14b400m/global_step21553/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:68254bf044e8786c489f9d85e466aa90e303818213ec5b22994413feef82a809 +size 14180099 diff --git a/146m14b400m/global_step21553/layer_19-model_00-model_states.pt b/146m14b400m/global_step21553/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a239efcfcd6b4fbc7fa5bd95b7d4a7573896e786 --- /dev/null +++ b/146m14b400m/global_step21553/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a18634aa4383096ec1837b0ad20119411ddf94a5e8c3e207fce4891afde3e4a +size 4291 diff --git a/146m14b400m/global_step21553/mp_rank_00_model_states.pt b/146m14b400m/global_step21553/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..725b379fbc199ea39db14e0636b3b4b3dbb89fd9 --- /dev/null +++ b/146m14b400m/global_step21553/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ebb394e528ac69f52335ddd7203b53422b6121090f27c4597f816071f516968e +size 35443 diff --git a/146m14b400m/sbatch_146m14b400m.sh b/146m14b400m/sbatch_146m14b400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..ad096bdee7f85b76c2719d9cbf3702d311deed79 --- /dev/null +++ b/146m14b400m/sbatch_146m14b400m.sh @@ -0,0 +1,162 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH 
--cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m14b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=5_517_578 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 55_176 \ 
+ --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m14b400m/sbatch_146m14b400mval.sh b/146m14b400m/sbatch_146m14b400mval.sh new file mode 100644 
index 0000000000000000000000000000000000000000..fab7d9e144469a3a314a2ad2de9edd0831133f7c --- /dev/null +++ b/146m14b400m/sbatch_146m14b400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m14b400mval +VARIANT_CKPT=146m14b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 
5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + 
--valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m14b400m/tensorboard_146m14b400m/events.out.tfevents.1678910215.nid005483.78666.0 b/146m14b400m/tensorboard_146m14b400m/events.out.tfevents.1678910215.nid005483.78666.0 new file mode 100644 index 0000000000000000000000000000000000000000..17e4d9ef955cafa47ef1249f3a88b1ff52ca557d --- /dev/null +++ b/146m14b400m/tensorboard_146m14b400m/events.out.tfevents.1678910215.nid005483.78666.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c70aac0d35f4458ba14edb191f352afbb25c8cc9b3209aee3ee70368202b729c +size 38441522 diff --git a/146m14b400m/tensorboard_146m14b400mval/events.out.tfevents.1678950236.nid006617.83630.0 b/146m14b400m/tensorboard_146m14b400mval/events.out.tfevents.1678950236.nid006617.83630.0 new file mode 100644 index 0000000000000000000000000000000000000000..40371cb4d656429a16cfd2d3a07e87bbab89322d --- /dev/null +++ b/146m14b400m/tensorboard_146m14b400mval/events.out.tfevents.1678950236.nid006617.83630.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:377a7957d098f55b92cfd4b3a46cc14a54d8dbb8637aed9af90fc95f5bd35b01 +size 980 diff --git a/146m1b5100mdedup/3406620.err b/146m1b5100mdedup/3406620.err new file mode 100644 index 0000000000000000000000000000000000000000..70754d442e0c360e3fa3cec3bfbd591ee87aa981 --- /dev/null +++ b/146m1b5100mdedup/3406620.err @@ -0,0 +1,2465 @@ +0: Lmod has detected the following error: The following module(s) are unknown: +0: "suse-repo-deps/sam-default" +0: +0: Please check the spelling or version number. Also try "module spider ..." 
+0: It is also possible your cache file is out-of-date; it may help to try: +0: $ module --ignore-cache load "suse-repo-deps/sam-default" +0: +0: Also make sure that all modulefiles written in TCL start with the string +0: #%Module +0: +0: +0: +0: Lmod has detected the following error: The following module(s) are unknown: +0: "rocm/sam-5.2.3" +0: +0: Please check the spelling or version number. Also try "module spider ..." +0: It is also possible your cache file is out-of-date; it may help to try: +0: $ module --ignore-cache load "rocm/sam-5.2.3" +0: +0: Also make sure that all modulefiles written in TCL start with the string +0: #%Module +0: +0: +0: +3: Lmod has detected the following error: The following module(s) are unknown: +3: "suse-repo-deps/sam-default" +3: +3: Please check the spelling or version number. Also try "module spider ..." +3: It is also possible your cache file is out-of-date; it may help to try: +3: $ module --ignore-cache load "suse-repo-deps/sam-default" +3: +3: Also make sure that all modulefiles written in TCL start with the string +3: #%Module +3: +3: +3: +7: Lmod has detected the following error: The following module(s) are unknown: +7: "suse-repo-deps/sam-default" +7: +7: Please check the spelling or version number. Also try "module spider ..." +7: It is also possible your cache file is out-of-date; it may help to try: +7: $ module --ignore-cache load "suse-repo-deps/sam-default" +7: +7: Also make sure that all modulefiles written in TCL start with the string +7: #%Module +7: +7: +7: +4: Lmod has detected the following error: The following module(s) are unknown: +4: "suse-repo-deps/sam-default" +4: +4: Please check the spelling or version number. Also try "module spider ..." 
+4: It is also possible your cache file is out-of-date; it may help to try: +4: $ module --ignore-cache load "suse-repo-deps/sam-default" +4: +4: Also make sure that all modulefiles written in TCL start with the string +4: #%Module +4: +4: +4: +1: Lmod has detected the following error: The following module(s) are unknown: +1: "suse-repo-deps/sam-default" +1: +1: Please check the spelling or version number. Also try "module spider ..." +1: It is also possible your cache file is out-of-date; it may help to try: +1: $ module --ignore-cache load "suse-repo-deps/sam-default" +1: +1: Also make sure that all modulefiles written in TCL start with the string +1: #%Module +1: +1: +1: +6: Lmod has detected the following error: The following module(s) are unknown: +6: "suse-repo-deps/sam-default" +6: +6: Please check the spelling or version number. Also try "module spider ..." +6: It is also possible your cache file is out-of-date; it may help to try: +6: $ module --ignore-cache load "suse-repo-deps/sam-default" +6: +6: Also make sure that all modulefiles written in TCL start with the string +6: #%Module +6: +6: +6: +5: Lmod has detected the following error: The following module(s) are unknown: +5: "suse-repo-deps/sam-default" +5: +5: Please check the spelling or version number. Also try "module spider ..." +5: It is also possible your cache file is out-of-date; it may help to try: +5: $ module --ignore-cache load "suse-repo-deps/sam-default" +5: +5: Also make sure that all modulefiles written in TCL start with the string +5: #%Module +5: +5: +5: +0: Lmod has detected the following error: The following module(s) are unknown: +0: "rccl/sam-develop" +0: +0: Please check the spelling or version number. Also try "module spider ..." 
+0: It is also possible your cache file is out-of-date; it may help to try: +0: $ module --ignore-cache load "rccl/sam-develop" +0: +0: Also make sure that all modulefiles written in TCL start with the string +0: #%Module +0: +0: +0: +3: Lmod has detected the following error: The following module(s) are unknown: +3: "rocm/sam-5.2.3" +3: +3: Please check the spelling or version number. Also try "module spider ..." +3: It is also possible your cache file is out-of-date; it may help to try: +3: $ module --ignore-cache load "rocm/sam-5.2.3" +3: +3: Also make sure that all modulefiles written in TCL start with the string +3: #%Module +3: +3: +3: +7: Lmod has detected the following error: The following module(s) are unknown: +7: "rocm/sam-5.2.3" +7: +7: Please check the spelling or version number. Also try "module spider ..." +7: It is also possible your cache file is out-of-date; it may help to try: +7: $ module --ignore-cache load "rocm/sam-5.2.3" +7: +7: Also make sure that all modulefiles written in TCL start with the string +7: #%Module +7: +7: +7: +1: Lmod has detected the following error: The following module(s) are unknown: +1: "rocm/sam-5.2.3" +1: +1: Please check the spelling or version number. Also try "module spider ..." +1: It is also possible your cache file is out-of-date; it may help to try: +1: $ module --ignore-cache load "rocm/sam-5.2.3" +1: +1: Also make sure that all modulefiles written in TCL start with the string +1: #%Module +1: +1: +1: +4: Lmod has detected the following error: The following module(s) are unknown: +4: "rocm/sam-5.2.3" +4: +4: Please check the spelling or version number. Also try "module spider ..." 
+4: It is also possible your cache file is out-of-date; it may help to try: +4: $ module --ignore-cache load "rocm/sam-5.2.3" +4: +4: Also make sure that all modulefiles written in TCL start with the string +4: #%Module +4: +4: +4: +6: Lmod has detected the following error: The following module(s) are unknown: +6: "rocm/sam-5.2.3" +6: +6: Please check the spelling or version number. Also try "module spider ..." +6: It is also possible your cache file is out-of-date; it may help to try: +6: $ module --ignore-cache load "rocm/sam-5.2.3" +6: +6: Also make sure that all modulefiles written in TCL start with the string +6: #%Module +6: +6: +6: +5: Lmod has detected the following error: The following module(s) are unknown: +5: "rocm/sam-5.2.3" +5: +5: Please check the spelling or version number. Also try "module spider ..." +5: It is also possible your cache file is out-of-date; it may help to try: +5: $ module --ignore-cache load "rocm/sam-5.2.3" +5: +5: Also make sure that all modulefiles written in TCL start with the string +5: #%Module +5: +5: +5: +0: Lmod has detected the following error: The following module(s) are unknown: +0: "aws-ofi-rccl/sam-default" +0: +0: Please check the spelling or version number. Also try "module spider ..." +0: It is also possible your cache file is out-of-date; it may help to try: +0: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +0: +0: Also make sure that all modulefiles written in TCL start with the string +0: #%Module +0: +0: +0: +3: Lmod has detected the following error: The following module(s) are unknown: +3: "rccl/sam-develop" +3: +3: Please check the spelling or version number. Also try "module spider ..." 
+3: It is also possible your cache file is out-of-date; it may help to try: +3: $ module --ignore-cache load "rccl/sam-develop" +3: +3: Also make sure that all modulefiles written in TCL start with the string +3: #%Module +3: +3: +3: +7: Lmod has detected the following error: The following module(s) are unknown: +7: "rccl/sam-develop" +7: +7: Please check the spelling or version number. Also try "module spider ..." +7: It is also possible your cache file is out-of-date; it may help to try: +7: $ module --ignore-cache load "rccl/sam-develop" +7: +7: Also make sure that all modulefiles written in TCL start with the string +7: #%Module +7: +7: +7: +2: Lmod has detected the following error: The following module(s) are unknown: +2: "suse-repo-deps/sam-default" +2: +2: Please check the spelling or version number. Also try "module spider ..." +2: It is also possible your cache file is out-of-date; it may help to try: +2: $ module --ignore-cache load "suse-repo-deps/sam-default" +2: +2: Also make sure that all modulefiles written in TCL start with the string +2: #%Module +2: +2: +2: +1: Lmod has detected the following error: The following module(s) are unknown: +1: "rccl/sam-develop" +1: +1: Please check the spelling or version number. Also try "module spider ..." +1: It is also possible your cache file is out-of-date; it may help to try: +1: $ module --ignore-cache load "rccl/sam-develop" +1: +1: Also make sure that all modulefiles written in TCL start with the string +1: #%Module +1: +1: +1: +4: Lmod has detected the following error: The following module(s) are unknown: +4: "rccl/sam-develop" +4: +4: Please check the spelling or version number. Also try "module spider ..." 
+4: It is also possible your cache file is out-of-date; it may help to try: +4: $ module --ignore-cache load "rccl/sam-develop" +4: +4: Also make sure that all modulefiles written in TCL start with the string +4: #%Module +4: +4: +4: +6: Lmod has detected the following error: The following module(s) are unknown: +6: "rccl/sam-develop" +6: +6: Please check the spelling or version number. Also try "module spider ..." +6: It is also possible your cache file is out-of-date; it may help to try: +6: $ module --ignore-cache load "rccl/sam-develop" +6: +6: Also make sure that all modulefiles written in TCL start with the string +6: #%Module +6: +6: +6: +5: Lmod has detected the following error: The following module(s) are unknown: +5: "rccl/sam-develop" +5: +5: Please check the spelling or version number. Also try "module spider ..." +5: It is also possible your cache file is out-of-date; it may help to try: +5: $ module --ignore-cache load "rccl/sam-develop" +5: +5: Also make sure that all modulefiles written in TCL start with the string +5: #%Module +5: +5: +5: +3: Lmod has detected the following error: The following module(s) are unknown: +3: "aws-ofi-rccl/sam-default" +3: +3: Please check the spelling or version number. Also try "module spider ..." +3: It is also possible your cache file is out-of-date; it may help to try: +3: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +3: +3: Also make sure that all modulefiles written in TCL start with the string +3: #%Module +3: +3: +3: +7: Lmod has detected the following error: The following module(s) are unknown: +7: "aws-ofi-rccl/sam-default" +7: +7: Please check the spelling or version number. Also try "module spider ..." 
+7: It is also possible your cache file is out-of-date; it may help to try: +7: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +7: +7: Also make sure that all modulefiles written in TCL start with the string +7: #%Module +7: +7: +7: +1: Lmod has detected the following error: The following module(s) are unknown: +1: "aws-ofi-rccl/sam-default" +1: +1: Please check the spelling or version number. Also try "module spider ..." +1: It is also possible your cache file is out-of-date; it may help to try: +1: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +1: +1: Also make sure that all modulefiles written in TCL start with the string +1: #%Module +1: +1: +1: +2: Lmod has detected the following error: The following module(s) are unknown: +2: "rocm/sam-5.2.3" +2: +2: Please check the spelling or version number. Also try "module spider ..." +2: It is also possible your cache file is out-of-date; it may help to try: +2: $ module --ignore-cache load "rocm/sam-5.2.3" +2: +2: Also make sure that all modulefiles written in TCL start with the string +2: #%Module +2: +2: +2: +4: Lmod has detected the following error: The following module(s) are unknown: +4: "aws-ofi-rccl/sam-default" +4: +4: Please check the spelling or version number. Also try "module spider ..." +4: It is also possible your cache file is out-of-date; it may help to try: +4: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +4: +4: Also make sure that all modulefiles written in TCL start with the string +4: #%Module +4: +4: +4: +6: Lmod has detected the following error: The following module(s) are unknown: +6: "aws-ofi-rccl/sam-default" +6: +6: Please check the spelling or version number. Also try "module spider ..." 
+6: It is also possible your cache file is out-of-date; it may help to try: +6: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +6: +6: Also make sure that all modulefiles written in TCL start with the string +6: #%Module +6: +6: +6: +5: Lmod has detected the following error: The following module(s) are unknown: +5: "aws-ofi-rccl/sam-default" +5: +5: Please check the spelling or version number. Also try "module spider ..." +5: It is also possible your cache file is out-of-date; it may help to try: +5: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +5: +5: Also make sure that all modulefiles written in TCL start with the string +5: #%Module +5: +5: +5: +2: Lmod has detected the following error: The following module(s) are unknown: +2: "rccl/sam-develop" +2: +2: Please check the spelling or version number. Also try "module spider ..." +2: It is also possible your cache file is out-of-date; it may help to try: +2: $ module --ignore-cache load "rccl/sam-develop" +2: +2: Also make sure that all modulefiles written in TCL start with the string +2: #%Module +2: +2: +2: +2: Lmod has detected the following error: The following module(s) are unknown: +2: "aws-ofi-rccl/sam-default" +2: +2: Please check the spelling or version number. Also try "module spider ..." 
+2: It is also possible your cache file is out-of-date; it may help to try: +2: $ module --ignore-cache load "aws-ofi-rccl/sam-default" +2: +2: Also make sure that all modulefiles written in TCL start with the string +2: #%Module +2: +2: +2: +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: 2023-04-24 12:39:51.807169: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807209: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807227: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-04-24 12:39:51.807213: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807257: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807264: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807272: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-04-24 12:39:51.807279: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-04-24 12:39:51.808121: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808161: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808178: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808187: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-04-24 12:39:51.808161: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808194: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:39:51.808200: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808609: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808611: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-04-24 12:39:51.808655: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808673: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808659: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808704: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-04-24 12:39:51.808726: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-04-24 12:39:51.808747: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.808975: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.808991: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.809002: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.809016: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-04-24 12:39:51.809032: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.809127: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.809133: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-04-24 12:39:51.809153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809352: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-04-24 12:39:51.809362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809369: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809373: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809376: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809383: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-04-24 12:39:51.809391: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-04-24 12:39:51.809400: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.809946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.809961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.810074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-04-24 12:39:51.810090: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.810099: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.810104: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.810124: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-04-24 12:39:51.810146: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-04-24 12:39:51.810537: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810552: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810561: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-04-24 12:39:51.810577: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810587: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-04-24 12:39:51.810630: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811310: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811322: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-04-24 12:39:51.811313: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811335: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811341: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811349: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-04-24 12:39:51.811360: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-04-24 12:39:51.811371: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-04-24 12:40:08.474846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474933: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474964: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474967: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such 
file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.474981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475184: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475215: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475252: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475311: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475265: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475248: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475284: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475313: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475328: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could 
not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:08.475370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:08.475391: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475387: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:08.475389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:08.475368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475359: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.475938: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-04-24 12:40:08.475960: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-04-24 12:40:08.475431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:08.475444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.475982: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.475457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:08.475462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475422: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.475994: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-04-24 12:40:08.476007: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.475503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:08.475473: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475417: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475892: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475912: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-04-24 12:40:08.476011: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-04-24 12:40:08.476016: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.475472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load 
dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:08.476027: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.475481: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:08.475491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475467: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:08.475477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: 
libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:08.475941: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475946: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475966: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475969: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475977: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-04-24 12:40:08.475983: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476097: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476118: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476106: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476130: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-04-24 12:40:08.476149: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476157: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476164: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476141: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-04-24 12:40:08.476180: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476229: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476166: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476188: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476222: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476243: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-04-24 12:40:08.476244: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476253: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476257: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476218: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476225: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476250: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476269: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476226: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476249: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476264: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-04-24 12:40:08.476240: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-04-24 12:40:08.476271: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476268: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476298: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476285: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476283: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-04-24 12:40:08.476292: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476334: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476343: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476292: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476350: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476356: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-04-24 12:40:08.476306: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-04-24 12:40:08.476310: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:08.476369: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476311: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476334: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:08.476892: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476896: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476924: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-04-24 12:40:08.476930: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476937: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476951: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476957: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-04-24 12:40:08.476961: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-04-24 12:40:34.950948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.950965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951012: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.951229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966254: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966263: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 
'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.966369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968825: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968859: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-04-24 12:40:34.968832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968834: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968859: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-04-24 12:40:34.968849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968835: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968848: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-04-24 12:40:34.968855: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968848: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.968849: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.968852: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-04-24 12:40:34.968859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +4: 2023-04-24 12:40:34.968901: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-04-24 12:40:34.968902: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-04-24 12:40:34.968903: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.968975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.968983: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.968989: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-04-24 12:40:34.968903: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-04-24 12:40:34.968906: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-04-24 12:40:34.968905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.968995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-04-24 12:40:34.969033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +6: 2023-04-24 12:40:34.969049: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-04-24 12:40:34.969346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969385: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969408: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969401: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 
12:40:34.969455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.969753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.969729: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.969544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969574: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.969628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970454: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970454: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970465: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970466: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970456: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970464: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970505: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970459: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot 
open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +5: 2023-04-24 12:40:34.970489: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970491: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970498: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-04-24 12:40:34.970603: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970602: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970500: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-04-24 12:40:34.970502: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970644: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-04-24 12:40:34.970624: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +7: 2023-04-24 12:40:34.970560: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970647: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970563: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970564: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970564: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970648: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970648: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970650: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970566: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970569: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-04-24 12:40:34.970567: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970650: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-04-24 12:40:34.970684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +0: 2023-04-24 12:40:34.970728: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970635: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970634: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970689: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970692: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-04-24 12:40:34.970694: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970696: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-04-24 12:40:34.970875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +3: 2023-04-24 12:40:34.970909: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-04-24 12:40:34.975377: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975400: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.975477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975559: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.975595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 
2023-04-24 12:40:34.976014: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976033: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-04-24 12:40:34.976023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976040: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-04-24 12:40:34.976032: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976035: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976046: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-04-24 12:40:34.976048: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-04-24 12:40:34.976039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976044: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +1: 2023-04-24 12:40:34.976057: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-04-24 12:40:34.976060: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-04-24 12:40:34.976069: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-04-24 12:40:34.976071: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-04-24 12:40:34.976153: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976160: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976170: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976180: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976181: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976182: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976182: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-04-24 12:40:34.976183: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976187: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-04-24 12:40:34.976198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 +2: 2023-04-24 12:40:34.976219: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No 
such file or directory +4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or 
directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +3: +3: +3: +3: +3: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +4: +4: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: +7: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... 
+2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils...Loading extension module utils... +1: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... 
+5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils...Loading extension module utils... +2: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: +2: Loading extension module utils...Loading extension module utils... +2: Loading extension module utils... +2: +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m1b5100mdedup/3406620.out b/146m1b5100mdedup/3406620.out new file mode 100644 index 0000000000000000000000000000000000000000..453c1c3c4fc14c6d62c09afb2f6a2240d4628a87 --- /dev/null +++ b/146m1b5100mdedup/3406620.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m1b5100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 
--adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m1b5100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m1b5100mdedup --load checkpoints_146m1b5100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3406620.json --zero-stage 0 +START 3406620: Mon 24 Apr 2023 12:38:35 PM EEST +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 39.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 49.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 51.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 44.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 46.0c N/A 800Mhz 
1600Mhz 0% auto 0.0W 0% 0% +7: 4 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 45.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 40.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End 
of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 40.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 49.0c N/A 
800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 43.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 46.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +7: Launching on nid006915 (7/8), master nid006908 port 9999, GPUs 8, CUDA: True +0: Launching on nid006908 (0/8), master nid006908 port 9999, GPUs 8, CUDA: True +1: Launching on nid006909 (1/8), master nid006908 port 9999, GPUs 8, CUDA: True +2: Launching on nid006910 (2/8), master nid006908 port 9999, GPUs 8, CUDA: True +5: Launching on nid006913 (5/8), master nid006908 port 9999, GPUs 8, CUDA: True +6: Launching on nid006914 (6/8), master nid006908 port 9999, GPUs 8, CUDA: True +4: Launching on nid006912 (4/8), master nid006908 port 9999, GPUs 8, CUDA: True +3: Launching on nid006911 (3/8), master nid006908 port 9999, GPUs 8, CUDA: True +7: > setting 
tensorboard ... +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ 
False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3406620.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 
0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m1b5100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m1b5100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ 
False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... 
None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m1b5100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... 
None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m1b5100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... 
['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... 
+0: [2023-04-24 12:41:38,832] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.078 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. 
+0: >>> done with compiling and loading fused kernels. Compilation time: 10.205 seconds +0: time to initialize megatron (seconds): -32.071 +0: [after megatron is initialized] datetime: 2023-04-24 12:41:51 +0: building GPT model ... +0: [2023-04-24 12:41:52,063] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-04-24 12:41:52,064] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-04-24 12:41:52,064] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 38.62 GB, percent = 7.7% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, 
data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-04-24 12:41:54,034] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe 
+0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-04-24 12:41:54,328] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-04-24 12:41:54,329] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-04-24 12:41:54,329] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 38.64 GB, percent = 7.7% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-04-24 12:41:54,331] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-04-24 12:41:55,292] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-04-24 12:41:55,292] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-04-24 12:41:55,292] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-04-24 12:41:55,297] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-04-24 12:41:55,297] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-04-24 12:41:55,417] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-04-24 12:41:55,417] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-04-24 12:41:55,418] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.39 GB, percent = 7.8% +0: ninja: no work to do. 
+2: Time to load utils op: 0.38068175315856934 secondsTime to load utils op: 0.3806889057159424 seconds +2: +2: Time to load utils op: 0.38071131706237793 seconds +2: Time to load utils op: 0.3807861804962158 seconds +2: Time to load utils op: 0.3807973861694336 secondsTime to load utils op: 0.38076186180114746 seconds +2: +2: Time to load utils op: 0.3808155059814453 seconds +2: Time to load utils op: 0.38082075119018555 seconds +3: Time to load utils op: 0.3821988105773926 secondsTime to load utils op: 0.3822062015533447 seconds +3: +3: Time to load utils op: 0.3822181224822998 secondsTime to load utils op: 0.3822317123413086 seconds +3: +3: Time to load utils op: 0.38222718238830566 secondsTime to load utils op: 0.3822314739227295 seconds +3: Time to load utils op: 0.3822348117828369 seconds +3: +3: Time to load utils op: 0.38224220275878906 seconds +1: Time to load utils op: 0.3903639316558838 seconds +1: Time to load utils op: 0.3898046016693115 seconds +1: Time to load utils op: 0.38965272903442383 seconds +1: Time to load utils op: 0.38989830017089844 seconds +1: Time to load utils op: 0.38925838470458984 seconds +1: Time to load utils op: 0.38982272148132324 seconds +1: Time to load utils op: 0.3901710510253906 seconds +1: Time to load utils op: 0.3907661437988281 seconds +7: Time to load utils op: 0.3785834312438965 secondsTime to load utils op: 0.3785851001739502 seconds +7: +7: Time to load utils op: 0.3786051273345947 seconds +7: Time to load utils op: 0.3786125183105469 seconds +7: Time to load utils op: 0.3786141872406006 seconds +7: Time to load utils op: 0.3786492347717285 secondsTime to load utils op: 0.3786506652832031 seconds +7: Time to load utils op: 0.3785536289215088 seconds +7: +4: Time to load utils op: 0.38205671310424805 secondsTime to load utils op: 0.3820505142211914 seconds +4: +4: Time to load utils op: 0.3820614814758301 seconds +4: Time to load utils op: 0.3820688724517822 seconds +4: Time to load utils op: 0.38209080696105957 
seconds +4: Time to load utils op: 0.38210153579711914 seconds +4: Time to load utils op: 0.38211703300476074 seconds +4: Time to load utils op: 0.38211989402770996 seconds +0: Time to load utils op: 0.3917970657348633 seconds +0: Time to load utils op: 0.39231109619140625 seconds +0: Time to load utils op: 0.39251255989074707 secondsTime to load utils op: 0.3924567699432373 seconds +0: +0: Time to load utils op: 0.39258265495300293 seconds +0: Time to load utils op: 0.3925509452819824 seconds +0: Time to load utils op: 0.3924288749694824 seconds +5: Time to load utils op: 0.38601016998291016 seconds +5: Time to load utils op: 0.38606858253479004 seconds +5: Time to load utils op: 0.38640356063842773 seconds +5: Time to load utils op: 0.38597917556762695 seconds +0: Time to load utils op: 0.30243992805480957 seconds +5: Time to load utils op: 0.38650012016296387 seconds +5: Time to load utils op: 0.38607263565063477 seconds +5: Time to load utils op: 0.3866903781890869 secondsTime to load utils op: 0.3865673542022705 seconds +5: +6: Time to load utils op: 0.3840341567993164 secondsTime to load utils op: 0.38403892517089844 seconds +6: +6: Time to load utils op: 0.38407349586486816 seconds +6: Time to load utils op: 0.3841056823730469 seconds +6: Time to load utils op: 0.3841407299041748 secondsTime to load utils op: 0.38413429260253906 seconds +6: +6: Time to load utils op: 0.3841867446899414 seconds +6: Time to load utils op: 0.3842332363128662 seconds +0: [2023-04-24 12:41:55,834] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-04-24 12:41:55,835] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-04-24 12:41:55,835] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: Time to load utils op: 0.0005316734313964844 seconds +0: Time to load utils op: 0.0004417896270751953 seconds +0: Time to load utils op: 0.0005998611450195312 seconds +0: Time to 
load utils op: 0.0006229877471923828 seconds +1: Time to load utils op: 0.0008046627044677734 seconds +0: Time to load utils op: 0.00054931640625 seconds +0: Time to load utils op: 0.0006489753723144531 seconds +1: Time to load utils op: 0.0007791519165039062 secondsTime to load utils op: 0.0008370876312255859 seconds +1: +0: Time to load utils op: 0.0005376338958740234 seconds +1: Time to load utils op: 0.001194000244140625 seconds +1: Time to load utils op: 0.001169443130493164 seconds +1: Time to load utils op: 0.0012080669403076172 seconds +1: Time to load utils op: 0.001199960708618164 seconds +1: Time to load utils op: 0.0012524127960205078 seconds +6: Time to load utils op: 0.0009307861328125 seconds +5: Time to load utils op: 0.001190185546875 secondsTime to load utils op: 0.0011615753173828125 seconds +5: +5: Time to load utils op: 0.0012667179107666016 seconds +5: Time to load utils op: 0.0013399124145507812 seconds +5: Time to load utils op: 0.0012307167053222656 seconds +5: Time to load utils op: 0.0013775825500488281 seconds +5: Time to load utils op: 0.0012536048889160156 seconds +5: Time to load utils op: 0.0013661384582519531 seconds +7: Time to load utils op: 0.0013141632080078125 seconds +6: Time to load utils op: 0.0013751983642578125 seconds +3: Time to load utils op: 0.0010581016540527344 seconds +7: Time to load utils op: 0.0013146400451660156 seconds +6: Time to load utils op: 0.0014050006866455078 seconds +3: Time to load utils op: 0.001085519790649414 seconds +6: Time to load utils op: 0.0013890266418457031 seconds +6: Time to load utils op: 0.0013659000396728516 seconds +6: Time to load utils op: 0.0014019012451171875 seconds +6: Time to load utils op: 0.0013794898986816406 seconds +2: Time to load utils op: 0.0013124942779541016 seconds +2: Time to load utils op: 0.001314401626586914 seconds +7: Time to load utils op: 0.001558542251586914 seconds +6: Time to load utils op: 0.0014128684997558594 seconds +3: Time to load utils op: 
0.0012013912200927734 seconds +7: Time to load utils op: 0.0015151500701904297 secondsTime to load utils op: 0.0015454292297363281 seconds +7: +3: Time to load utils op: 0.0013039112091064453 seconds +7: Time to load utils op: 0.0015981197357177734 seconds +2: Time to load utils op: 0.0014948844909667969 seconds +7: Time to load utils op: 0.001619100570678711 seconds +3: Time to load utils op: 0.0013871192932128906 seconds +7: Time to load utils op: 0.0015597343444824219 seconds +2: Time to load utils op: 0.0015420913696289062 seconds +3: Time to load utils op: 0.0013613700866699219 seconds +3: Time to load utils op: 0.0013887882232666016 seconds +2: Time to load utils op: 0.0016355514526367188 secondsTime to load utils op: 0.0016679763793945312 secondsTime to load utils op: 0.0016148090362548828 seconds +2: +2: +3: Time to load utils op: 0.001455545425415039 seconds +2: Time to load utils op: 0.0017056465148925781 seconds +4: Time to load utils op: 0.0009760856628417969 seconds +4: Time to load utils op: 0.0009486675262451172 seconds +4: Time to load utils op: 0.0009822845458984375 seconds +4: Time to load utils op: 0.0010771751403808594 seconds +4: Time to load utils op: 0.0011432170867919922 seconds +4: Time to load utils op: 0.0013041496276855469 seconds +4: Time to load utils op: 0.0013217926025390625 seconds +4: Time to load utils op: 0.001337289810180664 seconds +0: [2023-04-24 12:41:56,040] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-04-24 12:41:56,041] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,041] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,142] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-04-24 12:41:56,143] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,143] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,244] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-04-24 12:41:56,244] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,244] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,342] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-04-24 12:41:56,343] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,343] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,445] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-04-24 12:41:56,446] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,446] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,544] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-04-24 12:41:56,544] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,544] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,647] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-04-24 12:41:56,648] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,648] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,746] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-04-24 12:41:56,747] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-04-24 12:41:56,747] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.38 GB, percent = 7.8% +0: [2023-04-24 12:41:56,747] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-04-24 12:41:56,747] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-04-24 12:41:56,747] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-04-24 12:41:56,747] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-04-24 12:41:56,748] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-04-24 12:41:56,748] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-04-24 12:41:56,748] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-04-24 12:41:56,748] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-04-24 12:41:56,748] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-04-24 12:41:56,749] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-04-24 12:41:56,750] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-04-24 12:41:56,750] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00041604042053222656 seconds +0: [2023-04-24 12:41:56,751] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-04-24 12:41:56,757] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +7: [2023-04-24 12:41:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +4: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +6: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +5: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +2: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +3: [2023-04-24 12:41:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+4: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +0: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +2: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +7: [2023-04-24 12:41:56,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+5: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... 
+1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt... +1: [2023-04-24 12:41:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. 
+1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt. +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+1: [2023-04-24 12:41:56,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+6: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+2: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+3: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+6: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+3: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-04-24 12:41:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+5: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-04-24 12:41:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-04-24 12:41:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-04-24 12:41:56,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... 
+7: [2023-04-24 12:41:56,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-04-24 12:41:57,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-04-24 12:41:57,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-04-24 12:41:57,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-04-24 12:41:57,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-04-24 12:41:57,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-04-24 12:41:57,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-04-24 12:41:57,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-04-24 12:41:57,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-04-24 12:41:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-04-24 12:41:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-04-24 12:41:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-04-24 12:41:57,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-04-24 12:41:57,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-04-24 12:41:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-04-24 12:41:57,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-04-24 12:41:57,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-04-24 12:41:57,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-04-24 12:41:57,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-04-24 12:41:57,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-04-24 12:41:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-04-24 12:41:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-04-24 12:41:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-04-24 12:41:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-04-24 12:41:57,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-04-24 12:41:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-04-24 12:41:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-04-24 12:41:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-04-24 12:41:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-04-24 12:41:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-04-24 12:41:57,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-04-24 12:41:57,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-04-24 12:41:57,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-04-24 12:41:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-04-24 12:41:57,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-04-24 12:41:57,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-04-24 12:41:57,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-04-24 12:41:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-04-24 12:41:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-04-24 12:41:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-04-24 12:41:57,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-04-24 12:41:57,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-04-24 12:41:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+7: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-04-24 12:41:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-04-24 12:41:57,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+0: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-04-24 12:41:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-04-24 12:41:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-04-24 12:41:57,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-04-24 12:41:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+1: [2023-04-24 12:41:57,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+6: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+2: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+0: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+5: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+3: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+5: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-04-24 12:41:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+2: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+4: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. 
+4: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-04-24 12:41:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:57,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-04-24 12:41:58,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-04-24 12:41:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-04-24 12:41:58,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+1: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-04-24 12:41:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-04-24 12:41:58,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-04-24 12:41:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+1: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-04-24 12:41:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-04-24 12:41:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-04-24 12:41:58,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-04-24 12:41:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+1: [2023-04-24 12:41:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-04-24 12:41:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-04-24 12:41:58,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+1: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+6: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-04-24 12:41:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: > overriding learning rate value to 0.0002 +5: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +0: > overriding minimum learning rate value to 2e-05 +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: > overriding decay style value to cosine +0: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt.
+6: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-04-24 12:41:58,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-04-24 12:41:58,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-04-24 12:41:58,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
+6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-04-24 12:41:58,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-04-24 12:41:58,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... 
+3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. 
+3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
+3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-04-24 12:41:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-04-24 12:41:58,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
+4: [2023-04-24 12:41:58,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +2: [2023-04-24 12:41:58,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,324] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +2: [2023-04-24 12:41:58,326] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +1: [2023-04-24 12:41:58,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-04-24 12:41:58,326] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +0: [2023-04-24 12:41:58,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,327] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +1: [2023-04-24 12:41:58,328] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +6: [2023-04-24 12:41:58,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-04-24 12:41:58,328] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +0: [2023-04-24 12:41:58,329] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +0: [2023-04-24 12:41:58,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,330] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +6: [2023-04-24 12:41:58,330] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +0: [2023-04-24 12:41:58,332] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +0: [2023-04-24 12:41:58,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,333] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +5: [2023-04-24 12:41:58,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,334] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +0: [2023-04-24 12:41:58,334] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +6: [2023-04-24 12:41:58,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-04-24 12:41:58,334] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-04-24 12:41:58,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-04-24 12:41:58,335] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +1: [2023-04-24 12:41:58,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-04-24 12:41:58,335] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +6: [2023-04-24 12:41:58,336] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +1: [2023-04-24 12:41:58,336] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +6: [2023-04-24 12:41:58,337] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +5: [2023-04-24 12:41:58,337] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +5: [2023-04-24 12:41:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,337] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +1: [2023-04-24 12:41:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-04-24 12:41:58,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +5: [2023-04-24 12:41:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +5: [2023-04-24 12:41:58,339] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +1: [2023-04-24 12:41:58,339] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +0: [2023-04-24 12:41:58,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +5: [2023-04-24 12:41:58,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +0: [2023-04-24 12:41:58,341] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +2: [2023-04-24 12:41:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,346] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +4: [2023-04-24 12:41:58,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+4: [2023-04-24 12:41:58,347] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +7: [2023-04-24 12:41:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-04-24 12:41:58,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +0: [2023-04-24 12:41:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,348] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +0: [2023-04-24 12:41:58,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +4: [2023-04-24 12:41:58,349] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +0: [2023-04-24 12:41:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+0: [2023-04-24 12:41:58,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +7: [2023-04-24 12:41:58,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +0: [2023-04-24 12:41:58,350] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +7: [2023-04-24 12:41:58,350] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +4: [2023-04-24 12:41:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,351] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +4: [2023-04-24 12:41:58,351] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +0: [2023-04-24 12:41:58,351] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +7: [2023-04-24 12:41:58,351] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +7: [2023-04-24 12:41:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,352] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +4: [2023-04-24 12:41:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+4: [2023-04-24 12:41:58,353] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +4: [2023-04-24 12:41:58,353] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +7: [2023-04-24 12:41:58,353] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +4: [2023-04-24 12:41:58,355] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +4: [2023-04-24 12:41:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-04-24 12:41:58,356] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +4: [2023-04-24 12:41:58,358] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-04-24 12:41:58,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,366] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-04-24 12:41:58,368] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +2: [2023-04-24 12:41:58,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +0: [2023-04-24 12:41:58,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+6: [2023-04-24 12:41:58,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +6: [2023-04-24 12:41:58,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +2: [2023-04-24 12:41:58,372] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +0: [2023-04-24 12:41:58,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-04-24 12:41:58,373] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-04-24 12:41:58,373] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +6: [2023-04-24 12:41:58,374] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +2: [2023-04-24 12:41:58,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,374] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +3: [2023-04-24 12:41:58,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+6: [2023-04-24 12:41:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-04-24 12:41:58,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +6: [2023-04-24 12:41:58,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +0: [2023-04-24 12:41:58,375] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +2: [2023-04-24 12:41:58,376] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +6: [2023-04-24 12:41:58,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +3: [2023-04-24 12:41:58,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +3: [2023-04-24 12:41:58,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +6: [2023-04-24 12:41:58,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-04-24 12:41:58,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +4: [2023-04-24 12:41:58,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-04-24 12:41:58,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +2: [2023-04-24 12:41:58,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +6: [2023-04-24 12:41:58,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +4: [2023-04-24 12:41:58,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +2: [2023-04-24 12:41:58,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-04-24 12:41:58,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-04-24 12:41:58,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +4: [2023-04-24 12:41:58,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+7: [2023-04-24 12:41:58,385] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +7: [2023-04-24 12:41:58,385] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +4: [2023-04-24 12:41:58,385] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +3: [2023-04-24 12:41:58,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,385] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +1: [2023-04-24 12:41:58,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-04-24 12:41:58,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +6: [2023-04-24 12:41:58,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-04-24 12:41:58,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +4: [2023-04-24 12:41:58,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-04-24 12:41:58,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-04-24 12:41:58,387] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-04-24 12:41:58,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +1: [2023-04-24 12:41:58,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +1: [2023-04-24 12:41:58,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +3: [2023-04-24 12:41:58,387] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +3: [2023-04-24 12:41:58,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-04-24 12:41:58,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +1: [2023-04-24 12:41:58,387] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +6: [2023-04-24 12:41:58,388] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +4: [2023-04-24 12:41:58,388] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +1: [2023-04-24 12:41:58,389] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +3: [2023-04-24 12:41:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,389] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +5: [2023-04-24 12:41:58,389] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +3: [2023-04-24 12:41:58,389] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +2: [2023-04-24 12:41:58,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,390] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +1: [2023-04-24 12:41:58,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-04-24 12:41:58,390] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +3: [2023-04-24 12:41:58,391] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +1: [2023-04-24 12:41:58,392] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +1: [2023-04-24 12:41:58,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,392] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +1: [2023-04-24 12:41:58,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +1: [2023-04-24 12:41:58,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-04-24 12:41:58,393] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +3: [2023-04-24 12:41:58,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,393] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +1: [2023-04-24 12:41:58,394] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +1: [2023-04-24 12:41:58,394] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +2: [2023-04-24 12:41:58,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-04-24 12:41:58,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +3: [2023-04-24 12:41:58,395] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +5: [2023-04-24 12:41:58,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +2: [2023-04-24 12:41:58,396] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +5: [2023-04-24 12:41:58,396] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-04-24 12:41:58,398] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +7: [2023-04-24 12:41:58,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,399] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +7: [2023-04-24 12:41:58,400] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +5: [2023-04-24 12:41:58,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,401] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +4: [2023-04-24 12:41:58,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-04-24 12:41:58,402] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +5: [2023-04-24 12:41:58,404] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +4: [2023-04-24 12:41:58,404] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +5: [2023-04-24 12:41:58,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,407] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-04-24 12:41:58,408] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +7: [2023-04-24 12:41:58,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-04-24 12:41:58,411] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +3: [2023-04-24 12:41:58,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,411] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +7: [2023-04-24 12:41:58,412] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +6: [2023-04-24 12:41:58,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-04-24 12:41:58,412] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +5: [2023-04-24 12:41:58,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-04-24 12:41:58,413] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +3: [2023-04-24 12:41:58,413] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +6: [2023-04-24 12:41:58,414] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +5: [2023-04-24 12:41:58,415] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +3: [2023-04-24 12:41:58,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-04-24 12:41:58,422] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +3: [2023-04-24 12:41:58,423] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +0: successfully loaded checkpoint from checkpoints_146m1b5100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 1668.75 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-04-24 12:41:59 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... 
+0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.029815 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.056 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.002943 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.089 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... 
+0: [after dataloaders are built] datetime: 2023-04-24 12:42:03 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 7325.84 | train/valid/test-data-iterators-setup: 3990.78 +0: [after training is done] datetime: 2023-04-24 12:42:03 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 4.169387E+00 | lm loss PPL: 6.467578E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3406620: Mon 24 Apr 2023 12:42:32 PM EEST diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dcbb2a3f975265ad7fdaa4f215e5e81932fb6e96 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b31b74da8cb0acdea232d7b9ad7ae9a4b84330f2590f8f09641444046617e4be +size 27478295 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a84338f99e55ac1fc2ff11fb6d44ea61897a1e9 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:707d8101536a240ff081289abcdfd9e77da9f41596c2a6dae3f88d674c887a6d +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..8689548fe1749b57a33c508588726d615f031f14 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7aeed85fc61386456e692b6e84dfe7ba969159ed98e63069d14f46a508030ff +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b307e9239f205c36e4dda6ef9e9eb6efda8d7787 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e62cb1abc7e0b5d05829bc04bbcaa193eb6b9403d58ac12532ea6c30972c885a +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a5401cab1c26f7bd44002a58860be420c57b1f7 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b2394126285cb9fe2b86fb680e3b3111a738a733acfec013df9f30faa9e218b +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24ff845a31b7bd37c043f35851400bd8b48485b0 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3de55d16ba663670b9c24b7fe83b31c9baaf4cfba6f7437e755c8395a38a0a04 +size 27478370 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1852105ed482b2108998cb43aa3863ec2cdb453a --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab4e8ecf62cdfdd59a485634c2f25bc2476072ed534a9cde2474976914e93897 +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..abe41e64d1189c356a6339d8ef2636a9106ae0be --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6137290e59a01c7e9f6f18a41304493a51c7ec32a7cf9922362b086f3db5a084 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2f862e008263f8bc9dda549b2825afb494283e54 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa3e8c24eacd1dff95c0af061752f28c0beb325369d5d8a6342434f0955b5acd +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9fdc8ba859f9e290965c96d4c7b8239d1c26d541 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23a6acbcacc56612be965d7711e227adf09023e9e3a21784481d3b8ab486f8be +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e058ee98e1b20c074071881e9e395971ac946d70 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86325d50aeb8cbf3033123086816fc422a2fc90d6c85ac9a0ac24a385e77c6de +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4bb03673d76c09c7da5aa173b5a175f6b0a3ec1 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e813a8601e8c9f1acda71a4276c4b4af4576ba1ad9c1bee5d9169ea4d4192903 +size 27478231 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b92de3eeb5c53a934afee00c699761e2af1f74b9 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c8d0f136eeacbfa28a02095f99bea5750932a27a94a3de1829570ad572b94e4 +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..20fac7f3636d4f0a0bb30d4dbd67abeef18c968e --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e496914135dec6e38112de8b1399aa1d0478956ea14041bf9a1b70524e867a52 +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..567ff5facbccef58fd086f0c6d326c1b5e5f8f2d --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34832f33f9b028c233f5216bb354746fa275d0c077db5fd4d058889bd0ca946a +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7626c3cc963a136750d6b7b80677650f9786c35b --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41e2b2da58a136f8ee2ef24d870fc6e2e8662216de39a7d7d0da91d1e8e8fdf8 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..70685a77f550e01df14d1a9782e024eb0da14f9b --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91dd0ea347f0ccfb56f5166727a689ed4898c1457ea28af75c9bd6116de462aa +size 27478178 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5993704e5d1f518e937fd02a963e582f5db74630 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b9cfc33cdbdaaef594f238adcd982304f9fb6eb0bcffdca8f7d8e7dd47caa7b +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9eb314707dc1b04602ef32654f05dc205ea7a51 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7a257f7cbac530e101fab57dacee7f9107c641951ad613f4aaba32cde5f8aae +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d738bef79884103bb6b306730e6495827e20afd6 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:996129d8452d599a0d078388516d506d4b00e4a696a6ee31a4a6e7210ebc2ab0 +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83550d579f3125a99a36506da2e9db0dbcb5b571 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a879bf96d2106ccfee989b8d9883b70da1c588a4f59c1ae652ef721836211bb +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6408aa8fbf2f002029521a38199215ab135019b9 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ec79c0ff0a80f7f6007b9812efa2fe6912fd93bf6938479463ff9d4d6003695 +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3df58ea217d5f85fd03da441c848b617bf072962 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:910a4c83f27b2f0fb93476c47514c233428d32c2264e15d71f3ebf6474d7d1e2 +size 27478231 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c9f57a284be247430a34138477ddeedbcfbd9a00 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5be72776b3e0a7d8c30bbeca531500322d0e896bc9e2287204f97e5a73b327a +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..3ba45192c5aea48f019d973b4e291cf72e71bef5 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d529545f6bf311f8c6c3738297ad733dcbb1c65a1f8ed5fa37c07547e5ae8e7 +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b515bc1b58181bec22042db1b82f9f62909a6f2c --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f31c432be51391c1ae93c7be4d391e5d179569433c4d6bdb9f80ad1fcdf6a17c +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7de0475349b911a365d8e112ee834d466f7fe934 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4a1bc1f38da848615d1620c1249aca87c0239c9bd93c05fdaf26fb80184f55c +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..acab16d39c7814078a19ad1fa109e53ba5cdb66d --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cbf90c0662f9a5380110da1ca65722b9855cd34888dc96cef809fab943e57caa +size 27478306 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fef363a49a7ae50314bc3dfb8cbad8f730527bb4 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57bf62a095e0becc28763fb5d839a74fd79f033ff9493c1ef5bf31792fbae23f +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..38e3a654600429d80fd50ff6b81e7adef8b671ee --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d49a4bbea4e135bd0af496885002ae1345f4703cece8bcd90eafb4f9518e2dac +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1093a612d9b108f244fc4d846f512a70dd6e6022 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8772ae486d2bb9867e1f123563d0782dba0a4f37c010f8bb4ee75a16a21418b3 +size 27478114 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c2551fd28c2d4151bee01b0591a00cdd0deee7f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:189df0262f232d33feeb6460d71bfb5fe882cc27e28931992d71f82629ab9a30 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9307a6c9384bf5cc7416237661bf0597e4ccf42 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bbd935e6f3acd79ac7a2d651a0c32d0bafb2aa36de9d6f2d6c11f15781e66f0 +size 27478434 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..830bc8abc44bbd3a68714537a4975b310f9f9f07 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07bc6aa47963cda5ddd0c21334dd43ac7505cf336c74da8200b95bfd4b3a59ba +size 27478167 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c4f7af6233e5368e54587c570aa0b7d07c5e49be --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:029f8f912c23cbbc3f82e879b826976aac73a659e271e77d7a9d573a90e3b71b +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..24dba6873989de60ca61a44294bc14f058dcc546 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4bda9e4459db8eb521ab1cd8e1d2be9ad092bbf942e3bd77c38f1e8bf1e94c4 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9a3126ad62b28936926a3d1fbfb7e04d42c83644 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d343c1b88b33a210347004ea09219105524af81917f3dc6c0598aac083cda40 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1049ec030d0e876251e378bcc7d3a77be464a71e --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed9be8afbc42fa4b875de0f3503af8209c4284bc69fb97d2edaebe6f58f4dfeb +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5f90939f06bb70707e84ca9625866fbfc5321f6 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:837c7895fd29dd627775e7487f422e5ca05e4c2729b3193062129b182b20ca8b +size 27478434 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..44ea0cacbf8ae13b2d6122c56cb1f21338ca2a8d --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f0cb08773bb5ebd3de7affd474247820e88b8ef55681eeb991b2404608e8ebf +size 27478050 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5858d2a4ae2d2f52d89702ecb417ef3e503cd65 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6ce4e0fa2133a1cf2962423d489e86099c03119fb10dc4b4f5e58ef2f744034 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..355baa426d84935c18cde63be886e625b8e13b51 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c60f7b92e0eddbaaaf04a9fd2ef0bfcab5e3b4aa00240952457af8afab9385b4 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d12b4cfe058ea426521e084a31dba13755abe2e6 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d73b65f432735a9841f1566da503864ce94aa65bc98789165af805b39ced327 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61474971f8c609a94a4def60e47e53cf5450ad7f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:137acdec3a379600bbb4d0626cc184515685eff76f8484c4072a89f920981576 +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4eb83f0f7e1b0515d921ecb08e4fef5abf99e89f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bddf795c363f8aec0e9d40a04b3afbb5347651fbca429e7401b599c3f8d9177 +size 27478231 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a33e1184dcb9881e7c0b138bc21cba27b743861c --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1fea52f60a48c4b6c404086fbab0ca135d3346d8277671d090f870c6230646f3 +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..18e8a0564caafc87a282744a6c41c1c6eecbc90f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6b0d67c5d7b3690f0f09ee048716e6de76a78569e7abbdda1db0070944770f3 +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f65fac0bf05cb169e4ba0cc22826be7738aa5625 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85e298d96d7813d157d394dd2020ca904c985598c02a82302c437c2358ab2517 +size 27478434 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cdc09f40caed35634d7aeb748670e55a12d06aa4 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:334886bf5ab5b95f29569fbb78d7adc00a03b0a42a47fb9374c4c96f2bd351e0 +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2169a81b67d4373694bb58266c2cbed3080b97d8 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d6217de2be3b693963912124c9a6b029fc80aa775d882c49b52cf94b52990b5 +size 27478306 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6479d1a16d2fed68a7343f94a37a20e9c2971ae7 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9329f0bf4989d51a1de3b381e077ccb35acccf192c0710c4cd8a20e8eb2702db +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8186d8adb0bae940509b540e6d4a20b5c77e6c89 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f2873a253349b881ebd718d7e74181c24f82718cfff3b5e3136fce122da7be1 +size 27478306 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c815ec2972483e251a86758051a2b993b6b232ac --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:562eb462f536f06adba938753745fb3f7ddc3f4325d6a53cbe86c2bcaca4967f +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..81838c7ac1664e796011dc5d77d3e7a424a448d7 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a25b9bf4fd97c6b8c626468e09ee5658cf92c63c9a07a77efc4dffb410aa2f0 +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0316187f52532815df0659dfb60b7bee0b3369c --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:087d0f9321283473895cbadf680f12fc1be768c1391c4369439b74ca0d90f5d2 +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5857d75fc9ceecebcfe4ae8484168e65f5a8847a --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32d3522275bf18c843c7a3bd96c3a335755fe35539aee3d95f88f50eed6503a4 +size 27478167 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74966d25788ec224a3d897a208e6fad0f60e586f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5ab15e215a2af661c36f4b08f25b54bb99e1a8a791582b7b49ec9ee1ffd5574 +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..832037c6d62b192c11136724a42ee319706cd35a --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15fb004be191e9b7334d2119ebd3cd01194fbfb52925d89581a63b3b394b08fb +size 27478370 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ebb76d2523e90078701403deb8ec0688903afeb3 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfc3c404c33506859252ca925dad7ac876dd0a3f4818b42a99e3ecbb185f806c +size 27478178 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a17083d0b9d5236c120cb15c07bae0d8d26a474 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa699909b641ba304c037f9d72333dd8fd8a0877b28311df93439f2bd5a1b281 +size 27478242 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..62d8e87f82ed73cad57cebbe4f08d910461dd580 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b10f1533e4ca3b1862bdc83b91ff68df3369b4273e550b14d4c141bab3179f0 +size 27478359 diff --git 
a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9b53960c5a6db67533c47e8048b4ca55bb79927 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cbaf23caf786dafc3770cf3398bc79418c340b41e5850300df3ba4e482b1d9d +size 27478103 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..99fd10aae850f9ebd4e6af9fdb740e591fcac056 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50a2d957b164bcf62edbb0b9eeeac204ff70add1c128d210ac8e236f1d802814 +size 27478359 diff --git a/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cdb97c237ae540742f82fc24b615e8ba9c89bb89 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7717c538d833fadcd0b37dcbf05264b02c60486f3c8b4d5990ddfaad84fca973 +size 27478167 diff --git a/146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2fe7535f179d42729e0b5c5e7006fa385f793de --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:e3701980f28e5b52414cf4fbec0a418c94f1305b84544d18454a5cd2b8d78bf5 +size 80413955 diff --git a/146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd6c54d8a3d6de88a50289575e51c9c7096a648d --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3cca6ab47f814b94f6cdea901675a0db46585b50e31fa94905c5f26f48c621a +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..365a50bafb65fd1b115d8ff56c1648b98b62c66f --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f231328c17248b3f0058636b3292cd39585fd2dcad72fd9a993bdced252d510 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9efa2800aa947c4c1368e9ed8fb58d10afd6265b --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e8324cbd6ff53866449910cb14b4c50475f45413f2127d68e0f9101262a9a71 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..66991813fdae3a8c2427982ae0df7d7e1497a203 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:905f78c63e072ea8648c773321669252ad27707aa82f8df409938e2cb43f126a +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f30d821b2eb089ca4086dfe16dc6448d0fb9db80 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec848a6c71e87f5488ba886c08e4d3665941ecdeb92ea7868dccc762be4f1cd9 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e97527723fd991852c7aea86ead498f312aa100 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ddad8c1724d1551bb524c85c35f528207ba1a42c3ded44c6d21f68c7b7eadc6 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28baf230c19fa88cef1335d21a7eb4c04d27dabd --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67057f48ee5d389a0baaabf9f080c765a1e6c9834fd8ed05a50bdd204f41108d +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d5c6abe1134349eccdf340855d2bd6de62c84991 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a56c048b400b55b909d7a36dec773f4d9e3da897d1f7420a9fb92134a8e14802 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..32e68e85d2e388518a6d6c2d1227be19df61eb2d --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0eebef9bc9273d83fe2177b39cf76b1050391533cdc24726e727c08349c82ff7 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30a61684ba3c8b73fa98441483ef1c368c68eee1 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8783824f084cdebcfa1732196b01aae07ca66068a97bebc8a9d2e7c4e5f27368 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6002fa8b418a151205e0cd80a8d58b599d1e5112 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:111336cef9d0d25da772392c6a9c36d09beee9bd7fe4a4f31ad8107de3b0cffa +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..91884d9499dfa3010f6ba839909b63bc67db5ece --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:3ce5eab6d938946a780a98a2008167a417e50568691243cd060033b67aaa2f4f +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f335e47eff4fb127c99c8ab149a3e65391ba3705 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:164afe72a4646ba15c7e5b7eb2a987efbe62a3615014f844fba11d3125e1db9b +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ae9dfe7fe20c27d0318bd9b2c8e7f7bb7b21a9e --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bd6d3a798cd56b38ac8dec2b2297883c8deba1841778b2fded959c2184e2a88 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..75caceab92b5d29345982481ab50f48e7cc916bd --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6db1639980338ceb84edda99aede13a89bd653c4dfa5e00101e8102202001d6 +size 14180099 diff --git a/146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt b/146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fce07dddda3dcf0d47536ab608524d3b36a49353 --- /dev/null +++ b/146m1b5100mdedup/global_step2891/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:7353249943a7b4675ad68b6c07ec01577ab8ea454e1bddd35d055373a34de384 +size 4291 diff --git a/146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt b/146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bde5697a56afccbbfa2f9fd854b8863a7fb7e40a --- /dev/null +++ b/146m1b5100mdedup/global_step2891/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2268c3e1a0f1f42c94d7c9f2224ff9257ff2081c8718d6092315a400fb3fdbf +size 35443 diff --git a/146m1b5100mdedup/sbatch_146m1b5100mdedup.sh b/146m1b5100mdedup/sbatch_146m1b5100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..f886c6bac03301a7a86243bd65d8e19d4e96947a --- /dev/null +++ b/146m1b5100mdedup/sbatch_146m1b5100mdedup.sh @@ -0,0 +1,166 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m1b5100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT 
+TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 1516071000 +# -> Samples: 740269 +TRAIN_SAMPLES=740_269 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 7403 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + 
--save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m1b5100mdedup/sbatch_146m1b5100mdedupval.sh b/146m1b5100mdedup/sbatch_146m1b5100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..4a73e9948dd5e8c516baef52e45c7f0508c49542 --- /dev/null +++ b/146m1b5100mdedup/sbatch_146m1b5100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m1b5100mdedupval +VARIANT_CKPT=146m1b5100mdedup + +# if run
without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length 
$SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m1b5100mdedup/tensorboard_146m1b5100mdedup/events.out.tfevents.1679043255.nid006724.94844.0 b/146m1b5100mdedup/tensorboard_146m1b5100mdedup/events.out.tfevents.1679043255.nid006724.94844.0 new file mode 100644 index 0000000000000000000000000000000000000000..5080ed451ef349cdc5b2d1c2f01b684ad9a2eee5 ---
/dev/null +++ b/146m1b5100mdedup/tensorboard_146m1b5100mdedup/events.out.tfevents.1679043255.nid006724.94844.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b988c3cad0139aa6821ba421b4fa9ff9c8d9ae40850b71b7f62363df3b6b9a0 +size 5152964 diff --git a/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1679047332.nid005365.100416.0 b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1679047332.nid005365.100416.0 new file mode 100644 index 0000000000000000000000000000000000000000..e12843fe9a2fa063d6b583c88c1e18d4917ec80f --- /dev/null +++ b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1679047332.nid005365.100416.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b71b6665ec5c1232c60b1f258eee2a984ddfb6fc2867ac9aa62bb63f46f4bcc4 +size 40 diff --git a/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682328556.nid006915.69364.0 b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682328556.nid006915.69364.0 new file mode 100644 index 0000000000000000000000000000000000000000..875224f2b6e13dd2d0fed95068fbb2a36f5f0460 --- /dev/null +++ b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682328556.nid006915.69364.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5c9f7a7460b7d969132c538ba37111bc7256856aec0505707149b911721d46c +size 980 diff --git a/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682329298.nid006915.79204.0 b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682329298.nid006915.79204.0 new file mode 100644 index 0000000000000000000000000000000000000000..8022a930ee7e20089989b3e094cfad728a2523ca --- /dev/null +++ b/146m1b5100mdedup/tensorboard_146m1b5100mdedupval/events.out.tfevents.1682329298.nid006915.79204.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:b11afbd78f1b06b35d701045352adebf5893c05c47845cc2be00847aef2c5f15 +size 980 diff --git a/146m2b7100mdedup/3327412.err b/146m2b7100mdedup/3327412.err new file mode 100644 index 0000000000000000000000000000000000000000..03973f6c075b1a150adb88ac61965568690d0195 --- /dev/null +++ b/146m2b7100mdedup/3327412.err @@ -0,0 +1,1121 @@ +0: 2023-03-17 00:58:40.218626: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218638: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218640: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218633: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 00:58:40.218627: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218642: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218654: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:58:40.218650: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219587: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 00:58:40.219596: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219596: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219608: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219623: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219642: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 00:58:40.219678: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:58:40.219673: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.219957: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.219967: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.219967: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 00:58:40.219969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.219964: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.219973: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.220006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:58:40.220017: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 00:58:40.221322: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221337: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221328: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221337: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 00:58:40.221340: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221332: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:58:40.221341: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221459: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221461: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 00:58:40.221471: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221472: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:58:40.221483: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 00:58:40.221473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221724: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221735: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221743: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 00:58:40.221740: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221747: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221753: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:58:40.221760: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222205: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:58:40.222214: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222226: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222238: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222238: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:58:40.222241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:58:40.222245: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222531: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222535: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222542: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 00:58:40.222538: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222527: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222546: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222527: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:58:40.222550: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 00:58:41.771437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:41.771803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771805: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771812: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771812: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771814: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 00:58:41.771817: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771816: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:58:41.771819: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:41.810712: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810715: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810720: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810720: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 00:58:41.810722: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810724: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:41.810729: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814291: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814298: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:41.814691: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814692: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814697: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814696: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814699: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 00:58:41.814701: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814703: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:58:41.814706: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.835647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835660: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835660: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835661: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.835668: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:41.836088: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836087: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836095: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836093: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836095: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 00:58:41.836098: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836098: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:58:41.836100: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836282: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 00:58:41.836461: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 00:58:41.836455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:41.836665: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836668: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836670: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 00:58:41.836461: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 00:58:41.836673: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836673: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836675: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:58:41.836678: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836464: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:41.836861: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836866: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836870: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836873: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836871: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 00:58:41.836874: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836875: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:58:41.836878: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.898827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898834: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898836: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898837: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.898848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:41.899238: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899242: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899245: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899248: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899252: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 00:58:41.899258: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899260: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:58:41.899264: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007040: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007044: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:42.007448: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007455: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007455: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007457: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007458: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 00:58:42.007460: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007466: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:58:42.007466: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:58:51.789106: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789133: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789146: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789144: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.789203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796149: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796165: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:58:51.796161: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796182: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796183: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796183: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:58:51.796213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:58:51.796214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 00:58:51.797360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797387: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797404: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797448: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.797525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799789: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799805: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 00:58:51.799797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799798: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799807: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799817: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799820: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799829: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799828: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:58:51.799844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:58:51.799867: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.805848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805899: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.805957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806762: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806787: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806814: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806826: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806831: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.806851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.807045: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807107: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807129: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807150: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807150: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.807161: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807256: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.807535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808217: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808223: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808234: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808233: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 00:58:51.808230: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808244: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808252: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808253: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:58:51.808283: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:58:51.808299: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:58:51.809297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:58:51.809319: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809319: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:58:51.809302: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:58:51.809300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:58:51.809303: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:58:51.809307: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:58:51.809304: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.809318: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809328: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:58:51.809337: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809337: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809339: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809340: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809342: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809343: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809344: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:58:51.809345: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:58:51.809306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.809313: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809321: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809323: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809324: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809327: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809328: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:58:51.809342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:58:51.809358: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.810235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810253: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810290: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.810292: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812376: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812380: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:58:51.812380: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812381: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:58:51.812390: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812395: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812397: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812399: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812402: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:58:51.812402: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.822612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822702: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.822894: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 00:58:51.824978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.824987: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.824995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.824997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.825000: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.825001: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:58:51.825011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.825025: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 00:58:51.825023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:58:51.825040: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:58:51.809923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809929: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809936: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809940: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:58:51.809943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809954: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809956: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809956: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809956: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:58:51.809960: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
+0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. 
+2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: +7: +7: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +3: Building extension module utils... +3: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +3: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... 
+6: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: +5: Loading extension module utils...Loading extension module utils... +5: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... 
+6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +3: +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+3: +3: +3: +3: Loading extension module utils...Loading extension module utils...Loading extension module utils... +3: +3: +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... 
+7: +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m2b7100mdedup/3327412.out b/146m2b7100mdedup/3327412.out new file mode 100644 index 0000000000000000000000000000000000000000..945eb9245de13fffd4e9cbfa88e50660f6f341d9 --- /dev/null +++ b/146m2b7100mdedup/3327412.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m2b7100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 
--save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m2b7100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m2b7100mdedup --load checkpoints_146m2b7100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3327412.json --zero-stage 0 +START 3327412: Fri 17 Mar 2023 12:58:08 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 40.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 43.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 46.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 39.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info 
================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 38.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 45.0c 103.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 38.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 46.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 
5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +4: Launching on nid006526 (4/8), master nid006522 port 9999, GPUs 8, CUDA: True +5: Launching on nid006527 (5/8), master nid006522 port 9999, GPUs 8, CUDA: True +7: Launching on nid006529 (7/8), master nid006522 port 9999, GPUs 8, CUDA: True +2: Launching on nid006524 (2/8), master nid006522 port 9999, GPUs 8, CUDA: True +0: Launching on nid006522 (0/8), master nid006522 port 9999, GPUs 8, CUDA: True +1: Launching on nid006523 (1/8), master nid006522 port 9999, GPUs 8, CUDA: True +3: Launching on nid006525 (3/8), master nid006522 port 9999, GPUs 8, CUDA: True +6: Launching on nid006528 (6/8), master nid006522 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. 
+0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... 
mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3327412.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 
768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m2b7100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m2b7100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 
0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ 
True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m2b7100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... 
False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m2b7100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. 
None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 01:00:12,369] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... 
+0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.096 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 27.213 seconds +0: time to initialize megatron (seconds): 26.470 +0: [after megatron is initialized] datetime: 2023-03-17 01:00:42 +0: building GPT model ... +0: [2023-03-17 01:00:42,611] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 01:00:42,612] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 01:00:42,612] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.89 GB, percent = 6.1% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, 
data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 01:00:44,614] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: 
ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 01:00:45,027] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 01:00:45,028] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 01:00:45,028] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.91 GB, percent = 6.1% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-17 01:00:45,030] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 01:00:58,290] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 01:00:58,290] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 01:00:58,290] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 01:00:58,295] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 01:00:58,295] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 01:00:58,416] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 01:00:58,416] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 01:00:58,417] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.59 GB, percent = 6.3% +3: ninja: no work to do. 
+3: Time to load utils op: 0.3082616329193115 seconds +3: Time to load utils op: 0.30515599250793457 seconds +3: Time to load utils op: 0.33014559745788574 seconds +0: Time to load utils op: 0.22171616554260254 seconds +0: Time to load utils op: 0.30805492401123047 seconds +0: Time to load utils op: 0.30771374702453613 secondsTime to load utils op: 0.3081479072570801 seconds +0: +0: Time to load utils op: 0.306995153427124 seconds +0: Time to load utils op: 0.3080897331237793 seconds +0: Time to load utils op: 0.31109118461608887 seconds +0: Time to load utils op: 0.30840396881103516 seconds +5: Time to load utils op: 0.30831241607666016 secondsTime to load utils op: 0.30889892578125 seconds +5: +5: Time to load utils op: 0.3088960647583008 seconds +5: Time to load utils op: 0.3080422878265381 seconds +5: Time to load utils op: 0.3386411666870117 seconds +5: Time to load utils op: 0.3077542781829834 secondsTime to load utils op: 0.3078644275665283 secondsTime to load utils op: 0.3079979419708252 seconds +5: +5: +3: Time to load utils op: 0.30359864234924316 seconds +3: Time to load utils op: 0.3030819892883301 seconds +3: Time to load utils op: 0.3030104637145996 seconds +3: Time to load utils op: 0.30310869216918945 seconds +3: Time to load utils op: 0.3030421733856201 seconds +1: Time to load utils op: 0.31154441833496094 seconds +1: Time to load utils op: 0.31156349182128906 seconds +1: Time to load utils op: 0.31156325340270996 seconds +1: Time to load utils op: 0.3115670680999756 seconds +1: Time to load utils op: 0.31157875061035156 seconds +1: Time to load utils op: 0.3115816116333008 seconds +1: Time to load utils op: 0.31159520149230957 seconds +1: Time to load utils op: 0.311596155166626 seconds +6: Time to load utils op: 0.310760498046875 seconds +6: Time to load utils op: 0.31037139892578125 seconds +6: Time to load utils op: 0.3104398250579834 seconds +6: Time to load utils op: 0.3111753463745117 seconds +6: Time to load utils op: 0.3104426860809326 
seconds +6: Time to load utils op: 0.31043338775634766 secondsTime to load utils op: 0.3104286193847656 seconds +6: Time to load utils op: 0.3103790283203125 seconds +6: +2: Time to load utils op: 0.3111386299133301 seconds +2: Time to load utils op: 0.31115007400512695 seconds +2: Time to load utils op: 0.3111562728881836 seconds +2: Time to load utils op: 0.3111839294433594 secondsTime to load utils op: 0.31118297576904297 seconds +2: +2: Time to load utils op: 0.3111903667449951 seconds +2: Time to load utils op: 0.31121397018432617 seconds +2: Time to load utils op: 0.3112170696258545 seconds +4: Time to load utils op: 0.31171417236328125 seconds +4: Time to load utils op: 0.3117234706878662 seconds +4: Time to load utils op: 0.31173276901245117 seconds +4: Time to load utils op: 0.3117411136627197 seconds +4: Time to load utils op: 0.3117496967315674 seconds +4: Time to load utils op: 0.3116300106048584 seconds +4: Time to load utils op: 0.31177306175231934 seconds +4: Time to load utils op: 0.3117692470550537 seconds +7: Time to load utils op: 0.31281256675720215 seconds +7: Time to load utils op: 0.31282830238342285 seconds +7: Time to load utils op: 0.31281232833862305 seconds +7: Time to load utils op: 0.31284451484680176 seconds +7: Time to load utils op: 0.3128678798675537 seconds +7: Time to load utils op: 0.3128504753112793 seconds +7: Time to load utils op: 0.3128652572631836 seconds +7: Time to load utils op: 0.31273961067199707 seconds +0: [2023-03-17 01:00:58,758] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 01:00:58,759] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 01:00:58,759] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.6 GB, percent = 6.3% +5: Time to load utils op: 0.0005125999450683594 seconds +5: Time to load utils op: 0.0003757476806640625 seconds +0: Time to load utils op: 0.00038504600524902344 seconds +5: Time to load 
utils op: 0.0004992485046386719 seconds +5: Time to load utils op: 0.00039649009704589844 seconds +6: Time to load utils op: 0.0007691383361816406 seconds +0: Time to load utils op: 0.0006043910980224609 seconds +5: Time to load utils op: 0.00041961669921875 secondsTime to load utils op: 0.0004036426544189453 seconds +5: +0: Time to load utils op: 0.0004405975341796875 seconds +0: Time to load utils op: 0.0004546642303466797 seconds +5: Time to load utils op: 0.00039887428283691406 seconds +0: Time to load utils op: 0.00045013427734375 seconds +5: Time to load utils op: 0.0003981590270996094 seconds +0: Time to load utils op: 0.00044345855712890625 seconds +0: Time to load utils op: 0.000446319580078125 seconds +6: Time to load utils op: 0.0011248588562011719 seconds +6: Time to load utils op: 0.0012285709381103516 secondsTime to load utils op: 0.0013637542724609375 seconds +6: +6: Time to load utils op: 0.0013163089752197266 seconds +6: Time to load utils op: 0.0013294219970703125 seconds +6: Time to load utils op: 0.001344442367553711 seconds +6: Time to load utils op: 0.0013766288757324219 seconds +2: Time to load utils op: 0.0008792877197265625 seconds +2: Time to load utils op: 0.000850677490234375 seconds +3: Time to load utils op: 0.00054168701171875 seconds +2: Time to load utils op: 0.0011930465698242188 seconds +2: Time to load utils op: 0.0009124279022216797 seconds +2: Time to load utils op: 0.0012662410736083984 seconds +2: Time to load utils op: 0.0011394023895263672 seconds +2: Time to load utils op: 0.00104522705078125 seconds +2: Time to load utils op: 0.0012137889862060547 seconds +3: Time to load utils op: 0.0005130767822265625 seconds +3: Time to load utils op: 0.00046372413635253906 secondsTime to load utils op: 0.00048470497131347656 seconds +3: +3: Time to load utils op: 0.0004899501800537109 seconds +3: Time to load utils op: 0.0004906654357910156 secondsTime to load utils op: 0.0004761219024658203 seconds +3: +1: Time to load utils op: 
0.0011909008026123047 seconds +3: Time to load utils op: 0.0005352497100830078 seconds +1: Time to load utils op: 0.0011646747589111328 seconds +1: Time to load utils op: 0.0014221668243408203 seconds +4: Time to load utils op: 0.0014064311981201172 seconds +1: Time to load utils op: 0.0014522075653076172 seconds +1: Time to load utils op: 0.0014700889587402344 seconds +1: Time to load utils op: 0.0014448165893554688 seconds +1: Time to load utils op: 0.0014805793762207031 seconds +1: Time to load utils op: 0.001474618911743164 seconds +4: Time to load utils op: 0.0014262199401855469 seconds +4: Time to load utils op: 0.0013422966003417969 secondsTime to load utils op: 0.001424551010131836 seconds +4: Time to load utils op: 0.0015180110931396484 seconds +4: +4: Time to load utils op: 0.001392364501953125 seconds +4: Time to load utils op: 0.0015425682067871094 seconds +4: Time to load utils op: 0.0015227794647216797 seconds +7: Time to load utils op: 0.0012054443359375 seconds +7: Time to load utils op: 0.0015821456909179688 seconds +7: Time to load utils op: 0.001512289047241211 seconds +7: Time to load utils op: 0.0015726089477539062 seconds +7: Time to load utils op: 0.0016160011291503906 seconds +7: Time to load utils op: 0.0016672611236572266 seconds +7: Time to load utils op: 0.0017731189727783203 seconds +7: Time to load utils op: 0.0016329288482666016 seconds +0: [2023-03-17 01:00:58,884] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 01:00:58,884] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 01:00:58,885] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:58,987] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 01:00:58,988] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 01:00:58,988] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,092] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 01:00:59,093] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,093] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,194] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 01:00:59,194] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,195] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,298] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 01:00:59,299] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,299] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,400] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 01:00:59,401] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,401] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,508] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 01:00:59,509] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,509] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,611] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 01:00:59,612] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 01:00:59,612] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.74 GB, percent = 6.3% +0: [2023-03-17 01:00:59,612] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 01:00:59,612] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 01:00:59,612] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 01:00:59,612] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 01:00:59,612] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 01:00:59,613] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 01:00:59,614] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 01:00:59,615] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 01:00:59,615] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 01:00:59,615] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00042128562927246094 seconds +0: [2023-03-17 01:00:59,615] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 01:00:59,676] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+6: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+2: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+6: [2023-03-17 01:00:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +1: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +5: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+1: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +5: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +3: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +2: [2023-03-17 01:00:59,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 01:00:59,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... 
+0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt... +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. 
+0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt. +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 01:00:59,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 01:00:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 01:00:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 01:00:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 01:00:59,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:00:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 01:00:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 01:00:59,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +2: [2023-03-17 01:00:59,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +1: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +4: [2023-03-17 01:00:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 01:00:59,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +5: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:00:59,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:00:59,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 01:00:59,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +6: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +3: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 01:00:59,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt... +7: [2023-03-17 01:00:59,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:00:59,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:00:59,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:00:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +5: [2023-03-17 01:00:59,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 01:00:59,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:00:59,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +3: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:00:59,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:00:59,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 01:00:59,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:00:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:00:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 01:00:59,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:00:59,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:00:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 01:00:59,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +7: [2023-03-17 01:00:59,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +1: [2023-03-17 01:00:59,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 01:00:59,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:00:59,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +6: [2023-03-17 01:00:59,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +4: [2023-03-17 01:00:59,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt. +2: [2023-03-17 01:00:59,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:00:59,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:00:59,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 01:00:59,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 01:00:59,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:00:59,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:00:59,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 01:00:59,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:00:59,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:00:59,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +1: [2023-03-17 01:01:00,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +7: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt... +5: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +2: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +5: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:01:00,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +6: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +4: [2023-03-17 01:01:00,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +7: [2023-03-17 01:01:00,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +3: [2023-03-17 01:01:00,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt. +1: [2023-03-17 01:01:00,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +4: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +6: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +1: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +7: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt... +5: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +1: [2023-03-17 01:01:00,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +5: [2023-03-17 01:01:00,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +6: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +3: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt. +7: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +7: [2023-03-17 01:01:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +6: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +1: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +5: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt... +2: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +5: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +1: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +6: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +2: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +3: [2023-03-17 01:01:00,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:01:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +4: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt. +7: [2023-03-17 01:01:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +4: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +3: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +2: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +5: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +1: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +7: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +7: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt... +6: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +1: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +5: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +3: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +6: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +2: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +4: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +1: [2023-03-17 01:01:00,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +7: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +2: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +6: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +5: [2023-03-17 01:01:00,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt... +3: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +2: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +1: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +4: [2023-03-17 01:01:00,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +6: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +7: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +5: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +3: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:01:00,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +2: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +2: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +3: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +5: [2023-03-17 01:01:00,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +7: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +6: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt... +1: [2023-03-17 01:01:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +5: [2023-03-17 01:01:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +1: [2023-03-17 01:01:00,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +4: [2023-03-17 01:01:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +6: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +3: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +7: [2023-03-17 01:01:00,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:01:00,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +5: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +2: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +6: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +1: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +7: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +5: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt... +3: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +1: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +7: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +2: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +6: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +3: [2023-03-17 01:01:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:01:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt. +4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +7: [2023-03-17 01:01:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +5: [2023-03-17 01:01:00,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +3: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +5: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +6: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +6: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +7: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +3: [2023-03-17 01:01:00,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +2: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +2: [2023-03-17 01:01:00,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +1: [2023-03-17 01:01:00,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:01:00,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt. +4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +6: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +7: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +5: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +1: [2023-03-17 01:01:00,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +2: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +3: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt... +3: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +5: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +6: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt. +4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +5: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +1: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +6: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +2: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +7: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:01:00,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +5: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +3: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +1: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +2: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +6: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +7: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +4: [2023-03-17 01:01:00,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:01:00,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +5: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +7: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +4: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +1: [2023-03-17 01:01:00,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +6: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +5: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +3: [2023-03-17 01:01:00,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt... +2: [2023-03-17 01:01:00,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +1: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 01:01:00,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +7: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +2: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +6: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:00,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:00,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +4: [2023-03-17 01:01:00,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt. +3: [2023-03-17 01:01:00,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:00,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:00,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 01:01:00,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:00,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +3: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +7: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +5: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +2: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +5: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +6: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +6: [2023-03-17 01:01:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +7: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt... +1: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +4: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +2: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +1: [2023-03-17 01:01:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +3: [2023-03-17 01:01:01,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:01:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +5: [2023-03-17 01:01:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +7: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +2: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +3: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +4: [2023-03-17 01:01:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt... +1: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 01:01:01,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +6: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +5: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +7: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +2: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +1: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +3: [2023-03-17 01:01:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt. +4: [2023-03-17 01:01:01,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +2: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +1: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +3: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +7: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +6: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +6: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt... +5: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +3: [2023-03-17 01:01:01,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +7: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +5: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +2: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +1: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:01:01,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt. +4: [2023-03-17 01:01:01,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +6: [2023-03-17 01:01:01,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +5: [2023-03-17 01:01:01,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +1: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +3: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt... +7: [2023-03-17 01:01:01,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 01:01:01,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +7: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +2: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +5: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +6: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +3: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +1: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:01:01,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +3: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +3: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +4: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +6: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +7: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +2: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +5: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +1: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:01:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:01:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:01:01,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt... +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-17 01:01:01,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:01:01,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:01:01,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,442] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-17 01:01:01,444] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +2: [2023-03-17 01:01:01,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +6: [2023-03-17 01:01:01,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +2: [2023-03-17 01:01:01,446] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +6: [2023-03-17 01:01:01,446] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +0: [2023-03-17 01:01:01,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,447] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-03-17 01:01:01,449] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +7: [2023-03-17 01:01:01,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,452] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-17 01:01:01,454] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +6: [2023-03-17 01:01:01,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,454] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +3: [2023-03-17 01:01:01,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,456] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +1: [2023-03-17 01:01:01,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:01:01,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +1: [2023-03-17 01:01:01,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 01:01:01,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-17 01:01:01,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +3: [2023-03-17 01:01:01,458] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +1: [2023-03-17 01:01:01,459] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +1: [2023-03-17 01:01:01,459] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +7: [2023-03-17 01:01:01,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,463] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +5: [2023-03-17 01:01:01,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,463] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +7: [2023-03-17 01:01:01,464] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +5: [2023-03-17 01:01:01,464] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +5: [2023-03-17 01:01:01,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 01:01:01,467] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +3: [2023-03-17 01:01:01,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,468] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +5: [2023-03-17 01:01:01,468] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +3: [2023-03-17 01:01:01,466] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-17 01:01:01,468] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +4: [2023-03-17 01:01:01,470] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +4: [2023-03-17 01:01:01,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,472] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-17 01:01:01,473] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +0: [2023-03-17 01:01:01,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,481] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-17 01:01:01,483] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +6: [2023-03-17 01:01:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,486] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +2: [2023-03-17 01:01:01,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,487] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +6: [2023-03-17 01:01:01,488] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +2: [2023-03-17 01:01:01,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,488] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-17 01:01:01,489] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +0: [2023-03-17 01:01:01,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,490] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +2: [2023-03-17 01:01:01,490] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +6: [2023-03-17 01:01:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,491] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +0: [2023-03-17 01:01:01,492] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +7: [2023-03-17 01:01:01,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,492] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +6: [2023-03-17 01:01:01,493] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +7: [2023-03-17 01:01:01,494] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +1: [2023-03-17 01:01:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 01:01:01,494] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +6: [2023-03-17 01:01:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,494] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +6: [2023-03-17 01:01:01,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +5: [2023-03-17 01:01:01,496] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +7: [2023-03-17 01:01:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,496] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +6: [2023-03-17 01:01:01,496] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +1: [2023-03-17 01:01:01,496] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +6: [2023-03-17 01:01:01,496] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +3: [2023-03-17 01:01:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 01:01:01,497] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +6: [2023-03-17 01:01:01,497] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +7: [2023-03-17 01:01:01,497] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +2: [2023-03-17 01:01:01,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:01:01,499] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +2: [2023-03-17 01:01:01,498] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +1: [2023-03-17 01:01:01,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:01:01,498] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +3: [2023-03-17 01:01:01,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,500] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +1: [2023-03-17 01:01:01,500] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +7: [2023-03-17 01:01:01,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 01:01:01,499] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +5: [2023-03-17 01:01:01,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,503] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +7: [2023-03-17 01:01:01,501] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +3: [2023-03-17 01:01:01,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,502] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +3: [2023-03-17 01:01:01,500] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-17 01:01:01,501] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +3: [2023-03-17 01:01:01,502] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +5: [2023-03-17 01:01:01,505] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-17 01:01:01,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 01:01:01,506] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +1: [2023-03-17 01:01:01,508] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +3: [2023-03-17 01:01:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:01:01,511] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +6: [2023-03-17 01:01:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,512] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +0: [2023-03-17 01:01:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:01:01,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +5: [2023-03-17 01:01:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 01:01:01,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +3: [2023-03-17 01:01:01,513] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +6: [2023-03-17 01:01:01,513] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +4: [2023-03-17 01:01:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,514] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +0: [2023-03-17 01:01:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:01:01,514] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +5: [2023-03-17 01:01:01,515] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +4: [2023-03-17 01:01:01,516] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +0: [2023-03-17 01:01:01,516] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +4: [2023-03-17 01:01:01,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,517] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +4: [2023-03-17 01:01:01,517] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +4: [2023-03-17 01:01:01,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,518] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +4: [2023-03-17 01:01:01,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,518] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +0: [2023-03-17 01:01:01,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,519] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +0: [2023-03-17 01:01:01,519] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +4: [2023-03-17 01:01:01,520] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +4: [2023-03-17 01:01:01,520] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +1: [2023-03-17 01:01:01,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,520] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +1: [2023-03-17 01:01:01,520] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +5: [2023-03-17 01:01:01,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,522] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +5: [2023-03-17 01:01:01,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,522] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +1: [2023-03-17 01:01:01,522] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +5: [2023-03-17 01:01:01,523] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +5: [2023-03-17 01:01:01,524] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +2: [2023-03-17 01:01:01,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,524] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-17 01:01:01,525] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +7: [2023-03-17 01:01:01,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 01:01:01,529] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +1: [2023-03-17 01:01:01,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:01:01,530] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +1: [2023-03-17 01:01:01,530] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +7: [2023-03-17 01:01:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,531] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +7: [2023-03-17 01:01:01,531] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +6: [2023-03-17 01:01:01,531] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +1: [2023-03-17 01:01:01,531] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +3: [2023-03-17 01:01:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 01:01:01,532] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +5: [2023-03-17 01:01:01,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:01:01,532] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +7: [2023-03-17 01:01:01,532] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +3: [2023-03-17 01:01:01,533] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +5: [2023-03-17 01:01:01,534] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +2: [2023-03-17 01:01:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,535] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-03-17 01:01:01,537] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +3: [2023-03-17 01:01:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:01:01,538] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +0: [2023-03-17 01:01:01,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:01:01,539] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +3: [2023-03-17 01:01:01,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +1: [2023-03-17 01:01:01,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:01:01,540] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +0: [2023-03-17 01:01:01,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +1: [2023-03-17 01:01:01,542] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +4: [2023-03-17 01:01:01,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:01:01,550] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-17 01:01:01,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:01:01,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 01:01:01,551] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +4: [2023-03-17 01:01:01,551] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +7: [2023-03-17 01:01:01,551] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +4: [2023-03-17 01:01:01,553] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +7: [2023-03-17 01:01:01,553] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +2: [2023-03-17 01:01:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:01:01,589] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +2: [2023-03-17 01:01:01,591] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +0: [2023-03-17 01:01:01,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:01:01,604] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-17 01:01:01,605] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +0: successfully loaded checkpoint from checkpoints_146m2b7100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 1935.93 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 01:01:02 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.036777 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.090 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.035074 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.079 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 01:01:13 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 19800.04 | train/valid/test-data-iterators-setup: 10473.50 +0: [after training is done] datetime: 2023-03-17 01:01:13 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.949182E+00 | lm loss PPL: 5.189290E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3327412: Fri 17 Mar 2023 01:01:37 AM EET diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c31a4a1ae14286929ba4fb7f78ee26a257c06429 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:f3b9a32360828074d6fd770d8efbeae762c00213fbb4c7069a9f02cd6a034f3d +size 27478295 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c0ab7eeae4ada8bb79dc6f69a52c2ff664ed212d --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4ec65f3f81f67d2518bec69edd93020571d92b5bcdd286d13590c25af94cea9 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ae465692bb98b1721f7a7c90b957bc36efea174 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6987452803b3f1a2021579e79d6e1fca1495c9f03ef5dd637a22acb4c7ac6dea +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..975a278fd98639f0110c40e2a15c3976fa9959c1 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24cfc04927fdb2b54ea51d78585f9a81d2b251d389a10bbe4cf9766e222cc579 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb149edaee4dca6e20be115d7c27b64222d91518 --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11bfbc8ee4f800a8f12ac3d1053cec545f28229e630956cd2985ea81fa574cf6 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30dde194b729e8ee719cd4154c6c10f0dee87e0f --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49923ad718440d15bba58cf91dc19c48ec3026a65560d4dae0c4ad416f5226fc +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce5080c51043763037e1f799ace3242d3a6ae823 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e03ebcef00e9eb82ef661f73d38367b1a521dce8eac84207bd72d542bd29fccb +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c44db6fb15c137c468d01fb86c0f827564d1ae1 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3fd3ba1ee83efbe20d17b700431c44d3d8fe1213d88199cd6b76902914c17dd +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f01003640ce9c52efa4724c70595a9ad91f5c500 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bea5bbeefb9cd5a773209725312f05bfdbc93b0b0cb040817c66c2b802038318 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7a1d4829538e01492e2b13c58dc14e3048e1346 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79b79f5221ff9106437d556f2aeb803916c72674c87904eb46d22352986b0d70 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48f446a5b87d23ec6cdf4c86c86b5d730e1b775c --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee69232211c75e389cff4118e8dc5e01c443694d0b5e1f1bca7a1d2c581a10a8 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbdd95d94a2c029be157baf1a94fc3ea571299c3 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:4d02430ba2d527f5830e146007a64737fa072d41e93d95ff86423d6ff951cbc9 +size 27478231 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f75e067555d94cb2ae257964d29047136a14b28a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:782b1bf87e6ddbb6c2d6164460ad7e4ef161e4de601663a919a274deb2353bd9 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..596edf703f7901406fe3d1545fb685bc47518ff8 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8dc71cea562a21fee3e1e027f35c16597c35f6c69baefba1dfe840f869981137 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bf376c5208ce7d022c17ad1ad2a7d2eb498cc48 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:107261a999a32a922dfb8693cf90dca15515aaf1cbc68b2166f98d89e5dd1273 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4721d9f2b3e8439c4c7d9107586c813668fa534e --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d15b4e000989ea5fe03977e23b8f40909619910fa97446ca904f11185666e4ab +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dfeb070470ce31ed947d5ccca1497963053b789a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72ac726c86b4ac1506f3db6f8967c398f11d63d57ed451abbf7da4caea4103f9 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c79ca09db660b83c31416ccf0e06661b70613b11 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c06fd8844bf8ffae45d0eb5aadccd69ec4fdd19563f515ac9832b2e39eb6bacd +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d37e293deea53bc07874e5dadd11f64e0a98d1a5 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7a54322bc91ba7739c8feba164facf29e405437a8f36ca276a50521ab5240f6 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe6cb57df3a37dc2d7df92b1adf9f6258f277aad --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:873d2b1dcbc2078f11f1c9f646b6502a0f578d6330f7fde0edbebf6e3a95c604 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5dcd421c745eef2d6153d843d08d9068eddd163 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4632946ed2a2b4ec3daeff629956f8d24eab9f4f39ea83f9a7bfd513691a05d7 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da611bbcaeb7df64e81c2c9196bc2ebef8d45544 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59ee73ad42ab5dfe443daa4c8b07c73bc7d8558187c819133ffe8e6e9792b509 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a3734994cdd3bc1dc52c5b8d297eab456516d2a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:05b468340e51ff5e538d12f964b8e8a21106d42d1848a3cf5353fc607c523fd5 +size 27478231 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb5c0c6bbb95cb0d59a98b86c2b28b530557ceca --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:448cd6f73fafc89a7725ebe90bf6f7623a0a1e624c2bf65027a2b600b01a1f31 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..59a51839cad07d5ef0b9f8c8f156ccfffd24dea7 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf8d1a66cd3fb09d21575af7428faca0566b609b850e1563df22b3b85341560b +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68d4644c994e7461a39ff78064e3ad80599cc811 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61fb64398c0588291bcebf108680a55bc854a7ac155f89c9bdb6202b97851095 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..abea5b4e48847a56b6b77b7b06401bb18073f3dc --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15190b6c9e1aed121afd08cd6983cb09fa6bc13cc747d2eac193b8007491dc86 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f2b14f4bcf6c2b0784e1ab12029895375336ea9 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3463087bd83082c3432f8a5366c5aa60e9efb2e9771d237fa06dce2630e3f00 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..44fd03e3fe974af0ce84a2dc6da9e826155002c1 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8395275c76b604358b66a618d1935a66776fba9b79deef99c539bbca2178e7b6 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6113bc5d95ae583d5f66c46c9a2ab460a6da041a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19aeaf23288bed886cacae0ea2ee2b61e5ee6fc9f5f95146054c415f6b5d53be +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..26bbfc6316cd604aa1c4ec26641da003f37fbf19 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d420e9e82d29a3cf5e635ab744005110a60dc26fcc51c7a94f01588f7a4794f5 +size 27478114 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1656d3fe6c3ebbabd459833e3b09abca0e136ec --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5337e2dc3e6521336fc639434d005fd006e9377a7ca2783570bd2ddb6d76c8c +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03fb0bed4823680f160ff1d573c3f8fa4813674b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5949a24fe694dacd90dbcc008366a0529ee775d81caa6bc3c157e7b77e939986 +size 27478434 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3bc0eaf744818eb67b21db15fba6e6be5589bb2a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:401014b0e1d32fee61d49f544cc8f970f9a2b11a5aad5e5c9bfb29c79c5b256c +size 27478167 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9fe20587b2c45a0a20fc93836f3febe397411de7 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:64154d6dc2f1fa08664d71f0616d2cedfd9b12d28750f24644c84c29af38c110 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..55fb035185358fd82d3a95d5b5683d21f4eb7112 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:852b651da72fa4271d7aa91057ebd765d6793131749e06bfc1162565660d354c +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d6661a595ca242437f4ea18675446b76dd060e1 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b37b97be6a134da383fa95313d06aa540d766e1831aab6be0990a25905a283df +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fcdfeded193d5ef22483561efe51c2a1e39cf0da --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc091912884792da4f8864edd30d61334795b24e551f29ced47fc3377df8663a +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ec1f756a1a8d9ab54cbee0e997969867faff713e --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfbdfbaaebf4e3cf2cdb277f79fe7ff950e426a05b475e5e3e6b4df98323bfa3 +size 27478434 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9dd676945783d8cc44cc47db9027c2ca904bb3af --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:458b1607d1afd388766edff47695c78d2b63e1aeb7bda4fc6b01ace9910a0b77 +size 27478050 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7986e48e2999b96ea9ecc71cbde15ad5d01a79de --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee6eef29de9d646b03f575757ea1d7607fb951bb48c77fa52dae6d5540e4e6ed +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..305efe831c9b1ca8f8ae3f80587ce899383cf925 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ae1c36db903ceb38d98e1b267257522c13d53a1d5b84af0fbdf1792a6ec88ec +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..69d415047174b1a46aaa41572dd09cff15df9ca8 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19929ae7633c5dbc084861d39d5775fdd11c00126c35e0ffd8105f39b0e5d657 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a2eb01d05e46fcbd8b65e89c4113d9625f0d65e --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09cc4f9926748e02567a811e6a1174929c8fd372a20ff06294c9093d941d7f20 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c869e8d84a27f15c8a17e98572cd6a3427ebcc2b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:be10cff28d7fb7880681080a34ac9aa74e455a0eb795e43a90f8d415d0ae559f +size 27478231 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f85ea2ee099a7cd3168853437785ce7ffdcb5258 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66d7cbb755d6086f822f64df671df3da86d316382d29bdee49038840fc15f429 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c03b0404a9f1637db6f69b9aad2a1462e999fae4 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c677f17448c914224f03917ea8d0f67c1566364198e4f14666f97dd7bf0e567b +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f966b43aabc68fbfee32f11a5f1ef3c7d92d97c --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69aa8840e462873b69c56a434d7e6d3b275edb1f10df0149213c2ac0c2d217b4 +size 27478434 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e203d7ae1ee5ab8c58269d60dfc3283afbeaf50d --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21278da85efead9c8b05ec6e01992a3cb0224b65ecc0d640d3d1f02c79cccc1a +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..953c3c233599cf247a3dc43fd584a470cb836c9b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fbe4b4761c498c268ed19db65bbc087b66031974ed9e90466b450ff31f1f7265 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f62ef92a4bfba89d1874e3c98964350ce86884c --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e1edcd9ded50a119270c514238a24fa0e62f5f3f38ecccb88b2e63a282f2e52 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e66f36e2722c72399871205749d0ddf1e7f67592 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a98681964f1c029efb3a246423b816b87577e2ba136745f891ae56e66b45ac2 +size 27478306 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..023e802a5ab0778ac09ce1c69695101aa4008953 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f428950682637434072dcda5339fd2a2fe8b0bd8d8a31e9fc6e7d6c8824fa8d +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..894de4456144044fe3efb04408710792e373f827 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8923d7849fed6f52592febbe06412b7c49d4bcdc32b182d18b4f1692f17e79a +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3107866193d946e47517a1195973fbda8a7e8c3 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:379d19ced68c28fff5e2b6544b1cbe8a3d83071e50c0c7b93f2205d34f34e395 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f234fa11306b1b74808267b18b77039c6e2beeb3 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:07d440a4ebbc2a4a1fd73f9c3e4ab41e2871d4be5bdea476e2cd8fd857f5a23c +size 27478167 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1e9a764a7b4e34f48e0ffa372266d79bbf45920 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5267aa1e09efc5ab1fe548ba320d4fb2748a963e9ea2e8d473b3ae4533bf2870 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6aeb3f7d71eae2a7127dad32156c1a6d144d1f3a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f91f458b3e98b5319951852f3c400b9ba25579438a57d54a98407927dc566fa6 +size 27478370 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f96e79ac13e103dc813e40a365c2dac4eacc828 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3910c7f584cf8cb7b5930c31cf5a39d3e5fd529f0839b52fd5d4c1868aa3ad9 +size 27478178 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..933a5c0527695135b9a554d7adcfaf31bf17c777 --- 
/dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:634c5c52477663eb2cf69e483a9784fdcb06d288cd8305998ab80d27c58488d2 +size 27478242 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..47882ce1c6712c83ed6ac8a467201fdfedf4e4e2 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:740915ab0135721d4177881a214c6d200f44609fb3f640c97dcbb676a811ea92 +size 27478359 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b21543893159541cc97c53015032796c00bd0b2b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e5d568abf495676fe99fcb7781e6d140312ae3c25a074073ef185a0985da667 +size 27478103 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..56f1cf45dbce563a7ec9ecc85dba19b1cbeb5373 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d0722059652e4831fdb9e2b09906591f51da0010d799835c823545f8477ee88 +size 27478359 diff --git a/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 
b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2f7bb6bef9b9fc40c7d6edd6a511470c080c402b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d485165e120fda6eee74ce52e331715774b16e8aa3f355eddc99ce311913acc1 +size 27478167 diff --git a/146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eca5c6a9c8bb18699b5409167b4dd9cead1a053e --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48498445e232c15d772911b17ee168728b13570ad309823069b6109b1ee35ace +size 80413955 diff --git a/146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2077972676cee5f20bd583473d8a025c855d2641 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:927c42ecdedd47331fbdc9f9c10ff515f3ba5d24a4a2ca53b9a12543f37b4945 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6b75840a7fe786ce8d79272ac6725dab59db6a3 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee639ddd9f2398ecb57a44621762fe8a5105728fcbbc13dd0bd28d433fc7f750 +size 14180099 diff --git 
a/146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e36ee8f0be93dd042eb6d865a68bd276f27390e8 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb6554f20f3010c3f794d6c1d46545306504b692fb8ac0b4d67fb7d47b441af2 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9fdf07fb121fe27431a55256114c2009ea4df5a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b09569d77f551c96bea6f0791dfcaf1a10b1a0e68d6efdbc821c5750901c45a +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe1b5f6c30605f4a52fed42d3ac68da523c98c6b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a6a084594a42a28bfe09dabc2de31192bdad98b2f08aa7f5f5b29e2ca44bc3d +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c973e4d84ffcb0efb05de96ef0b3994caf585cbf --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7bab53c7f55f779acc0276f9ea2c1ff373248483df0167b2911c5ea9ef38e746 +size 14180099 diff --git 
a/146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..271667d347a78a3f3598cd7cfb00b3a1a979d234 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cdc9272e4e71a551121a1ad805f1307a609e284a16276360385732379a69eac4 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c071a400a2ea0a607c3fd88f6c3ac8f7499109ea --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f27d3471faa6d84ded0dca35a83b118994f4f31d8c37d2ad3e12cdc1ebbbfc3 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d132a6a473d2d99044c5ba777bfc002ddf2bb62 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5403f48cf60364511f06c499d2f790bd4dcad6c63202a5aabc1c05ed047b892 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b324337c30fc1b260f81340582a2e76c570b26c --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97239fa949c59dd63bc81fb54922c3f4cdb769848e2ead1099f88f7e58731448 +size 14180099 diff --git 
a/146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b93f2be4fe52cfa3813a5c0c741cd1712e99abf2 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff3cf4d0034fc4bf7bf65732d9ef164e93f140612f30c4094286dde3a8375fc1 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73ce0b5804fad3dbcb4a5a6dc6697dae72adda9b --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3d593c80c198c49123a11155da331d1f410417278d973dce68939170d08d3cd +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..25d697e2ef9bf64199402ad1303eba206a4a6d78 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60f5b53ac7f17458cfbf2c41c0f4fa1d9e8657347c5b3163413c3537692047b7 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0472f0f12b3ce08dea288772870ad6a4230d3ef0 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58d32e4b0b210b3a99c4df6e093080b658ce497e8c6e2c80c232c9d25300c23c +size 14180099 diff --git 
a/146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b346d87b2a3c1f613befa7e860655181ad3a69a --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a12114bc8726bbfc01392a20661d47c0594c323ec99d0da614cc6e5ed5b8305 +size 14180099 diff --git a/146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt b/146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1270009dd3bc97f19a32ef69ee43249b20931c66 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db93d2a1b4cb59cfe2697f7d1c7dbb71af5acd7632f985bd28fae1142b61a0af +size 4291 diff --git a/146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt b/146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20dd02acc40add4badaa343defe4b78f653e65e4 --- /dev/null +++ b/146m2b7100mdedup/global_step5111/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05b9b31f96050ad3bfc6cb0e1930d30d7c3402a0147ae746ab08396beb17d601 +size 35443 diff --git a/146m2b7100mdedup/sbatch_146m2b7100mdedup.sh b/146m2b7100mdedup/sbatch_146m2b7100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..2b241d7a937ef24718edd772a4258989d89e0caf --- /dev/null +++ b/146m2b7100mdedup/sbatch_146m2b7100mdedup.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread 
+#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m2b7100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 2680000000 +# -> Samples: 1_308_594 +TRAIN_SAMPLES=1_308_594 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 13_086 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + 
--kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m2b7100mdedup/sbatch_146m2b7100mdedupval.sh b/146m2b7100mdedup/sbatch_146m2b7100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..2a51ffd24d76c8999f69b58acf2815f43cc41e20 --- /dev/null +++
b/146m2b7100mdedup/sbatch_146m2b7100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m2b7100mdedupval +VARIANT_CKPT=146m2b7100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ 
+ --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " +
+echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m2b7100mdedup/tensorboard_146m2b7100mdedup/events.out.tfevents.1679003674.nid006716.102687.0 b/146m2b7100mdedup/tensorboard_146m2b7100mdedup/events.out.tfevents.1679003674.nid006716.102687.0 new file mode 100644 index 0000000000000000000000000000000000000000..de79cf5cd0254769e439c2dd539c72c535a5a032 --- /dev/null +++ b/146m2b7100mdedup/tensorboard_146m2b7100mdedup/events.out.tfevents.1679003674.nid006716.102687.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ed53e2bf5df65b9abadd3d92ed0c51e0f460d3da5089e15d414b1bab37017af +size 9099287 diff --git a/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007059.nid006876.125338.0 b/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007059.nid006876.125338.0 new file mode 100644 index 0000000000000000000000000000000000000000..1acffa3224b2f755dfd84b66bbaa08d08121fcec --- /dev/null +++ b/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007059.nid006876.125338.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23f81470d382d241672be6688c4c9677187ccc35794c3b39107e69523ebfe415 +size 40 diff --git a/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007612.nid006529.88010.0 b/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007612.nid006529.88010.0 new file mode 100644 index 0000000000000000000000000000000000000000..28126c4ec63803eca39d884920ad10a9fd84ad73 --- /dev/null +++ b/146m2b7100mdedup/tensorboard_146m2b7100mdedupval/events.out.tfevents.1679007612.nid006529.88010.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:580ba813833e75ab40759c3d6b3fef3bd2a118921c99937ded07980f099ae9be +size 980 diff --git a/146m32b100m/3324348.err b/146m32b100m/3324348.err new file mode 100644 index 
0000000000000000000000000000000000000000..d1954273061b44c1da26ae85caf8a2567ea3121a --- /dev/null +++ b/146m32b100m/3324348.err @@ -0,0 +1,1121 @@ +2: 2023-03-16 18:52:29.222421: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222424: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222416: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222425: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222429: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 18:52:29.222473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222468: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222478: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: 2023-03-16 18:52:29.222423: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222431: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 18:52:29.222440: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 18:52:29.222486: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222491: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 18:52:29.222489: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 18:52:29.254505: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254508: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254516: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254515: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254516: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 18:52:29.254513: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 18:52:29.254518: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323949: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323956: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 18:52:29.323946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323952: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323947: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323963: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:52:29.323944: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 18:52:29.323979: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324505: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324516: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324517: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324518: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 18:52:29.324509: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324516: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324527: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 18:52:29.324526: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324737: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:52:29.324745: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324744: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324738: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324732: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324738: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:52:29.324738: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:52:29.324744: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403247: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403258: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403255: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 18:52:29.403250: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403254: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403254: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403271: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 18:52:29.403267: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 18:52:29.529920: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529928: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529924: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 18:52:29.529921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529938: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 18:52:29.529934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:52:31.622195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622209: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622233: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622224: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:31.622879: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622881: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622884: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622888: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622891: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 18:52:31.622892: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622918: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:52:31.622919: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630399: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:31.630809: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630812: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630814: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630819: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630821: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 18:52:31.630826: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630824: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:31.630845: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.630938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630955: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.630976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:31.631378: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631387: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631387: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631390: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631394: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 18:52:31.631385: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631399: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 18:52:31.631399: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680215: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:31.680598: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680602: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680608: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680613: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680613: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 18:52:31.680618: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680637: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 18:52:31.680642: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 18:52:31.681140: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:31.681080: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681148: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:31.681087: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:31.681088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681090: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681160: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681091: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 18:52:31.681399: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681114: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 18:52:31.681402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: 
cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681116: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 18:52:31.681412: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681336: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681498: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 18:52:31.681502: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 18:52:31.681506: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681416: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681539: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 18:52:31.681541: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:31.681507: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 18:52:31.681510: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 18:52:31.681511: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:31.681547: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 18:52:31.681551: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 18:52:31.681553: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 18:52:31.681339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 18:52:31.681517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 18:52:31.681535: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 18:52:31.681409: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 18:52:31.681554: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 18:52:31.681556: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:31.681573: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 18:52:31.681437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681518: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681519: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681521: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681526: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681529: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681532: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 18:52:31.681535: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 18:52:31.681356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:31.681553: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 18:52:31.681399: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:31.681816: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681820: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681823: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681825: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 18:52:31.681830: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681832: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681833: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 18:52:31.681836: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:52:36.681671: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681682: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 18:52:36.681759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681683: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 18:52:36.681771: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.681773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 18:52:36.681839: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681685: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 18:52:36.681768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.681846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.681770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 18:52:36.681848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.681895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.681770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 18:52:36.681851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.681855: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.681691: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.681775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 18:52:36.681846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.681896: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.681777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 18:52:36.681851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 18:52:36.681906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.681854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 18:52:36.681903: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.681856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 18:52:36.681905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 18:52:36.681865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.681857: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 18:52:36.681902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.681909: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.681912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682302: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682307: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.682314: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.682542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683678: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 18:52:36.683648: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683692: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 18:52:36.683647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 18:52:36.683687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683648: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 18:52:36.683688: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 18:52:36.683689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683655: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 18:52:36.683663: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683664: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 18:52:36.683666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 18:52:36.683664: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:52:36.683694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 18:52:36.683838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 18:52:36.683666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683708: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:52:36.683709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 18:52:36.683710: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:52:36.683712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:52:36.683714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 18:52:36.683681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.683842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 18:52:36.683693: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 18:52:36.683694: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 18:52:36.683716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:52:36.683732: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:52:36.683735: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683851: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.683961: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 18:52:36.683846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.683847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.683856: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.683962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 18:52:36.683851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.683964: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 18:52:36.683851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.683867: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683866: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-16 18:52:36.683966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.683967: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.684167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 18:52:36.683969: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.683976: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683871: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.683976: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.683979: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.683981: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 18:52:36.683871: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684200: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 18:52:36.683982: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.683984: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 18:52:36.683870: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 18:52:36.683987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 18:52:36.683888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684204: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 18:52:36.683991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684215: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.684000: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 18:52:36.684005: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684205: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684208: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 18:52:36.684220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684175: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.684180: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 18:52:36.684222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 18:52:36.684184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 18:52:36.684185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.684189: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 18:52:36.684190: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 18:52:36.684235: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684235: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684238: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-16 18:52:36.684215: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 18:52:36.684367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 18:52:36.684238: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684239: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 18:52:36.684241: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 18:52:36.684220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 18:52:36.684378: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684372: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 18:52:36.684227: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 18:52:36.684233: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684381: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684375: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684376: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 18:52:36.684391: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684396: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684397: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684397: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 18:52:36.684399: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.713680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713701: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713705: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713710: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713712: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.713721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715880: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715901: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715902: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715906: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715968: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:52:36.715983: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:52:36.715985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. 
+7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +7: Loading extension module utils... 
+3: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... 
+2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +0: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... 
+6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +5: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils...Loading extension module utils... +0: +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m32b100m/3324348.out b/146m32b100m/3324348.out new file mode 100644 index 0000000000000000000000000000000000000000..65449c63659f6de513a69c8cb0348b4e8e5681dd --- /dev/null +++ b/146m32b100m/3324348.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m32b100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m32b100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m32b100m --load checkpoints_146m32b100m 
--train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3324348.json --zero-stage 0 +START 3324348: Thu 16 Mar 2023 06:52:05 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 37.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= 
Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 37.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 46.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 
560.0W 0% 0% +2: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 36.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 41.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +3: +3: +3: 
======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +4: Launching on nid005140 (4/8), master nid005136 port 9999, GPUs 8, CUDA: True +5: Launching on nid005141 (5/8), master nid005136 port 9999, GPUs 8, CUDA: True +6: Launching on nid005142 (6/8), master nid005136 port 9999, GPUs 8, CUDA: True +2: Launching on nid005138 (2/8), master nid005136 port 9999, GPUs 8, CUDA: True +3: Launching on nid005139 (3/8), master nid005136 port 9999, GPUs 8, CUDA: True +1: Launching on nid005137 (1/8), master nid005136 port 9999, GPUs 8, CUDA: True +7: Launching on nid005143 (7/8), master nid005136 port 9999, GPUs 8, CUDA: True +0: Launching on nid005136 (0/8), master nid005136 port 9999, GPUs 8, CUDA: True +7: > setting tensorboard ... +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 
0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ 
local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3324348.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 
224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m32b100mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m32b100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. 
None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 
16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m32b100m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m32b100mval +0: tensorboard_log_interval ........................ 
1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 
0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 18:52:54,657] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... 
+0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.098 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, 
already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ 
scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 26.915 seconds +0: time to initialize megatron (seconds): 4.361 +0: [after megatron is initialized] datetime: 2023-03-16 18:53:24 +0: building GPT model ... 
+0: [2023-03-16 18:53:24,498] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 18:53:24,499] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 18:53:24,499] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.05 GB, percent = 6.8% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, 
model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 18:53:26,470] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 
20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 18:53:26,669] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 18:53:26,670] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-16 18:53:26,670] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.07 GB, percent = 6.8% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-16 18:53:26,672] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 18:53:39,778] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 18:53:39,779] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 18:53:39,779] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 18:53:39,790] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 18:53:39,790] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 18:53:39,908] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 18:53:39,909] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 18:53:39,909] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.76 GB, percent = 6.9% +0: ninja: no work to do. 
+0: Time to load utils op: 0.20732498168945312 seconds +0: Time to load utils op: 0.24021267890930176 seconds +2: Time to load utils op: 0.2116868495941162 seconds +2: Time to load utils op: 0.21171069145202637 seconds +2: Time to load utils op: 0.21174120903015137 seconds +2: Time to load utils op: 0.21175432205200195 seconds +2: Time to load utils op: 0.2117605209350586 seconds +2: Time to load utils op: 0.21177911758422852 seconds +2: Time to load utils op: 0.2117762565612793 seconds +2: Time to load utils op: 0.21177172660827637 seconds +1: Time to load utils op: 0.21283602714538574 seconds +1: Time to load utils op: 0.2128596305847168 seconds +1: Time to load utils op: 0.21286749839782715 seconds +1: Time to load utils op: 0.21289610862731934 seconds +1: Time to load utils op: 0.21291518211364746 secondsTime to load utils op: 0.21291303634643555 secondsTime to load utils op: 0.21291589736938477 seconds +1: +1: +1: Time to load utils op: 0.21291685104370117 seconds +6: Time to load utils op: 0.21203088760375977 seconds +6: Time to load utils op: 0.21209073066711426 seconds +3: Time to load utils op: 0.21165060997009277 secondsTime to load utils op: 0.21165060997009277 seconds +3: +6: Time to load utils op: 0.21216893196105957 seconds +6: Time to load utils op: 0.21230220794677734 secondsTime to load utils op: 0.21217799186706543 seconds +6: +6: Time to load utils op: 0.21215367317199707 seconds +6: Time to load utils op: 0.21220827102661133 secondsTime to load utils op: 0.21240568161010742 seconds +3: Time to load utils op: 0.21166157722473145 seconds +3: Time to load utils op: 0.2116851806640625 seconds +3: Time to load utils op: 0.21164417266845703 secondsTime to load utils op: 0.21168851852416992 secondsTime to load utils op: 0.21169161796569824 seconds +3: +3: Time to load utils op: 0.21169209480285645 seconds +3: +6: +4: Time to load utils op: 0.2103748321533203 seconds +7: Time to load utils op: 0.21054482460021973 secondsTime to load utils op: 
0.21190619468688965 seconds +7: +4: Time to load utils op: 0.2103865146636963 secondsTime to load utils op: 0.21038413047790527 seconds +4: +4: Time to load utils op: 0.2103896141052246 seconds +4: Time to load utils op: 0.21039533615112305 secondsTime to load utils op: 0.21039843559265137 secondsTime to load utils op: 0.21039676666259766 seconds +4: +4: +4: Time to load utils op: 0.21041512489318848 seconds +7: Time to load utils op: 0.21054434776306152 seconds +7: Time to load utils op: 0.2104332447052002 seconds +7: Time to load utils op: 0.2103712558746338 secondsTime to load utils op: 0.2105579376220703 seconds +7: +7: Time to load utils op: 0.2119154930114746 seconds +7: Time to load utils op: 0.21149921417236328 seconds +3: Time to load utils op: 0.0007908344268798828 seconds +3: Time to load utils op: 0.0008590221405029297 seconds +3: Time to load utils op: 0.0010223388671875 seconds +3: Time to load utils op: 0.001020193099975586 seconds +3: Time to load utils op: 0.0010020732879638672 secondsTime to load utils op: 0.0010530948638916016 secondsTime to load utils op: 0.0009565353393554688 seconds +3: +3: +3: Time to load utils op: 0.0010960102081298828 seconds +2: Time to load utils op: 0.0009121894836425781 seconds +2: Time to load utils op: 0.0010995864868164062 seconds +2: Time to load utils op: 0.0013523101806640625 seconds +2: Time to load utils op: 0.0013623237609863281 seconds +2: Time to load utils op: 0.0013692378997802734 seconds +2: Time to load utils op: 0.0013821125030517578 seconds +2: Time to load utils op: 0.0013527870178222656 seconds +2: Time to load utils op: 0.0013582706451416016 seconds +0: Time to load utils op: 0.20281124114990234 seconds +6: Time to load utils op: 0.0009720325469970703 seconds +7: Time to load utils op: 0.0007522106170654297 seconds +6: Time to load utils op: 0.0008471012115478516 seconds +6: Time to load utils op: 0.0009946823120117188 secondsTime to load utils op: 0.0009684562683105469 seconds +6: Time to load 
utils op: 0.0011856555938720703 seconds +6: +6: Time to load utils op: 0.0011489391326904297 seconds +6: Time to load utils op: 0.0010628700256347656 secondsTime to load utils op: 0.0008516311645507812 seconds +6: +7: Time to load utils op: 0.0006413459777832031 secondsTime to load utils op: 0.0009644031524658203 seconds +7: +7: Time to load utils op: 0.0008611679077148438 seconds +7: Time to load utils op: 0.0009446144104003906 seconds +7: Time to load utils op: 0.0010101795196533203 seconds +7: Time to load utils op: 0.0006809234619140625 seconds +7: Time to load utils op: 0.001081228256225586 seconds +0: Time to load utils op: 0.3033483028411865 seconds +0: Time to load utils op: 0.3031425476074219 seconds +0: Time to load utils op: 0.30246543884277344 seconds +1: Time to load utils op: 0.0009009838104248047 seconds +1: Time to load utils op: 0.000827789306640625 seconds +1: Time to load utils op: 0.0008733272552490234 seconds +1: Time to load utils op: 0.0012578964233398438 seconds +1: Time to load utils op: 0.0011286735534667969 seconds +1: Time to load utils op: 0.0013115406036376953 seconds +1: Time to load utils op: 0.0012836456298828125 seconds +1: Time to load utils op: 0.00130462646484375 seconds +4: Time to load utils op: 0.0008556842803955078 seconds +4: Time to load utils op: 0.0009884834289550781 seconds +4: Time to load utils op: 0.0007648468017578125 seconds +4: Time to load utils op: 0.0010838508605957031 seconds +4: Time to load utils op: 0.0008752346038818359 seconds +4: Time to load utils op: 0.0009047985076904297 seconds +4: Time to load utils op: 0.001012563705444336 seconds +4: Time to load utils op: 0.0009577274322509766 seconds +0: Time to load utils op: 0.0006604194641113281 seconds +0: Time to load utils op: 0.0005347728729248047 seconds +0: Time to load utils op: 0.30279088020324707 seconds +0: Time to load utils op: 0.30323362350463867 seconds +0: Time to load utils op: 0.0004112720489501953 seconds +0: Time to load utils op: 
0.00044846534729003906 seconds +0: Time to load utils op: 0.00039696693420410156 seconds +0: Time to load utils op: 0.0004134178161621094 seconds +0: Time to load utils op: 0.00040531158447265625 seconds +5: Time to load utils op: 0.3355381488800049 seconds +5: Time to load utils op: 0.33536195755004883 seconds +5: Time to load utils op: 0.33554816246032715 secondsTime to load utils op: 0.33521533012390137 seconds +5: +5: Time to load utils op: 0.3613736629486084 seconds +5: Time to load utils op: 0.3355731964111328 secondsTime to load utils op: 0.3358285427093506 seconds +5: +5: Time to load utils op: 0.33559155464172363 seconds +5: Time to load utils op: 0.0004951953887939453 seconds +5: Time to load utils op: 0.0005266666412353516 seconds +5: Time to load utils op: 0.0005371570587158203 seconds +5: Time to load utils op: 0.0004520416259765625 secondsTime to load utils op: 0.0005557537078857422 seconds +5: +5: Time to load utils op: 0.0005841255187988281 seconds +5: Time to load utils op: 0.0006058216094970703 seconds +5: Time to load utils op: 0.0005753040313720703 seconds +0: [2023-03-16 18:53:40,223] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 18:53:40,224] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 18:53:40,224] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,338] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 18:53:40,338] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,339] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,440] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 18:53:40,441] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 
18:53:40,441] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,544] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 18:53:40,544] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,544] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,645] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 18:53:40,645] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,646] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,749] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 18:53:40,750] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,750] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,851] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 18:53:40,852] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,852] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:40,959] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 18:53:40,960] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 18:53:40,960] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:41,062] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 18:53:41,063] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 
18:53:41,063] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.9 GB, percent = 6.9% +0: [2023-03-16 18:53:41,063] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 18:53:41,063] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 18:53:41,063] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 18:53:41,063] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 18:53:41,063] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 18:53:41,064] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 18:53:41,065] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 18:53:41,066] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 18:53:41,066] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 18:53:41,066] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004284381866455078 seconds +0: [2023-03-16 18:53:41,066] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 18:53:41,103] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+7: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-16 18:53:41,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+5: [2023-03-16 18:53:41,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:41,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+6: [2023-03-16 18:53:41,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-16 18:53:41,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt... 
+7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/mp_rank_00_model_states.pt. 
+7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:41,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-16 18:53:41,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-16 18:53:41,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-16 18:53:41,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-16 18:53:41,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-16 18:53:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-16 18:53:41,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-16 18:53:41,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:41,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-16 18:53:41,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-16 18:53:41,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-16 18:53:41,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-16 18:53:41,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-16 18:53:41,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-16 18:53:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:41,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-16 18:53:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-16 18:53:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-16 18:53:41,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-16 18:53:41,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-16 18:53:41,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-16 18:53:41,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-16 18:53:41,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:41,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-16 18:53:41,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:41,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-16 18:53:41,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:41,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-16 18:53:41,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-16 18:53:41,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 18:53:41,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+2: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-16 18:53:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:41,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-16 18:53:41,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:41,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:41,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:41,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:42,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-16 18:53:42,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-16 18:53:42,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-16 18:53:42,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-16 18:53:42,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:42,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-16 18:53:42,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-16 18:53:42,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-16 18:53:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-16 18:53:42,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-16 18:53:42,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-16 18:53:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-16 18:53:42,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-16 18:53:42,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-16 18:53:42,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-16 18:53:42,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-16 18:53:42,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-16 18:53:42,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-16 18:53:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-16 18:53:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:42,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:42,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-16 18:53:42,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:42,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-16 18:53:42,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-16 18:53:42,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 18:53:42,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: > overriding total number of iterations value to 1 +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: > overriding decay style value to cosine +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... 
+5: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-16 18:53:42,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
+4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
+3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
+6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-16 18:53:42,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-16 18:53:42,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +7: [2023-03-16 18:53:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,919] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +5: [2023-03-16 18:53:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:42,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +2: [2023-03-16 18:53:42,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 18:53:42,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +7: [2023-03-16 18:53:42,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +5: [2023-03-16 18:53:42,922] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +2: [2023-03-16 18:53:42,923] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +1: [2023-03-16 18:53:42,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:42,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +6: [2023-03-16 18:53:42,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +6: [2023-03-16 18:53:42,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +3: [2023-03-16 18:53:42,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +4: [2023-03-16 18:53:42,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 18:53:42,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +1: [2023-03-16 18:53:42,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,926] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +5: [2023-03-16 18:53:42,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +6: [2023-03-16 18:53:42,926] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +1: [2023-03-16 18:53:42,926] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +5: [2023-03-16 18:53:42,926] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +3: [2023-03-16 18:53:42,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +4: [2023-03-16 18:53:42,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +1: [2023-03-16 18:53:42,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +5: [2023-03-16 18:53:42,928] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +2: [2023-03-16 18:53:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 18:53:42,930] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +7: [2023-03-16 18:53:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,930] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +2: [2023-03-16 18:53:42,931] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +7: [2023-03-16 18:53:42,932] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +6: [2023-03-16 18:53:42,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-16 18:53:42,935] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +3: [2023-03-16 18:53:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:42,936] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +6: [2023-03-16 18:53:42,936] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +3: [2023-03-16 18:53:42,937] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +4: [2023-03-16 18:53:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 18:53:42,940] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +4: [2023-03-16 18:53:42,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +0: [2023-03-16 18:53:42,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,947] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-16 18:53:42,949] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +1: [2023-03-16 18:53:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,953] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +0: [2023-03-16 18:53:42,953] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +1: [2023-03-16 18:53:42,954] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +6: [2023-03-16 18:53:42,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 18:53:42,954] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +0: [2023-03-16 18:53:42,954] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +6: [2023-03-16 18:53:42,956] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +5: [2023-03-16 18:53:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:42,959] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +5: [2023-03-16 18:53:42,960] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +2: [2023-03-16 18:53:42,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:42,961] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +2: [2023-03-16 18:53:42,962] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +5: [2023-03-16 18:53:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:42,963] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +1: [2023-03-16 18:53:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 18:53:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,963] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +1: [2023-03-16 18:53:42,964] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +5: [2023-03-16 18:53:42,964] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +7: [2023-03-16 18:53:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,965] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +7: [2023-03-16 18:53:42,965] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +1: [2023-03-16 18:53:42,966] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +7: [2023-03-16 18:53:42,967] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +2: [2023-03-16 18:53:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:42,967] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +3: [2023-03-16 18:53:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 18:53:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,969] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +3: [2023-03-16 18:53:42,969] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +2: [2023-03-16 18:53:42,969] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +1: [2023-03-16 18:53:42,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:42,971] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +5: [2023-03-16 18:53:42,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:42,970] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +6: [2023-03-16 18:53:42,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:42,972] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +1: [2023-03-16 18:53:42,972] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +7: [2023-03-16 18:53:42,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 18:53:42,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +6: [2023-03-16 18:53:42,970] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +7: [2023-03-16 18:53:42,971] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +3: [2023-03-16 18:53:42,971] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +6: [2023-03-16 18:53:42,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-16 18:53:42,972] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +6: [2023-03-16 18:53:42,972] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +3: [2023-03-16 18:53:42,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +3: [2023-03-16 18:53:42,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:42,973] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +7: [2023-03-16 18:53:42,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +1: [2023-03-16 18:53:42,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +4: [2023-03-16 18:53:42,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 18:53:42,973] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +5: [2023-03-16 18:53:42,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +6: [2023-03-16 18:53:42,974] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +3: [2023-03-16 18:53:42,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:42,975] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +3: [2023-03-16 18:53:42,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +0: [2023-03-16 18:53:42,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +4: [2023-03-16 18:53:42,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 18:53:42,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +0: [2023-03-16 18:53:42,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +4: [2023-03-16 18:53:42,975] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +3: [2023-03-16 18:53:42,976] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +0: [2023-03-16 18:53:42,976] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +4: [2023-03-16 18:53:42,976] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +4: [2023-03-16 18:53:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-16 18:53:42,978] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-16 18:53:42,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:42,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +4: [2023-03-16 18:53:42,979] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +6: [2023-03-16 18:53:42,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 18:53:42,979] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +6: [2023-03-16 18:53:42,979] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +4: [2023-03-16 18:53:42,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +6: [2023-03-16 18:53:42,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +2: [2023-03-16 18:53:42,981] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +4: [2023-03-16 18:53:42,981] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +0: [2023-03-16 18:53:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,987] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-16 18:53:42,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,988] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +1: [2023-03-16 18:53:42,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 18:53:42,989] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +1: [2023-03-16 18:53:42,989] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +0: [2023-03-16 18:53:42,990] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +0: [2023-03-16 18:53:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,990] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +1: [2023-03-16 18:53:42,990] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +0: [2023-03-16 18:53:42,992] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +5: [2023-03-16 18:53:42,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:42,995] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +6: [2023-03-16 18:53:42,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 18:53:42,996] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +5: [2023-03-16 18:53:42,997] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +6: [2023-03-16 18:53:42,997] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +2: [2023-03-16 18:53:42,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:42,998] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +0: [2023-03-16 18:53:43,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:43,000] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +0: [2023-03-16 18:53:43,000] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +7: [2023-03-16 18:53:43,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:43,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +5: [2023-03-16 18:53:43,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 18:53:43,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +0: [2023-03-16 18:53:43,002] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +4: [2023-03-16 18:53:43,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:43,002] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +4: [2023-03-16 18:53:43,002] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +5: [2023-03-16 18:53:43,003] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +4: [2023-03-16 18:53:43,004] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +3: [2023-03-16 18:53:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:43,005] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-16 18:53:43,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-16 18:53:43,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-16 18:53:43,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +6: [2023-03-16 18:53:43,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 18:53:43,008] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +3: [2023-03-16 18:53:43,008] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +7: [2023-03-16 18:53:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-16 18:53:43,009] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +6: [2023-03-16 18:53:43,009] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +4: [2023-03-16 18:53:43,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-16 18:53:43,010] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +7: [2023-03-16 18:53:43,010] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +5: [2023-03-16 18:53:43,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-16 18:53:43,012] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +4: [2023-03-16 18:53:43,012] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +5: [2023-03-16 18:53:43,013] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +2: [2023-03-16 18:53:43,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 18:53:43,014] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +0: [2023-03-16 18:53:43,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:43,014] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +2: [2023-03-16 18:53:43,015] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +0: [2023-03-16 18:53:43,016] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +2: [2023-03-16 18:53:43,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-16 18:53:43,034] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-16 18:53:43,035] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +1: [2023-03-16 18:53:43,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:43,095] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +1: [2023-03-16 18:53:43,097] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +1: [2023-03-16 18:53:43,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 18:53:43,134] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-16 18:53:43,136] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +0: successfully loaded checkpoint from checkpoints_146m32b100m at iteration 0 +7: time (ms) | load-checkpoint: 2035.36 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 18:53:43 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.026446 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.092 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.019929 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.046 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 18:53:57 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 19112.08 | train/valid/test-data-iterators-setup: 13316.47 +0: [after training is done] datetime: 2023-03-16 18:53:57 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.793078E+00 | lm loss PPL: 4.439282E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3324348: Thu 16 Mar 2023 06:54:27 PM EET diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0a8a81e1e25a6e86a945b62f5fe5b7ca48d3533c --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:cfb65115f3eb04e4067ac673dcbc8173367cc0f0fbbc7fd53bf408f0be57ce0a +size 27478295 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..765bdaf57e03e21e217c82ccf368a4fcf749df90 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83c55c7661e3245b21e1fbb1ede749fcd036a7a7ec176e583c5958eb51aa1b73 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..35c37fb9bc14a87023814df4b9d5170fb83e564d --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48cf3ecdef5d953787a9f96d3e526beebf7b03e983a5c2ce4f35f980ffe236f1 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4005f3b46735fefc5f70e5452c10462a6c516c38 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8eced4c6f485396ccfdfe88dfb30ad2e423254428d10d5cf2bcaac887212f1c0 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cfff184e01c33e2a08c4f5a3fbcb0211f3d95bb5 --- /dev/null +++ 
b/146m32b100m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c42de4be589c01075c34604d107bb50f6196dc28caa55c2d822998b63ac76b4f +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28eeb5798ce2ee49b8e5de6cad83ff6fed2c975f --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67a728c923c067424ef1e1e5823967035cd0383c462da3c380e5322542a0bb15 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49d8c641ae4a9b317e626f2ad09d2253c2cda727 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:276797ee4eca98c60594c8f1d5c02b794e854a2d43300c778968197da387b111 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c36828be08b2623d684d696908a778916285807b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ad7c8181ec4612db4bb5e34cc42722d6d20fa7dc0f0db1153545c2c2045d324 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 
100644 index 0000000000000000000000000000000000000000..80454fc79d5e53648000517537a6addc9ba355d7 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8db658a8a45182e351bf1f8e9d5b5de53ba73f1a06d25a5abf41b66a1c80a47 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca429fa6504ab94bbdaa07626d6f7784377a3449 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93907313f7c4d607fa5b8ff83723dc234b6e9673f5f4488f02a18a87f2d25679 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3deaf74155b9a2b015fe05e4cdb9c9821b87c97f --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f80ad1d6d7ea4b54ba39fb9196fa618a6ebca450d09dd7cf025b6cf1dbc6a99b +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6fd9a03392b40576449a04e362165a50cc38d22 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33b18ceb0c87f0fb49576eb4812ea73e4a7cc878c62c22d76ea159a0c9b8839d +size 27478231 diff --git 
a/146m32b100m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ffb05ce20f533cf4c24de2a6370f1693a5664e90 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad1b40640820af64a62dbbf7bc5260dcba65fd597e44f2a0125ee11886125c47 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84fa9fb4b6e77f50d8beaed98af27f931cb4797b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e17dccff1259ddda4f2f48c161cdafbb2c01092f4b66807037992869662a1ce +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6127ca85ae732f0497d2eb081a22af26738ee122 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3dadbd964311ad1cba9013ec0c97c2256ac9b0784f0655bd8e2baca050f95fd8 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6246943a63d733d2e3c5466742667d4e59217a5b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:4239e9e5313c046768067da9916660c822bf572b9b310d1fbc83cd4f209c36ee +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6cbaf72fdc196ac6c8b144477308778970c3953 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26456d2e578b57aa085ec6a47c200e81f4196ab5f112ef0af20a9c748211e7d4 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c31745c7a9872fb575d87cc9578d5b714d810bfb --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37d697aab60d1c39b186597818609239c84025b16079cf33cd5384eb35bc8471 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..39345c7afa458ee6ba155af825e8bc23526dd22e --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:38f173b7f160b48f54583f6334ac8da971d6963100f219c52ddd67abebe40983 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff4f30e208028883b67994816cd23ce462734f60 --- 
/dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef9f9b2f924e7519028f7162517656135b6fca539e6a1987ab584f965f090366 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80f34a16ba1a9e338c822dc0db322e894ce85ddf --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b238aeb5beb4ac2cd04915f645a31d63304d126d7c1b9c3953af73399a79cba7 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6bfb4d0c901a6e8aed714576431cd654017090c6 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ebc70142ea65b5305dd7cc0965cf82b93fa504fc93e77ed07b9c53dcb1dd01b +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5602c4c653a447a812f78b67de7df92f64cd3a81 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed3a88cc7a471ff7b314821c057dbd7178860ffcfebc3db7d196e86a6576630f +size 27478231 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..c408eab50f68c09b7d4984495f09d1c5acf6c633 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9048c4e28d8e4d394a9884867d3d2f91459265d896ddb05ff516ce0bf8d386d +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c24144ff3a85bf2d670c2db12f754556ed76171c --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd9fc686285ad07006846d164a33e52684a4f08ffd11a0a05fc7884daca75e9b +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca7ac6fdc84aacac7cf49b3b8537c161ba22f890 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad956f367c96b8d849c0d3ec1a2935a43adbaed8962ac8b337402ee1c59cb04d +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60ec7bdc56a312118068f061bdf272216d532222 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c159c76fa562eedab10545cbda9e62ac48ed183ff4547215b7215fe2e8633f46 +size 27478370 diff --git 
a/146m32b100m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3326d2cb20ec43c61347a1b7b4f53a7b221d9b2 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ee06ccc278da71fcb278e254b7827ea86c1b985915ab390b08604ef3bccc89b +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..156e90d01f36720aa89b32b80943158f1919d081 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93673b15354d840d444adfd17e999c7e3eddc1a1b972d68cbf39ee056ad6d3bd +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3f936e7d13285443b36077570614ad1b526cee72 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32b5f0aa041fad6175b4c5e125997808c64660ace457d3c5d84ec3238202d8c8 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da4663c500029f6ff4bbfdf128896dfca1f10d89 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:79155507d8aa20818e6bfe8ece5a3c83d8b1e182be4a20bbd94c3c8fb6b65e23 +size 27478114 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c8410857254ae13f614fa10e09c4fea70cf53f5b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf8388e4a778aaacb362fb8cf5f22a382a7842f2bd758cd41786d1e8207be35f +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c98345d8602812d6c729456573cd72be3d7e70c4 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa84b4822f7cee716f1617b84c64b19a59bc17e62238ddeebb450f4e9d012544 +size 27478434 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..413e1976f4d6a70908b69d824baa0d844a4c95b7 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20c89709bd51361087dae09edd40ab0500341db4237e973005e8cf6e743b37ed +size 27478167 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40e425d36f10b32c2823f2d607829c1f665b9dfa --- 
/dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:638ca343598efe7165d54c1ea5957bb9cbf65d2724d82691adbc73e97ba0693e +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0696b1b2adc3ab3931ca7685ae2a22bb64504c75 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b1509e18f5c6d5c7a522b522a292e2a87df873e091c9e27d233898776732db85 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb58c631499ddbe618927ec447f3cecf91167f93 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dca6d839bbe95170d217bc1b2e172e506c09b255b502a809d13de61ae1aaddaa +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88812d3f075794b422d945368043c5346e4240d2 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:699235ac9734bcbe846f380fbc6cf368a66d0e7be1eaecb06f1d4ff279bb3521 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..1c3c13c6138dbcdf0c7e02d6950ac352a433c07b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6884fb7435d9eb7406f47197d0fe9828008f9fca8038e1e685f8f03ca58ca25 +size 27478434 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..452fa29a60d957f96f2ef2982f36fb56e4cb742a --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24697fc5006913d16b2ce7f6c83fc87083f39468641e506628ea705f6cfb7468 +size 27478050 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ccf0ab9fa707825f1c1cad046294b6f95d545ccc --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b842d6268a2b8b60ce3787c27d89ccb6d61de1d187140a8d2eecdfb248fe3b43 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7603dce6af75d413aa93a6682c559def12f0765 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39841b753d72c3db49a1cbc0456ae0aaf50bbf8549975aabbcb7d67c24cda859 +size 27478306 diff --git 
a/146m32b100m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24caa1bdcf77185c76ec15c230ad2df01a55a2f7 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75f1a7a553ca8c71f9bb3864be0888332dc430a8fe4f2928fe03683402743527 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..525dd510d076a0bd0b311664d780a37be65ffe9f --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f30f42a71801ee3b821c403d37b85ae11ef35c7d6b34c0093da2483c1db37d83 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d4f0fc936f5a0dfd4ee8b50285a14e1a302fda2 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b84bb83e20e0f17cf49acafdc9afa35fcadf2a4faaa8f3e442904774f4a4bfa6 +size 27478231 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..790818c20409298c0b59909eaf6ed2d26e3a240c --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:639b7ade9ad73ee29430be59c8eede787bfd7ef2161522d3fce33d065c2749cd +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a2a50fb0077b93ec90be69bbd81d36a1211cd93b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7c3caa8a7563001bb032153a734a5b5ab02a5a07cc97202565929cfd1e8700d +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..00c7d6372772572f88e51382bd311674fe508ec0 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30ddf72b638dd141c46d8cbf33fe8af2ef3f03f6d15b8e14b939115176591a3c +size 27478434 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1cfc0ac54259ad2544aa64f2571a7358a20d187b --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b27d17b7b68366ca5b72829c0bcb207d8091b64e59020a2eb16d8ff0188d7780 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3178b09786d6f005f197b351f1c4f78c5bbc391 --- 
/dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c578b51502a17485190b02631230da1f14a3b698971bf4eeaab6bd3fa10ca68d +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e360f8c33c74dde1fa79fae23b4e5637b87837a5 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c47c665ebbc61078bc3741f0c0a067dd3b14a123996bffb1fbe020f4dedd71cb +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98d50c96a0d6833a05fdfb5a42455ef16245b069 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcd0c2026bff65b893794fd93b97c72fc3e5f0e4bedfc842a66c59e2f6e957d4 +size 27478306 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..41d433bcb3ec607300193246cf165cf4d9f2ee61 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:677f9536f61c73c03aa935951d81e2fc63b922cc9cc7763a40f2166ae9995250 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..c31a357f4713009f7eea5ec82ab71451d47039da --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00686df7c2206187f4e3e77952e89449c3811320a2e5cedf44c2e6dbd0b48529 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..011dfafc99fbf8d61ea926b27d42f55416df0d6d --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6322e39acdcea88fb2ae08b15d444bdc9fe7588d02f219b6909a1f510d92f8b +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0f24218c8f57fdea222c60e3f90cb0ce56e57cee --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dfcdd24c6a559027d3e5deb88fdf3fad9b62b757236eb7a818380b97376a2cbc +size 27478167 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c29cdf18eaeebf6e0102d7860e2d336e32091e6 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54cedc7a3300464f936af8f39f0ef4fa3454700d3bbfbecee8f51b28368a688b +size 27478178 diff --git 
a/146m32b100m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..038035a4390247252c99de2dd8b3942b24658d47 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:463871192453f1042eba3d630a896dbabc953f36fb64f645a4c89ba0527b6413 +size 27478370 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..397a94954bc73912c03325b0df5747717fe0aaf2 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fb051762ba471b466c748a06bb7d36adf74905e1a00f38c301d93834a35c186 +size 27478178 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b83ec1150a29d47e231148037c277ff0dcfb3d9e --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:846236c06645ef0ab90d5f721e644ec9514ff0a660c43107e8f5b406b9c6d637 +size 27478242 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..82075fe97df5ddcc101ffc46a551ea85e7229fc3 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:7ab6b1160634e9760272586f55ab1714d25d9e196e375f5619066b8b8d747c88 +size 27478359 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea8044d50b07f20e4cc94114c2b672e0edc68271 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4b094882b1852fc522923881d3ddbafa6b640d96ee5c668dd5a86fe6cf74072 +size 27478103 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc4749bf1a1fbe47567d42ef8c009329c1af9af2 --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffd88ad02f8adadbcf35414af037b856b357fda776cf60b9c4672a7025d5e9d7 +size 27478359 diff --git a/146m32b100m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m32b100m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..87cd0183a37f731f17b8f27e4d3f88b659071eff --- /dev/null +++ b/146m32b100m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d21cc5841a63c3d595a329b0d0046427d6ae800d616160ee4df715e34f2de83a +size 27478167 diff --git a/146m32b100m/global_step60336/layer_01-model_00-model_states.pt b/146m32b100m/global_step60336/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3bbf953acc1d0bc4597a87c7b7eaf0963651c2a1 --- /dev/null +++ 
b/146m32b100m/global_step60336/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:726f2b5f4a94386be2e746b09bc09146bee4edf85233d2e6670da28df43abc90 +size 80413955 diff --git a/146m32b100m/global_step60336/layer_03-model_00-model_states.pt b/146m32b100m/global_step60336/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b0af22a36433f43db17551f97f12eb195620de9 --- /dev/null +++ b/146m32b100m/global_step60336/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0e5c88f6426632826bdd1f0207397a71f2b1cf9f15c80513fcc9006657221d7 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_04-model_00-model_states.pt b/146m32b100m/global_step60336/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60dfb1e8a0ea8c7e00950d42c292beb23e19cea8 --- /dev/null +++ b/146m32b100m/global_step60336/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f91d5d0cfa1420215b0249f5f8dfda9f3bcc2f6fd6aac1e16a40cdc1a7d1e11 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_05-model_00-model_states.pt b/146m32b100m/global_step60336/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a58af9c9022eaeac372b276201179dcc76df126 --- /dev/null +++ b/146m32b100m/global_step60336/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b265c55a763c790ede6f607d436473aabfe9e46419f7589e86e0eabeeea12dcb +size 14180099 diff --git a/146m32b100m/global_step60336/layer_06-model_00-model_states.pt b/146m32b100m/global_step60336/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96115032d66115264343adcf77601b14bc45a052 --- /dev/null +++ 
b/146m32b100m/global_step60336/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:257605a2b394927e392914550b12c656ba35d7f555d7cb701f9d4b7cc65bf881 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_07-model_00-model_states.pt b/146m32b100m/global_step60336/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..872bd4f1cb8d4ea3b28d1dcca0d7619862a78b97 --- /dev/null +++ b/146m32b100m/global_step60336/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b84b65f6ad9fa916e7d03d5eb7b3a6d74b067072d65e59da594bad37b3a0bc77 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_08-model_00-model_states.pt b/146m32b100m/global_step60336/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9114b90f2ac04889ea2a4c3cb1af022cbb6eb36c --- /dev/null +++ b/146m32b100m/global_step60336/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6fc379c30f4ba4839888531de40c2e6a32a07c349c90063ad2d5bee6742c7ca6 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_09-model_00-model_states.pt b/146m32b100m/global_step60336/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc289002e7738a86d4eae1d7c4263a931aa49abe --- /dev/null +++ b/146m32b100m/global_step60336/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22f5040e1a0b7665b2b88e53f25d31ee453ba777d4367223f48266835d5c287a +size 14180099 diff --git a/146m32b100m/global_step60336/layer_10-model_00-model_states.pt b/146m32b100m/global_step60336/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13d622f1dda5d532601cbaa83549d507819fd063 --- /dev/null +++ 
b/146m32b100m/global_step60336/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e17ca02d65e6ef8a52e0eed67393a16187250ce2d6455d42545aebf17cdf9540 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_11-model_00-model_states.pt b/146m32b100m/global_step60336/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0596c2274deb5d1feaa0792fc6b5445ca2330198 --- /dev/null +++ b/146m32b100m/global_step60336/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:df219f5bba90ba9b2d68c39be33def453f3c6da0d331921b24a0e46c5f95b02f +size 14180099 diff --git a/146m32b100m/global_step60336/layer_12-model_00-model_states.pt b/146m32b100m/global_step60336/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be1a308bdca9f132e90cdef02edf4943de1f43ef --- /dev/null +++ b/146m32b100m/global_step60336/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e93f95de1f4e1ff642c460618034de0b0a91260536476a008a569d724b444cb3 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_13-model_00-model_states.pt b/146m32b100m/global_step60336/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c67c30fba205678770f8c39a3e5ce3baf82d715a --- /dev/null +++ b/146m32b100m/global_step60336/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:828e94c828cdae560627d90b14e41626b6a9bfc9f3f11b2c6da279505bd7365a +size 14180099 diff --git a/146m32b100m/global_step60336/layer_14-model_00-model_states.pt b/146m32b100m/global_step60336/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60d6e99ff586349814b02fac093c38e9d262e1d4 --- /dev/null +++ 
b/146m32b100m/global_step60336/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6fc2a02026a97cae73f6c33632db0a44489101b3a032536a5460c8e8c1019660 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_15-model_00-model_states.pt b/146m32b100m/global_step60336/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d5c66ef90bb549f7bf7dac05f9cf67321fe664b9 --- /dev/null +++ b/146m32b100m/global_step60336/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a3dfc85b79d101eab4e014282706be80117f3ee38bb301479325c8245e956a7 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_16-model_00-model_states.pt b/146m32b100m/global_step60336/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3662c8217449e465cd65512af129c5fcb8636eb --- /dev/null +++ b/146m32b100m/global_step60336/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34894dd3163a39dba88c0547cf287822727f78f471935076281e722b19a48136 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_17-model_00-model_states.pt b/146m32b100m/global_step60336/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f1b66ab007a5c1b8ec146806901a33a7345d73b4 --- /dev/null +++ b/146m32b100m/global_step60336/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d726d65c3831ca5af472dcd48ab7ffe9dcb194d9aa14045fa45ba31160f6f914 +size 14180099 diff --git a/146m32b100m/global_step60336/layer_19-model_00-model_states.pt b/146m32b100m/global_step60336/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74d2a148a5d8098843ee2b3bd9779bc68a54e2bc --- /dev/null +++ 
b/146m32b100m/global_step60336/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:946abce72a3cfddb03feb798b5cc0e3cd0071d8d6619d615233a08074b911028 +size 4291 diff --git a/146m32b100m/global_step60336/mp_rank_00_model_states.pt b/146m32b100m/global_step60336/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..863f7a26c2df2919e0d09901ec2d0036189997f5 --- /dev/null +++ b/146m32b100m/global_step60336/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c50dda7462bbd0c22c2d23dd6fcd8b3a62b6f8ba000608bdc1d799c0d54951ab +size 35443 diff --git a/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid005627.83657.0 b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid005627.83657.0 new file mode 100644 index 0000000000000000000000000000000000000000..cb73ad5fad150c34bdfeaf57515c05b3c6b2ae7c --- /dev/null +++ b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid005627.83657.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56949a0e55721ab170a7e989e0ca9a54a0e20ec0024245c2e2fd09b68a439675 +size 107901161 diff --git a/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid007029.8157.0 b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid007029.8157.0 new file mode 100644 index 0000000000000000000000000000000000000000..818b1d6c0263f070006726655175d28846c4c2ae --- /dev/null +++ b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678953713.nid007029.8157.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e75db0d4160ae5f30f2198274499076849b9a7f9298dd409420c4fe2a4ae9d44 +size 4620688 diff --git a/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678954864.nid007029.16466.0 b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678954864.nid007029.16466.0 new file mode 100644 index 
0000000000000000000000000000000000000000..1de1f3bf1af69cdb9bdcd42f95d923ba78748eb8 --- /dev/null +++ b/146m32b100m/tensorboard_146m32b100m/events.out.tfevents.1678954864.nid007029.16466.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30762440047cc914775b1f7cd133ad62ac0892ae2ba85d8cd853c23c2eb17ea1 +size 40 diff --git a/146m32b100m/tensorboard_146m32b100mval/events.out.tfevents.1678985574.nid005143.38460.0 b/146m32b100m/tensorboard_146m32b100mval/events.out.tfevents.1678985574.nid005143.38460.0 new file mode 100644 index 0000000000000000000000000000000000000000..c0ffdfe98af6cc09c09cbc00ef178285c2eda450 --- /dev/null +++ b/146m32b100m/tensorboard_146m32b100mval/events.out.tfevents.1678985574.nid005143.38460.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:284d1490b8b190a50e029cb69b7ad04163fcf71c725d9c68bd3f39355560c9e6 +size 980 diff --git a/146m32b100mdedup/3328732.err b/146m32b100mdedup/3328732.err new file mode 100644 index 0000000000000000000000000000000000000000..50cd37da3e25b7fcd6befb5b26d41c0ce5ed8185 --- /dev/null +++ b/146m32b100mdedup/3328732.err @@ -0,0 +1,1105 @@ +5: 2023-03-17 10:25:14.823186: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:25:14.823188: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 10:25:14.823198: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:25:14.823199: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:25:14.823199: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823314: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823327: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 10:25:14.823317: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-17 10:25:14.823380: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:25:14.823383: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:25:14.823386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-17 10:25:14.823200: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 10:25:14.823202: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 10:25:14.823210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-17 10:25:14.823307: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823311: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823316: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823318: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:25:14.823395: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:25:14.823396: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823327: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 10:25:14.823331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823337: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:14.823338: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-17 10:25:14.823387: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 10:25:14.823388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 10:25:14.823391: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-17 10:25:14.823324: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 10:25:14.823331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 10:25:14.823792: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823804: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823799: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 10:25:14.823809: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 10:25:14.823807: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838093: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 10:25:14.838094: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838092: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838109: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 10:25:14.838112: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 10:25:14.960000: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960014: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960011: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960023: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 10:25:14.960029: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960034: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 10:25:14.960020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989085: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 10:25:14.989086: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989093: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989080: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989097: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 10:25:14.989098: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 10:25:14.989093: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 10:25:25.513284: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513253: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:25:25.513314: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:25.513337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 10:25:25.513286: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:25:25.513351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.532086: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 10:25:25.513376: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:25.513402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 10:25:25.513326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-17 10:25:25.513491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:25.513402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.532116: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:25.513480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:25:25.513397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513520: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:25:25.532336: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.532265: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.532138: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532368: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 10:25:25.513536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513488: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513359: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-17 10:25:25.532376: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532385: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:25.513458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:25:25.513550: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:25:25.513458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.513350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513560: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.532170: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:25:25.532161: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 10:25:25.532182: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 10:25:25.532212: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:25:25.513474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-17 10:25:25.532288: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 10:25:25.513488: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:25.532215: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513559: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.532234: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.513485: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.513521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.513494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513581: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.513577: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:25:25.513351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.513493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.532308: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.513549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.532534: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:25.532563: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:25:25.532647: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:25:25.513502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.532261: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:25:25.532584: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:25:25.532594: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 10:25:25.532700: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:25:25.532711: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:25:25.532714: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.513623: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-17 10:25:25.513558: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.532613: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:25:25.532730: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:25.513502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.532322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:25:25.532330: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 10:25:25.532335: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 10:25:25.532347: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:25.532745: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:25:25.532751: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513375: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:25.532356: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.513587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-17 10:25:25.532670: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:25:25.532673: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:25.513510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.532287: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 10:25:25.532699: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 10:25:25.532531: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.513592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-17 10:25:25.513406: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.513525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.532290: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.532584: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above 
cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.513599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:25.532555: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:25:25.532613: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:25.532320: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.532326: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.532348: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 10:25:25.532353: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.532637: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:25:25.532649: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 10:25:25.532959: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:25.532670: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-17 10:25:25.532687: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 10:25:25.532695: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:25.532893: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 
10:25:25.532574: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.532571: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.513469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:25.532603: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.532616: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 10:25:25.532613: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 10:25:25.532614: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532736: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532752: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532760: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 10:25:25.532769: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 10:25:56.076252: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076343: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.076436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077047: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077083: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077107: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077117: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 10:25:56.077317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.077125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.077419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077377: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077409: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 10:25:56.077435: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 10:25:56.077446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.077449: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 10:25:56.077456: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.077488: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.077497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.077498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.077507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.077963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078044: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078080: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078085: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.078106: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.078992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.079001: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 10:25:56.078993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 10:25:56.079003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079005: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079008: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079009: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079008: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079012: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 10:25:56.079013: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 10:25:56.079769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079779: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079781: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 10:25:56.080136: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080136: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080258: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:25:56.080352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 10:25:56.080142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080354: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 10:25:56.080258: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080358: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 10:25:56.080264: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080141: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080357: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080265: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080368: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 10:25:56.080144: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080275: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-17 10:25:56.080363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 10:25:56.080279: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:25:56.080279: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:25:56.080282: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 10:25:56.080283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:25:56.080285: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-17 10:25:56.080363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 10:25:56.080154: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080153: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 10:25:56.080319: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080375: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:25:56.080376: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080155: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080161: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080161: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080379: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:25:56.080378: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 10:25:56.080382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080162: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080164: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 10:25:56.080166: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:25:56.080325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 10:25:56.080384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080387: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 10:25:56.080333: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 10:25:56.080339: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 10:25:56.080402: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.091031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091080: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091103: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091114: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091135: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.091211: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 10:25:56.091213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091447: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 10:25:56.091499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.091515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 10:25:56.091525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.091628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093739: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093739: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093743: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093742: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093752: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:25:56.093758: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 10:25:56.094162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 10:25:56.093758: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:25:56.093761: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:25:56.093762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:25:56.093765: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 10:25:56.093777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 10:25:56.094177: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 10:25:56.093791: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094171: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094173: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094180: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094194: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094195: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094196: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094200: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094200: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 10:25:56.094234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 10:25:56.094254: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079780: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 10:25:56.079785: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079785: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079792: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079799: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079798: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079800: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 10:25:56.079803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093485: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093488: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 10:25:56.093501: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093502: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093503: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093511: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093513: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093514: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 10:25:56.093518: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. 
+3: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: 
warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +1: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +1: Building extension module utils... +1: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +1: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... 
+5: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +3: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils...Loading extension module utils... +7: +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils...Loading extension module utils... +7: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: +1: Loading extension module utils...Loading extension module utils... +1: Loading extension module utils... +1: +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m32b100mdedup/3328732.out b/146m32b100mdedup/3328732.out new file mode 100644 index 0000000000000000000000000000000000000000..9975a5c518f46314e1bbe40ff8360e1b6b547679 --- /dev/null +++ b/146m32b100mdedup/3328732.out @@ -0,0 +1,5641 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m32b100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 
--eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m32b100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m32b100mdedup --load checkpoints_146m32b100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3328732.json --zero-stage 0 +START 3328732: Fri 17 Mar 2023 10:24:52 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 
================================================================================ +4: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 49.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 53.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 42.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 
+2: 6 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 39.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +0: Launching on nid006717 (0/8), master nid006717 port 9999, GPUs 8, CUDA: True +4: Launching on nid006721 (4/8), master nid006717 port 9999, GPUs 8, CUDA: True +2: Launching on nid006719 (2/8), master nid006717 port 9999, GPUs 8, CUDA: True +1: Launching on nid006718 (1/8), master nid006717 port 9999, GPUs 8, CUDA: True +7: Launching on nid006724 (7/8), master nid006717 port 9999, GPUs 8, CUDA: True +6: Launching on nid006723 (6/8), master nid006717 port 9999, GPUs 8, CUDA: True +5: Launching on nid006722 (5/8), master nid006717 port 9999, GPUs 8, CUDA: True +3: Launching on nid006720 (3/8), master nid006717 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... 
+0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... 
mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3328732.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 
768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m32b100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m32b100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 
0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ 
True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m32b100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... 
False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m32b100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. 
None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 10:27:13,162] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... 
+0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.097 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 23.681 seconds +0: time to initialize megatron (seconds): -4.253 +0: [after megatron is initialized] datetime: 2023-03-17 10:27:39 +0: building GPT model ... 
+0: [2023-03-17 10:27:39,888] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 10:27:39,889] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 10:27:39,889] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.42 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, 
model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 10:27:41,886] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 
20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 10:27:42,226] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 10:27:42,227] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 10:27:42,227] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-17 10:27:42,229] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 10:27:55,275] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 10:27:55,276] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 10:27:55,276] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 10:27:55,284] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 10:27:55,284] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 10:27:55,405] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 10:27:55,406] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 10:27:55,406] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.12 GB, percent = 6.4% +1: ninja: no work to do. 
+7: Time to load utils op: 0.3310244083404541 seconds +1: Time to load utils op: 0.2032175064086914 seconds +1: Time to load utils op: 0.3086864948272705 secondsTime to load utils op: 0.30877089500427246 seconds +1: +1: Time to load utils op: 0.3086864948272705 seconds +1: Time to load utils op: 0.3088669776916504 seconds +1: Time to load utils op: 0.2034006118774414 seconds +1: Time to load utils op: 0.20322370529174805 seconds +1: Time to load utils op: 0.3339076042175293 seconds +0: Time to load utils op: 0.21469712257385254 secondsTime to load utils op: 0.3101489543914795 seconds +0: +0: Time to load utils op: 0.3108959197998047 seconds +0: Time to load utils op: 0.3111557960510254 seconds +0: Time to load utils op: 0.3087313175201416 seconds +0: Time to load utils op: 0.31070733070373535 seconds +0: Time to load utils op: 0.3085603713989258 seconds +0: Time to load utils op: 0.3106818199157715 seconds +7: Time to load utils op: 0.2028656005859375 seconds +7: Time to load utils op: 0.20259952545166016 seconds +7: Time to load utils op: 0.20267438888549805 secondsTime to load utils op: 0.20337820053100586 seconds +7: +7: Time to load utils op: 0.20257258415222168 seconds +7: Time to load utils op: 0.20368671417236328 seconds +7: Time to load utils op: 0.20286965370178223 seconds +3: Time to load utils op: 0.21543598175048828 seconds +3: Time to load utils op: 0.215989351272583 seconds +3: Time to load utils op: 0.2174084186553955 seconds +3: Time to load utils op: 0.21100616455078125 seconds +3: Time to load utils op: 0.21087431907653809 secondsTime to load utils op: 0.2089385986328125 secondsTime to load utils op: 0.21063637733459473 seconds +3: +3: +3: Time to load utils op: 0.20897126197814941 seconds +4: Time to load utils op: 0.21254777908325195 secondsTime to load utils op: 0.21254873275756836 seconds +4: +4: Time to load utils op: 0.21256589889526367 seconds +4: Time to load utils op: 0.21254754066467285 seconds +4: Time to load utils op: 
0.21258878707885742 seconds +4: Time to load utils op: 0.21256661415100098 seconds +4: Time to load utils op: 0.2125716209411621 secondsTime to load utils op: 0.2125692367553711 seconds +4: +6: Time to load utils op: 0.21112370491027832 seconds +6: Time to load utils op: 0.2111527919769287 seconds +6: Time to load utils op: 0.21116113662719727 seconds +6: Time to load utils op: 0.21117258071899414 seconds +6: Time to load utils op: 0.2111954689025879 seconds +6: Time to load utils op: 0.211181640625 seconds +6: Time to load utils op: 0.2112133502960205 seconds +6: Time to load utils op: 0.21122121810913086 seconds +5: Time to load utils op: 0.21295595169067383 seconds +5: Time to load utils op: 0.21299338340759277 seconds +5: Time to load utils op: 0.2129676342010498 seconds +5: Time to load utils op: 0.21300125122070312 secondsTime to load utils op: 0.21303033828735352 seconds +5: Time to load utils op: 0.2129983901977539 seconds +5: +5: Time to load utils op: 0.21303009986877441 secondsTime to load utils op: 0.21300148963928223 seconds +5: +2: Time to load utils op: 0.21416354179382324 secondsTime to load utils op: 0.21416473388671875 seconds +2: +2: Time to load utils op: 0.2142200469970703 seconds +2: Time to load utils op: 0.21423888206481934 seconds +2: Time to load utils op: 0.21423888206481934 seconds +2: Time to load utils op: 0.21424007415771484 seconds +2: Time to load utils op: 0.21425795555114746 seconds +2: Time to load utils op: 0.2142493724822998 seconds +0: [2023-03-17 10:27:55,733] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 10:27:55,734] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 10:27:55,734] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.13 GB, percent = 6.4% +3: Time to load utils op: 0.00043487548828125 seconds +3: Time to load utils op: 0.0004150867462158203 seconds +3: Time to load utils op: 0.0003895759582519531 seconds 
+3: Time to load utils op: 0.0003800392150878906 seconds +3: Time to load utils op: 0.00040268898010253906 seconds +3: Time to load utils op: 0.0004200935363769531 seconds +3: Time to load utils op: 0.00039386749267578125 seconds +3: Time to load utils op: 0.00039768218994140625 seconds +7: Time to load utils op: 0.0005042552947998047 seconds +7: Time to load utils op: 0.0005099773406982422 secondsTime to load utils op: 0.00035762786865234375 seconds +7: +7: Time to load utils op: 0.0005333423614501953 secondsTime to load utils op: 0.0005259513854980469 seconds +7: +7: Time to load utils op: 0.0005345344543457031 seconds +7: Time to load utils op: 0.0005862712860107422 seconds +7: Time to load utils op: 0.0005576610565185547 seconds +6: Time to load utils op: 0.0009746551513671875 seconds +6: Time to load utils op: 0.0009014606475830078 seconds +6: Time to load utils op: 0.0011706352233886719 seconds +6: Time to load utils op: 0.0012509822845458984 secondsTime to load utils op: 0.0011489391326904297 seconds +6: +6: Time to load utils op: 0.0012786388397216797 secondsTime to load utils op: 0.00127410888671875 seconds +6: +6: Time to load utils op: 0.00128936767578125 seconds +2: Time to load utils op: 0.0009026527404785156 seconds +2: Time to load utils op: 0.0007979869842529297 seconds +2: Time to load utils op: 0.0006403923034667969 seconds +2: Time to load utils op: 0.0009646415710449219 seconds +2: Time to load utils op: 0.0011510848999023438 seconds +2: Time to load utils op: 0.0010976791381835938 seconds +2: Time to load utils op: 0.001138925552368164 seconds +2: Time to load utils op: 0.001134634017944336 seconds +5: Time to load utils op: 0.001016855239868164 seconds +5: Time to load utils op: 0.001085042953491211 seconds +5: Time to load utils op: 0.0012459754943847656 seconds +5: Time to load utils op: 0.0012454986572265625 seconds +5: Time to load utils op: 0.0011391639709472656 seconds +5: Time to load utils op: 0.0011906623840332031 seconds +5: Time to 
load utils op: 0.0012006759643554688 seconds +5: Time to load utils op: 0.0012359619140625 seconds +0: Time to load utils op: 0.0006260871887207031 seconds +0: Time to load utils op: 0.00045180320739746094 seconds +0: Time to load utils op: 0.0005838871002197266 seconds +0: Time to load utils op: 0.000606536865234375 secondsTime to load utils op: 0.0006089210510253906 seconds +0: +0: Time to load utils op: 0.0005817413330078125 seconds +0: Time to load utils op: 0.0006401538848876953 seconds +4: Time to load utils op: 0.0009319782257080078 secondsTime to load utils op: 0.0009181499481201172 seconds +4: +4: Time to load utils op: 0.0009317398071289062 seconds +4: Time to load utils op: 0.0010945796966552734 seconds +4: Time to load utils op: 0.0013453960418701172 seconds +4: Time to load utils op: 0.0011832714080810547 seconds +4: Time to load utils op: 0.0011126995086669922 seconds +4: Time to load utils op: 0.0011608600616455078 seconds +1: Time to load utils op: 0.0004780292510986328 seconds +1: Time to load utils op: 0.0004818439483642578 seconds +1: Time to load utils op: 0.0004394054412841797 secondsTime to load utils op: 0.00044989585876464844 secondsTime to load utils op: 0.00044155120849609375 seconds +1: +1: +1: Time to load utils op: 0.0004284381866455078 secondsTime to load utils op: 0.0004184246063232422 seconds +1: +1: Time to load utils op: 0.0005071163177490234 seconds +0: [2023-03-17 10:27:55,892] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 10:27:55,892] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 10:27:55,892] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,002] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 10:27:56,003] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,003] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,111] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 10:27:56,111] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,111] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,216] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 10:27:56,217] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,217] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,324] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 10:27:56,324] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,325] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,429] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 10:27:56,429] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,429] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,542] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 10:27:56,542] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,543] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,650] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 10:27:56,651] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 10:27:56,651] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-17 10:27:56,651] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 10:27:56,651] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 10:27:56,651] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 10:27:56,651] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 10:27:56,651] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 10:27:56,652] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 10:27:56,653] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 10:27:56,654] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 10:27:56,654] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 10:27:56,654] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00042057037353515625 seconds +0: [2023-03-17 10:27:56,654] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 10:27:56,664] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+4: [2023-03-17 10:27:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+7: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+0: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... 
+5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +5: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +6: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt... +7: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +7: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +3: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+1: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+4: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +0: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. 
+5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt. +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 10:27:56,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +5: [2023-03-17 10:27:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +6: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +3: [2023-03-17 10:27:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 10:27:56,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +7: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +1: [2023-03-17 10:27:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +0: [2023-03-17 10:27:56,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 10:27:56,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 10:27:56,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:56,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +0: [2023-03-17 10:27:56,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:56,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:56,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:56,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 10:27:56,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:56,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:56,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +4: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +1: [2023-03-17 10:27:56,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +6: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +3: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:56,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 10:27:56,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +4: [2023-03-17 10:27:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:56,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt... +2: [2023-03-17 10:27:56,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 10:27:56,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:56,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +7: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +5: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 10:27:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:56,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:56,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:56,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 10:27:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:56,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt. +2: [2023-03-17 10:27:56,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +5: [2023-03-17 10:27:57,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +7: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +3: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +6: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +0: [2023-03-17 10:27:57,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +5: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +3: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +1: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +0: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +6: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +7: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +4: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt... +2: [2023-03-17 10:27:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt. +2: [2023-03-17 10:27:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +4: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +0: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +6: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +7: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +2: [2023-03-17 10:27:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt... +5: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +4: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +7: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +1: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +2: [2023-03-17 10:27:57,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +0: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +6: [2023-03-17 10:27:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt. +3: [2023-03-17 10:27:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +7: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +3: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +5: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +4: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt... +6: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +6: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +4: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +5: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +2: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +3: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +7: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +1: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt. +0: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +7: [2023-03-17 10:27:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +4: [2023-03-17 10:27:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +3: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +1: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt... +6: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +6: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +4: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +3: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +5: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +1: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +2: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +7: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt. +0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +6: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +0: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +2: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +3: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +4: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt... +7: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +6: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +0: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +4: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +5: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +3: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +1: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +2: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt. +7: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +5: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +0: [2023-03-17 10:27:57,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +2: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +6: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +3: [2023-03-17 10:27:57,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +1: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +4: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt... +7: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +7: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +0: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +2: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +4: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +3: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +6: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +5: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt. +1: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +0: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +3: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +6: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +4: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +5: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +1: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt... +2: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +5: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +3: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +2: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +1: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +6: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +4: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt. +0: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +4: [2023-03-17 10:27:57,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +0: [2023-03-17 10:27:57,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +6: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +1: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +7: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +2: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt... +3: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +4: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +6: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +3: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +1: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +7: [2023-03-17 10:27:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +2: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +0: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt. +5: [2023-03-17 10:27:57,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +4: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +6: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +2: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +1: [2023-03-17 10:27:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +0: [2023-03-17 10:27:57,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt... +7: [2023-03-17 10:27:57,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +5: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +3: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +4: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +2: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +6: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +1: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +7: [2023-03-17 10:27:57,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt. +0: [2023-03-17 10:27:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +6: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +4: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +5: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt... +1: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +1: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +4: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +5: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +2: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +3: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt. +0: [2023-03-17 10:27:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 10:27:57,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +2: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 10:27:57,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +7: [2023-03-17 10:27:57,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +1: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +5: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +0: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +5: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +6: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt... +4: [2023-03-17 10:27:57,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +1: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 10:27:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:57,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:57,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +2: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +0: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +4: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +3: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +7: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. +6: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:57,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 10:27:57,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:57,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 10:27:57,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +4: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +7: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +1: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +6: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +0: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +2: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +5: [2023-03-17 10:27:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt... +3: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +6: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +4: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +5: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +3: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +7: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +1: [2023-03-17 10:27:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +0: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt. +2: [2023-03-17 10:27:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +0: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +3: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +2: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +7: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +5: [2023-03-17 10:27:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt... +4: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +4: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +3: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +1: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +2: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +6: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +7: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +0: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. +5: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +2: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +7: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +5: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +6: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt... +4: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +6: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +1: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +2: [2023-03-17 10:27:58,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +5: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +4: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +3: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt. +0: [2023-03-17 10:27:58,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +3: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +4: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +6: [2023-03-17 10:27:58,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt... +1: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +3: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +5: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +6: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +6: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +0: > overriding warmup iterations value to 0 +7: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: > overriding total number of iterations value to 1 +6: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+0: > overriding decay style value to cosine +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +4: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +3: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +4: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +2: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +1: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt. +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +7: [2023-03-17 10:27:58,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +5: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +0: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +2: [2023-03-17 10:27:58,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 10:27:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 10:27:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 10:27:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt... +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt. +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 10:27:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 10:27:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +3: [2023-03-17 10:27:58,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 10:27:58,453] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-17 10:27:58,455] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +4: [2023-03-17 10:27:58,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-17 10:27:58,464] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +2: [2023-03-17 10:27:58,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:27:58,469] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +7: [2023-03-17 10:27:58,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:27:58,470] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +2: [2023-03-17 10:27:58,471] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-03-17 10:27:58,472] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +1: [2023-03-17 10:27:58,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:27:58,472] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +0: [2023-03-17 10:27:58,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,474] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +5: [2023-03-17 10:27:58,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,474] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +1: [2023-03-17 10:27:58,474] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +0: [2023-03-17 10:27:58,476] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +5: [2023-03-17 10:27:58,476] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +6: [2023-03-17 10:27:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,476] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +0: [2023-03-17 10:27:58,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 10:27:58,477] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +6: [2023-03-17 10:27:58,479] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +0: [2023-03-17 10:27:58,480] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +5: [2023-03-17 10:27:58,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,481] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-03-17 10:27:58,483] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +7: [2023-03-17 10:27:58,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:27:58,492] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +7: [2023-03-17 10:27:58,494] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +3: [2023-03-17 10:27:58,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:27:58,494] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-17 10:27:58,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 10:27:58,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +2: [2023-03-17 10:27:58,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:27:58,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-17 10:27:58,497] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +3: [2023-03-17 10:27:58,497] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +3: [2023-03-17 10:27:58,498] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +6: [2023-03-17 10:27:58,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,502] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-17 10:27:58,504] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +4: [2023-03-17 10:27:58,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,506] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +4: [2023-03-17 10:27:58,507] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +7: [2023-03-17 10:27:58,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 10:27:58,509] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +7: [2023-03-17 10:27:58,511] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +2: [2023-03-17 10:27:58,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:27:58,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +1: [2023-03-17 10:27:58,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:27:58,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +2: [2023-03-17 10:27:58,515] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +1: [2023-03-17 10:27:58,515] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +0: [2023-03-17 10:27:58,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +5: [2023-03-17 10:27:58,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 10:27:58,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +6: [2023-03-17 10:27:58,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +4: [2023-03-17 10:27:58,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,517] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +7: [2023-03-17 10:27:58,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +7: [2023-03-17 10:27:58,517] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +0: [2023-03-17 10:27:58,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +6: [2023-03-17 10:27:58,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +7: [2023-03-17 10:27:58,519] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +4: [2023-03-17 10:27:58,519] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +4: [2023-03-17 10:27:58,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 10:27:58,520] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +5: [2023-03-17 10:27:58,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,520] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +2: [2023-03-17 10:27:58,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:27:58,521] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +4: [2023-03-17 10:27:58,521] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +2: [2023-03-17 10:27:58,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,522] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +2: [2023-03-17 10:27:58,522] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-03-17 10:27:58,522] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +6: [2023-03-17 10:27:58,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 10:27:58,523] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +2: [2023-03-17 10:27:58,523] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +6: [2023-03-17 10:27:58,525] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +0: [2023-03-17 10:27:58,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,526] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-17 10:27:58,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,527] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +2: [2023-03-17 10:27:58,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 10:27:58,528] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +0: [2023-03-17 10:27:58,528] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +0: [2023-03-17 10:27:58,529] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +2: [2023-03-17 10:27:58,529] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +3: [2023-03-17 10:27:58,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 10:27:58,530] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +4: [2023-03-17 10:27:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,530] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +3: [2023-03-17 10:27:58,531] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +4: [2023-03-17 10:27:58,532] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +6: [2023-03-17 10:27:58,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,532] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-17 10:27:58,534] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +4: [2023-03-17 10:27:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,535] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +4: [2023-03-17 10:27:58,537] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +7: [2023-03-17 10:27:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 10:27:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:27:58,537] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-17 10:27:58,537] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +3: [2023-03-17 10:27:58,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:27:58,539] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +7: [2023-03-17 10:27:58,539] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +7: [2023-03-17 10:27:58,539] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +5: [2023-03-17 10:27:58,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,540] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +1: [2023-03-17 10:27:58,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:27:58,539] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +3: [2023-03-17 10:27:58,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +1: [2023-03-17 10:27:58,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +5: [2023-03-17 10:27:58,541] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-17 10:27:58,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:27:58,542] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-17 10:27:58,544] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +6: [2023-03-17 10:27:58,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,545] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +7: [2023-03-17 10:27:58,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:27:58,545] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +5: [2023-03-17 10:27:58,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 10:27:58,546] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +6: [2023-03-17 10:27:58,546] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +7: [2023-03-17 10:27:58,546] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +0: [2023-03-17 10:27:58,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,547] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +4: [2023-03-17 10:27:58,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,547] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +4: [2023-03-17 10:27:58,547] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +0: [2023-03-17 10:27:58,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,548] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +2: [2023-03-17 10:27:58,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +6: [2023-03-17 10:27:58,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 10:27:58,548] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +0: [2023-03-17 10:27:58,549] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +6: [2023-03-17 10:27:58,549] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +4: [2023-03-17 10:27:58,549] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +0: [2023-03-17 10:27:58,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +3: [2023-03-17 10:27:58,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:27:58,550] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +6: [2023-03-17 10:27:58,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +2: [2023-03-17 10:27:58,551] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +6: [2023-03-17 10:27:58,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 10:27:58,552] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +6: [2023-03-17 10:27:58,552] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-17 10:27:58,554] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +3: [2023-03-17 10:27:58,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:27:58,556] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-17 10:27:58,557] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +5: [2023-03-17 10:27:58,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 10:27:58,558] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +3: [2023-03-17 10:27:58,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 10:27:58,559] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +2: [2023-03-17 10:27:58,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 10:27:58,559] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +5: [2023-03-17 10:27:58,560] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +3: [2023-03-17 10:27:58,560] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +2: [2023-03-17 10:27:58,561] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +1: [2023-03-17 10:27:58,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:27:58,566] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +0: [2023-03-17 10:27:58,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 10:27:58,567] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +1: [2023-03-17 10:27:58,568] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +0: [2023-03-17 10:27:58,569] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +5: [2023-03-17 10:27:58,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 10:27:58,572] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-03-17 10:27:58,573] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +7: [2023-03-17 10:27:58,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 10:27:58,575] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-17 10:27:58,576] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +1: [2023-03-17 10:27:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:27:58,581] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-17 10:27:58,583] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +1: [2023-03-17 10:27:58,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 10:27:58,591] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-17 10:27:58,592] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +1: [2023-03-17 10:27:58,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 10:27:58,596] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +1: [2023-03-17 10:27:58,597] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +4: [2023-03-17 10:27:58,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m32b100mdedup/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 10:27:58,710] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-17 10:27:58,711] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +0: successfully loaded checkpoint from checkpoints_146m32b100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 2045.86 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 10:27:59 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.018568 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.092 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.047189 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.076 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 10:28:13 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 19859.35 | train/valid/test-data-iterators-setup: 13315.38 +0: [after training is done] datetime: 2023-03-17 10:28:13 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.962464E+00 | lm loss PPL: 5.258675E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3328732: Fri 17 Mar 2023 10:28:37 AM EET diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a01e1b43990f4f15fcc475140b412dd5997e452d --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf6c74b8a6c767c55f07bd64f9d572aaf5d2dcaa8782e3dc165f974bbffb2032 +size 27478295 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..02879549871aa70ea57b3a27a7c4156d2b109964 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39462d77690b78e1b7557fd452e834011afb2fa8f51eb8b134343c274cb53891 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7098fcddafccf0d15133f67038f538981ade2a81 --- /dev/null +++ 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57b7e24d9f0ea09c471197a4e47c2cd55886da35f1ff87da0393283940ea1f79 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e813e6d83ac3532d7d8bbff7002239b2134a8a6b --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7d7f1f0ab75c6f5c380b9a99060d7fe42392ac58b9316be759b9137dcf48dfd +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..146fd2cd73432b61d422fbab405d19992270bfca --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1f816b96b6ba079fc5e9662f01c89443be1743410320c893c1c2abcb9d50457 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..117f777d393456d80e214624c415f2c3c2aff9c5 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:217f30c8fc3d55905025ebc3557ea8fbe818a2fa549b96d6789f0138528854da +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e91e7fb60aed4b7bab217c136bebf234179b7e08 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcbddfefdefe4aa8d29b62727781524ec025c3243275e2a9833c9a7d5734eea4 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e3f54c250c9786810d4b4e4b9bf09de1f87a78c --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69073792d920f0b0791876bca1adc5ed41c520a621727a1aa57b0af20c6c03c4 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..79accb5d79815a7da48c0af0bfb8590b9534b1d3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dfe582992c656bd121149cd794acb229bf4127183dd49ba51d3ed04dffe54350 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..81890b57797b6dc06c1141784fe92025f473548e --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:fbd52b4dbc9ad5daf0cf5a7e3628422957c0b0b6edf7dd2390bb01630ae89cce +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1e0186d0e4ad34da3f34b8e4bc30addbeed3bcf --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e54f6174ebb651da62a79e7de23acfe2c067f70527a8a01aa0b6ae630f3cc27 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc8fda27824e4e10753bcd6ddbfb799650373868 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:efd311992e195978840404e1ace762c4a5eb7b2c0ad95807e27ca1099f53577e +size 27478231 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bacabfd2bc515618b0659b17a5db81c918883e8b --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1faf63ddf1e5967bcbc7564cca0878e550a10fd5c70a94eb6c95a593685bb1a +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4d036aabaacc76500f5d8939d68fd1df993d1f65 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:598941b99dc5e91687db030a2c8f97d774da5dfd698f704b08913b1a016348d4 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94039ebe08700ba4ffb159b8994b5cd51ea682a3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72ab181f6abbf460120f36b498bc1b0f55381cb9e89ed792e274d7882048a5e6 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d960a0d613c80cab93893dfc6fc37762c53a07ae --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ea4419f1086d7916d1291f69d3e901ab3f4410a46ea92de9612ab1966956db2 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4fe18f1119adefc9d7710e94a315fea83bf5c922 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41865b6e7110ade3ed0dc4b7dd4822c693aa041c7b6d66d2eb4b56b3c4c093ae +size 27478178 diff --git 
a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b48ba06b38043af27963590ee106171d4b415fa2 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b897e32a6b54b284123371a3d6a6723390b243e02f04f26fd9581d3f6c987bd5 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97c252e76eecda3d31bd67da8d3078e29430cdbb --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3cc63ccc5089557a2e87b00980d91a2d06938ef3a9b5aa7d8a2190352e6ae323 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1a4dcc01184821f7e8950e99ab49f7a00af4dcf1 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dfd7d0d1dd1b635a02ba12fac94aafca7a93a4e8a5d443d3882a9fe85ffc5b05 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..47d1e1d5750b23effe5d9fbc2b4df2927f0998f3 --- /dev/null +++ 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f98362d8ce730c70434b15d40f49b4626c98be07f69353a8b95052dcf8a237b +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..66af77ea4a5f0c261db8070f68afa00805642bf0 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:382ec54152835fb2c64142856031583dba82753791eb95cc5a7d82d155f81025 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd0b83f02b0ec34068f08a20a7c9fd2879d933f5 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db932f7f98136df45e41925f975b3536982c531bdecc1b3bd651774b7e5d7cf8 +size 27478231 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..44e83f21f56a7884d79723911d6b92ffa4ee576c --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfdc19bad5b572f60aa961638ba00774da2451eb5887c70aa49dccbffa06b8a1 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3c2ae4d45f661f6903e397561748371840199de --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6fa3c0f4b46af308dce5e5df297d9f24a541e6978d3f6b196d2ded16eeb9bd28 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..32d19fd6984e6289b297330cfc40ddebfb38ada0 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a161bf2cd99349ab2502f542494e752e0b4c6bc318cd2930ea08d8749cc30560 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..832f07920380a8a497bfbc386e9fdcce6343eb4e --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8082555245fe7d39dc26f1a98cb291d4a0bdf596a442e2eb9171748745512d34 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c22bb28612583cfa1e2707a99993ec0475b0be47 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:ae6a0fecdbf1025470283c5b98f0430ad7b3bb005e94a92530cfe8bf0076424b +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0f2bec29631044eabf29fe3ccfeac5b3a81eec3f --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f26f85ae0593a8f68343507afd20e95e18895505f10aa4f97d70db9726dbd872 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb5b7d45f1dfd6c9aa5fffb3b234645f0e6079a3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2e925824a9020da652c7151028db74c5d3dd4dc5b755a5e5d657133221a0fd4 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5410c14e213dd84142abc936aea9ad38c2d29a03 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0b5927bfd43ff9028ad0dc63336d2545abc5e572c68d216b70a9d484ee06d9 +size 27478114 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..2098ee25c1bd36b2321239882b099095c82c38f2 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15c18f136d5ce649942d8264a42440344ee86917d73521debfaf73a10f1b9ca5 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1854d68e1847d653cd3fafeb5e803cafd247bca2 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d2a4e79262a4f37e7f78eb68bc1179e45bff9e09321567291273dfd052f3d81 +size 27478434 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4a6d652bd5b7f2704a33c96cbd00e2636a790c1b --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:12ee6802198e0edceb0cb5343103678b32309de3744e3991d8cbf19323d8b7be +size 27478167 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a272979e4b1a0b8ba0414286e704c67f3e942ead --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7202bb8516123976d3f5930b3ced8b19c372e3d23fca56cdbd796a4c5aab10c4 +size 27478242 diff --git 
a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3288235b26dd331e46aa57f04eb528dec2806b2 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c39bed46510e9e829026a361011e0bdbb84287d396d3c1adb8bf419e177486a +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e47e1f62c78242810a80a1e7a85af8e0ecc58c4d --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:665d121eccadc59f2f68ae7c9dfd2e1fdcdc7d9cdcdfeb8e542ad880e59a2ad8 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3382a1fc1e15b3e3feb58fcc9c111d119d572c9f --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d145e4c1d3069b827b391e6e5053c575bd982d5222d3494c6713fa875f9b381c +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e863f28e29d73f3568a8b1b9b126efa410a88d2 --- /dev/null +++ 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6cf3077f1b51f96c53b8c5743469085a1b0b8e4ab5fffa9531dd3976593586d7 +size 27478434 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb0c4a3d7da6dfca077924d45325a646deb55d68 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4f95932ee00df966249dbc4f567ebb5882a220f5769abc79ec28504005856ea +size 27478050 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c9922075e9c5961deda8e08e3eca51dc6ccc91c --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19d69ccbecb70ea8767d98e51b4771a846fa0c781a27272446fff64b628ecac0 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0fe8f813077822e402da15a281949b37dcd1d8e --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2831e630aaaf1b981027f3fc9b6a81f8326f0bf57728f173d0a31d1ae52f14d4 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..acdd724603ae23ec06cd73e8e608bac907234801 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:067451d0ee02464fccf9cdc93f840a8ab66248f60288119ca87c9d703e9a75bf +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6dd134b5d3fd91b6a8d82f44c1aba22ce5bef3e3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58a67da517faee40da4809a6e322aa04d8749afd18272219fdf96a14da96ebbf +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5c78629493c6b302be97075ec00f7271eb3863a --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25d1bd61d48771064087a76ddb427764a2cdf601f1a10448ecd73c2a74448dc8 +size 27478231 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6a20eeb687350d48575a88c54809d04e7634ba9 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:a0bbab4832f4cde044482dace2bb49f56a47f3ce3b62333fc6a2e0c7f3f21f62 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6dab45bdffd5cf3e4b07f92686a5614eeb7dfa5f --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4f46ec66be9ebe4d1e808500e8d100bac0ccc56aec43e4165dd5b8d16d910f9 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..62b99ee3874d503c5036190e6f9925aa7609db43 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:458845bd499207b74d48420360345606f7ba7c78fe7af4ff11e2db4b06ab9cf1 +size 27478434 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..364e359ff3a7f6877e1d2825c605d2eb5a4fe31f --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae3e0fe84a1133cedcd5ace0cc051886b57e765dd61fa4ab289cd020419c71b8 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6b03171cb0de4b6b1ed6ed1d2c00017e1baf662d --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96427dc717aaf2127739cfbaeb2fd7b31741467d7544bfcbccbe564563d68b7c +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be140af30d9740c9de728955e8be8547e4ce1686 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9b66aef03c0c91e55c79c4dd33e7e2237724734d4d67b6203e4a09cceedf8d9 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a804468fce0ee2d8949111c7ba80f993b962110 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d6e4a9d76d06b67daa4f9832f7792343472c51b930b9b5f49c6edb8a9777520 +size 27478306 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..159de9346b3ea1178e00090d3fe73be40aa88bd5 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b3ba4ec05d05a3ff38a641c242b6bffdc59fcda2b6f3babd4cc5d1568e78721 +size 27478370 diff --git 
a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..17a1e3dc145baab37d20ffbe392d9acd6c04e941 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbd5ff0e0ef8738e5f2e0976eb1995ec464cb790ece448dfad84c1d7d47c0549 +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4514fb01279e83ee0ef05142878192fb461fc45f --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ad94401060d7377e511d2a4f28a00294f9c7be4cc80448705ffbd668f1eae4d +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c5f0e74f7dae7a0db3a19f1f7d6ba779450e127 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89450207d07189c7bf6051d7d83c7c4b86cfbf5f2eb106e4e3d268a01e427d9f +size 27478167 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e3f46ed186d489ed5d42e95ce7153590686c4af --- /dev/null +++ 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e22fe4e72513189ffd63f8d7e53bf177e4a8a9cc2972e07ea7930f8bcee7cb39 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6d88f502e64ec85bdfa2fd5a60f591da280173a5 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:348575cada8ec08c683da69a06a6798bad6af7ca5df4fa42584876150bc5fab6 +size 27478370 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bb56110fd5e7399f0810fd7e6342223d6faff72 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d7f01ea97892cddb42cb2704e24fb9cbf838b4feebf926c9d7931c4b4f50002 +size 27478178 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b096c18b809ae1cee373515e0fcc48713c57f544 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:255fce6f1d746f616d6b90ea9bdbb65a84dde897a5a9837d34e30e63a1bff65b +size 27478242 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 
b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1ce7b38eba7b7c29ea06f557ae39a7231d349323 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbcdcc929d431abbfc1e17510534adfe53ea8b948c744e420d1d2d192502d45e +size 27478359 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c9e5fdcfd9508b6be8295d2f3c1f6cef23d86e86 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6b9db6c8bbb9b0690a876957da41dc04d464894a044e57bc72232616bc62f30 +size 27478103 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d9c421b0d9e962e6a02524bf9b7f8cc06663c46 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9da8b4b6401987594f5945d6e2781b7303b6b74148cd47c79542f18146b1234e +size 27478359 diff --git a/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c5816c4f349b9057d1f7d76c79d3951622b31b1 --- /dev/null +++ b/146m32b100mdedup/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:de7cd2395dba8fe137ff44a99733c2fb9b1b8a5d6757a536c67a280307d73465 +size 27478167 diff --git a/146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80f3129fc00044c2541b2f71401f501ad019cabd --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2595b6604058f013f97990784a8b3d1189f50324cc4a6e6eaee05e1a65ec3d28 +size 80413955 diff --git a/146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5293652ab385432fb3d99e2e4f664ec6821491d6 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19fcd03c4eb1ad144db51679f6dc5472c8afdaa3345fd56df2ee0387523b9eff +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98fce0f5d473f695f3b8973fc018da5da7830ef3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f37e6d8e3758d4141923bd0e2016bf7dd87fe87fd5260c4f5f2bc469eecf295 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40fccdb2fdd5632357e91ab997daa16bb696589e --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:0116b310414491c20342f8f5c3a2eaf63054d384f1b1a7a461007d7fbca9f778 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9a42bc66f4f94594f9c11d17b4de0d585c2fcea3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:649552b5d712c571fe3bcc70e9f4278d67c927722a4e68c5c70a99bdbf1d01fd +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61fc03a02437d4aaf0f1e29d5d071b5890d7c5dc --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8bc333422e0a4c2081d303908a9c3c83d3f4c766a5d733538cc1469500c096f4 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dd0791729da4df96ff3f043c48c56be5e21b22af --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfe7d7d09f0c569691f00760729136b4a81c447778ffeb3461ed9dc5cbc5ac7f +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9971df0b3bfd8327263128a164f8aefe00d6c802 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:6d340ac928a08cbe37dc32ac499687ecea57587859a68776241dd2dea8693969 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be526248bfb1a34e14f36ba6bdf75aa173de83f4 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fae0407736c53085403011e9203f01e1d311806540470f3f5c6d5bdd2370756 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bea0dae9ec54d18cae40b44ea26e0a81ace068c8 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e18fbe6977d8edd7c4a1d45675b613f816058023e7380c41b65c0d6fdcf048fc +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..224f50c31034eda5c8e41c77dbd7e969adc4f32c --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8ffb2398f99e7380db43feb1503a31129cc16dae015fefa49ec578f84359c1a +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b7376a91e4d412299bf8971bf499926d902b5cdc --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:f4e509dc9af43d5ee6214f14d072f7bc2e9a19e1f6295cac6c33c3754694e7cf +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c39d97bf2d96b7879c2cd932ae68faaf83ec4138 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a37679edf6fd4d6dda30421508369315028865162158290b61769a98bb949aa +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..506e12ded4dad54312d8407e89c29ea9879ccab3 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6db9b8fc22eee33bf8c7c164a0b03f3f7911221a95254d445f28dd033e5630bd +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b8df86a0f45186c49dcc9fd5db2dc4e950c6fa9c --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:878a102d9dfff646fa24a1405939f6871042dad221ceeb07c8a01811a9283e0d +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8e6266cd676714365601b0d1a20ca8cf65909da6 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_17-model_00-model_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:611a8a5ec9c030ab1883b9924b1c36c8ae8b52a7814f998a635b84724666f736 +size 14180099 diff --git a/146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt b/146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..166c7c25749948f2da6d18f5ec403dbaa52fb122 --- /dev/null +++ b/146m32b100mdedup/global_step60336/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef50098d2094a9deca858afeef79c9d4b8fb21ff3b3f3ca8403688b1a4e42fbb +size 4291 diff --git a/146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt b/146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e86593479d233060c0425bc63db0eaebe97a8de7 --- /dev/null +++ b/146m32b100mdedup/global_step60336/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a267b1c0b3999b2768dda4b35e9e733d67e6639c745987e67b792fe8ebff870 +size 35443 diff --git a/146m32b100mdedup/tensorboard_146m32b100mdedup/events.out.tfevents.1679003674.nid005749.86518.0 b/146m32b100mdedup/tensorboard_146m32b100mdedup/events.out.tfevents.1679003674.nid005749.86518.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e0dafa419c0b716614a6962313e546b17976e25 --- /dev/null +++ b/146m32b100mdedup/tensorboard_146m32b100mdedup/events.out.tfevents.1679003674.nid005749.86518.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:186bd9e69eb58cfe50a149027955e6c6ab130b358c79b94d374b4613acdc9a84 +size 107901187 diff --git a/146m32b100mdedup/tensorboard_146m32b100mdedupval/events.out.tfevents.1679041633.nid006724.79473.0 b/146m32b100mdedup/tensorboard_146m32b100mdedupval/events.out.tfevents.1679041633.nid006724.79473.0 new file mode 100644 index 
0000000000000000000000000000000000000000..22877d38bfbcb396aa223c6c9de7763f8f85b67b --- /dev/null +++ b/146m32b100mdedup/tensorboard_146m32b100mdedupval/events.out.tfevents.1679041633.nid006724.79473.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd6e519d16ff33dd1eb47a51b43f13a8e195659ad8ddb8f91e3d887f58737af3 +size 980 diff --git a/146m3b9100mdedup/3326914.err b/146m3b9100mdedup/3326914.err new file mode 100644 index 0000000000000000000000000000000000000000..5eb36d232ff8fdd355c79add4b07e544ac7d66ce --- /dev/null +++ b/146m3b9100mdedup/3326914.err @@ -0,0 +1,1114 @@ +6: 2023-03-16 23:47:19.495322: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495345: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495342: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 23:47:19.495348: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495361: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495350: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:19.495377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504475: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 23:47:19.504485: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504489: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504489: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504474: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504486: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 23:47:19.504499: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:19.504479: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 23:47:19.576779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576799: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:19.576815: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:47:19.593887: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593896: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593907: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593900: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:47:19.593909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593917: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:19.593928: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650226: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650234: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:47:19.650228: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650243: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650237: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650246: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:19.650236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:47:19.650248: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.776995: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.776999: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.776998: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.777004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:47:19.777006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.777002: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.776994: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:19.776994: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819776: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:47:19.819773: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819764: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819778: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819769: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819763: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:47:19.819777: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:19.819770: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833261: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833258: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833267: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:47:19.833269: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833274: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833256: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833270: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:19.833271: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:47:22.152762: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152778: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152783: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152792: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152788: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.152786: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.153149: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153154: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153155: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153157: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153161: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:47:22.153168: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153169: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.153190: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154221: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154230: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.154601: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154605: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154611: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154612: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154615: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:47:22.154616: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154618: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.154621: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.155995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.155995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.156394: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156398: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156404: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156403: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156407: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:47:22.156408: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156410: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.156432: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.203960: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203965: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203972: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203973: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 23:47:22.203978: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203980: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.203986: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204239: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204245: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204250: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204253: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204249: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204252: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.204644: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204648: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204652: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204652: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204657: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 23:47:22.204655: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204659: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.204661: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.204758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.204770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.205161: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205165: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205171: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205171: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205174: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 23:47:22.205175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.205179: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208207: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208212: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.208579: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208577: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208581: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208585: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208588: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 23:47:22.208588: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208588: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.208592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212524: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212524: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.212944: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212946: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212948: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212953: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212951: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 23:47:22.212955: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212963: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.212964: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:37.236925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.236962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.236998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.236993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.237004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.237017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.237020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.237031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244831: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244857: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.244923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252399: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252403: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252399: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252403: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.252410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252415: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252418: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252419: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252420: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252422: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.252423: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-16 23:47:37.253849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.253868: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.253883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.253898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.253915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.253922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.254031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.254036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254225: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.254227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 23:47:37.254426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 23:47:37.254474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 23:47:37.254477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:47:37.254700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 23:47:37.254751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 23:47:37.254500: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.254730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.254613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:47:37.254758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.254777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.254790: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:47:37.254800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.254804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 23:47:37.254802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.254804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 23:47:37.254817: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256485: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-16 23:47:37.256480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256485: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256497: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256495: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256502: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256504: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256505: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256504: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256533: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256722: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256735: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:47:37.256726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256745: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256744: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256748: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256752: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256840: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257066: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257069: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257081: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-16 23:47:37.257076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257090: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257090: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257092: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257096: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257303: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 
23:47:37.257128: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257148: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257304: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257304: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:47:37.257382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.257384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257312: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.257388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:47:37.257437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.257389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:47:37.257436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257318: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257319: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257326: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257328: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257328: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:47:37.257439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257329: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257331: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257331: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:47:37.257387: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:47:37.257393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257449: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:47:37.257396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.257454: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257455: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.257403: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257404: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257457: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257457: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257404: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257407: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257409: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257459: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257462: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.257465: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.257414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-16 23:47:37.257442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.257456: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:47:37.238794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238809: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:47:37.238805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.238813: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238815: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238820: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238824: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.238825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... 
+0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: +0: +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +1: +1: +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +3: Building extension module utils... +3: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Loading extension module utils... +0: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +0: Loading extension module utils... 
+3: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... 
+4: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: Loading extension module utils... 
+3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils...Loading extension module utils... +3: +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils...Loading extension module utils... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Loading extension module utils...Loading extension module utils... +0: +0: +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils... +4: +4: +4: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+4: Loading extension module utils... +5: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: +5: +5: Loading extension module utils...Loading extension module utils... +5: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m3b9100mdedup/3326914.out b/146m3b9100mdedup/3326914.out new file mode 100644 index 0000000000000000000000000000000000000000..36b5b38859a30044187d6df93bf22b28fede108b --- /dev/null +++ b/146m3b9100mdedup/3326914.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m3b9100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m3b9100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m3b9100mdedup --load checkpoints_146m3b9100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3326914.json --zero-stage 0 +START 3326914: Thu 16 Mar 2023 11:46:58 
PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 45.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 45.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 47.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 42.0c 
87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 36.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 46.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 40.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 46.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 38.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 
================================================================================ +4: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 42.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 39.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 41.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 37.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 42.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +5: Launching on nid005159 (5/8), master nid005154 port 9999, GPUs 8, CUDA: True +3: Launching on nid005157 (3/8), master nid005154 port 9999, GPUs 8, CUDA: True +0: Launching on nid005154 (0/8), master nid005154 port 9999, GPUs 8, CUDA: True +4: Launching on nid005158 (4/8), master nid005154 port 9999, GPUs 8, CUDA: True +1: Launching on nid005155 (1/8), master nid005154 port 9999, GPUs 8, CUDA: True +2: Launching on nid005156 (2/8), master nid005154 port 9999, GPUs 8, CUDA: True +7: Launching on nid005161 (7/8), master nid005154 port 9999, GPUs 8, CUDA: True +6: Launching on nid005160 (6/8), master nid005154 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 
1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. 
False +0: deepspeed_config ................................ ds_configs/3326914.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... 
False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m3b9100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m3b9100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... 
True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 
0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m3b9100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m3b9100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... 
None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 
0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 23:48:24,995] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. 
+0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.116 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 25.765 seconds +0: time to initialize megatron (seconds): 69.663 +0: [after megatron is initialized] datetime: 2023-03-16 23:48:53 +0: building GPT model ... +0: [2023-03-16 23:48:53,800] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 23:48:53,801] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 23:48:53,801] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.2 GB, percent = 6.6% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, 
data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 23:48:55,778] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: 
ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 23:48:56,315] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 23:48:56,315] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-16 23:48:56,315] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.22 GB, percent = 6.6% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-16 23:48:56,317] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 23:49:09,851] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 23:49:09,852] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 23:49:09,852] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 23:49:09,856] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 23:49:09,856] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 23:49:09,974] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 23:49:09,975] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 23:49:09,975] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.91 GB, percent = 6.7% +3: ninja: no work to do. 
+3: Time to load utils op: 0.3132457733154297 seconds +0: Time to load utils op: 0.207075834274292 seconds +6: Time to load utils op: 0.31934523582458496 seconds +7: Time to load utils op: 0.3186178207397461 seconds +0: Time to load utils op: 0.30517029762268066 seconds +0: Time to load utils op: 0.3057887554168701 seconds +0: Time to load utils op: 0.30578017234802246 seconds +0: Time to load utils op: 0.30622053146362305 seconds +0: Time to load utils op: 0.30419421195983887 seconds +0: Time to load utils op: 0.3064289093017578 seconds +3: Time to load utils op: 0.30271029472351074 seconds +0: Time to load utils op: 0.3063974380493164 seconds +3: Time to load utils op: 0.3028874397277832 seconds +3: Time to load utils op: 0.3031766414642334 seconds +3: Time to load utils op: 0.3031768798828125 seconds +2: Time to load utils op: 0.3278162479400635 seconds +2: Time to load utils op: 0.3278310298919678 seconds +2: Time to load utils op: 0.3278341293334961 seconds +2: Time to load utils op: 0.32784557342529297 seconds +2: Time to load utils op: 0.32784247398376465 seconds +2: Time to load utils op: 0.32786035537719727 secondsTime to load utils op: 0.32786130905151367 seconds +2: +2: Time to load utils op: 0.32785677909851074 seconds +3: Time to load utils op: 0.10209107398986816 seconds +3: Time to load utils op: 0.10186505317687988 seconds +3: Time to load utils op: 0.10198330879211426 seconds +1: Time to load utils op: 0.33618807792663574 seconds +1: Time to load utils op: 0.33619117736816406 seconds +1: Time to load utils op: 0.3362081050872803 seconds +1: Time to load utils op: 0.33621644973754883 seconds +1: Time to load utils op: 0.336226224899292 secondsTime to load utils op: 0.3362405300140381 seconds +1: +1: Time to load utils op: 0.3362429141998291 seconds +1: Time to load utils op: 0.33625173568725586 seconds +6: Time to load utils op: 0.10264754295349121 seconds +6: Time to load utils op: 0.10247087478637695 seconds +6: Time to load utils op: 
0.10243368148803711 seconds +6: Time to load utils op: 0.10248756408691406 seconds +6: Time to load utils op: 0.10308480262756348 seconds +6: Time to load utils op: 0.10296225547790527 seconds +6: Time to load utils op: 0.10289764404296875 seconds +7: Time to load utils op: 0.1023406982421875 secondsTime to load utils op: 0.10259246826171875 seconds +7: +7: Time to load utils op: 0.1021580696105957 seconds +7: Time to load utils op: 0.10237264633178711 seconds +7: Time to load utils op: 0.1030588150024414 seconds +7: Time to load utils op: 0.10256671905517578 seconds +7: Time to load utils op: 0.1031947135925293 seconds +5: Time to load utils op: 0.11148476600646973 seconds +5: Time to load utils op: 0.11149358749389648 seconds +5: Time to load utils op: 0.11150574684143066 secondsTime to load utils op: 0.11147928237915039 seconds +5: Time to load utils op: 0.1115121841430664 seconds +5: +5: Time to load utils op: 0.11152458190917969 secondsTime to load utils op: 0.11153101921081543 seconds +5: +5: Time to load utils op: 0.11150574684143066 seconds +4: Time to load utils op: 0.11174464225769043 secondsTime to load utils op: 0.11174607276916504 seconds +4: +4: Time to load utils op: 0.11174845695495605 seconds +4: Time to load utils op: 0.11177515983581543 seconds +4: Time to load utils op: 0.11173534393310547 seconds +4: Time to load utils op: 0.1117854118347168 seconds +4: Time to load utils op: 0.11175727844238281 secondsTime to load utils op: 0.11176776885986328 seconds +4: +0: [2023-03-16 23:49:10,287] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 23:49:10,288] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 23:49:10,288] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.91 GB, percent = 6.7% +3: Time to load utils op: 0.0005102157592773438 seconds +3: Time to load utils op: 0.00048041343688964844 seconds +3: Time to load utils op: 
0.0005054473876953125 seconds +3: Time to load utils op: 0.0004138946533203125 secondsTime to load utils op: 0.0004608631134033203 seconds +3: +3: Time to load utils op: 0.00041031837463378906 secondsTime to load utils op: 0.0004184246063232422 seconds +3: +3: Time to load utils op: 0.0004787445068359375 seconds +6: Time to load utils op: 0.0011518001556396484 seconds +6: Time to load utils op: 0.0012943744659423828 seconds +6: Time to load utils op: 0.001325845718383789 seconds +6: Time to load utils op: 0.0013058185577392578 seconds +6: Time to load utils op: 0.0013015270233154297 seconds +6: Time to load utils op: 0.0012943744659423828 seconds +6: Time to load utils op: 0.0013234615325927734 seconds +6: Time to load utils op: 0.001322031021118164 seconds +0: Time to load utils op: 0.0004863739013671875 seconds +0: Time to load utils op: 0.0004248619079589844 secondsTime to load utils op: 0.0004172325134277344 secondsTime to load utils op: 0.0004513263702392578 seconds +0: Time to load utils op: 0.0004591941833496094 seconds +0: +0: +0: Time to load utils op: 0.0004131793975830078 seconds +0: Time to load utils op: 0.0004177093505859375 seconds +1: Time to load utils op: 0.0009388923645019531 secondsTime to load utils op: 0.0009360313415527344 seconds +1: +1: Time to load utils op: 0.0010840892791748047 seconds +1: Time to load utils op: 0.0012569427490234375 seconds +1: Time to load utils op: 0.0012843608856201172 seconds +1: Time to load utils op: 0.0012176036834716797 seconds +1: Time to load utils op: 0.0012519359588623047 seconds +1: Time to load utils op: 0.0012936592102050781 seconds +4: Time to load utils op: 0.0008454322814941406 seconds +7: Time to load utils op: 0.0008401870727539062 seconds +5: Time to load utils op: 0.0008373260498046875 seconds +5: Time to load utils op: 0.0009026527404785156 seconds +4: Time to load utils op: 0.001047372817993164 seconds +4: Time to load utils op: 0.001041412353515625 secondsTime to load utils op: 
0.0010907649993896484 seconds +4: +7: Time to load utils op: 0.001005411148071289 seconds +4: Time to load utils op: 0.0011138916015625 seconds +4: Time to load utils op: 0.001190185546875 seconds +5: Time to load utils op: 0.0009975433349609375 seconds +5: Time to load utils op: 0.0009853839874267578 seconds +4: Time to load utils op: 0.001119852066040039 seconds +7: Time to load utils op: 0.0011053085327148438 seconds +7: Time to load utils op: 0.001203298568725586 secondsTime to load utils op: 0.0010783672332763672 seconds +7: +4: Time to load utils op: 0.0011777877807617188 seconds +7: Time to load utils op: 0.0011568069458007812 secondsTime to load utils op: 0.001157522201538086 seconds +7: +7: Time to load utils op: 0.00122833251953125 seconds +5: Time to load utils op: 0.0011718273162841797 seconds +5: Time to load utils op: 0.0011749267578125 seconds +5: Time to load utils op: 0.0011479854583740234 seconds +5: Time to load utils op: 0.0012903213500976562 seconds +2: Time to load utils op: 0.0008757114410400391 seconds +2: Time to load utils op: 0.0008394718170166016 seconds +2: Time to load utils op: 0.0008459091186523438 seconds +2: Time to load utils op: 0.0008625984191894531 seconds +2: Time to load utils op: 0.0009438991546630859 seconds +2: Time to load utils op: 0.0009551048278808594 seconds +2: Time to load utils op: 0.0009400844573974609 seconds +2: Time to load utils op: 0.001028299331665039 seconds +0: [2023-03-16 23:49:10,402] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 23:49:10,403] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 23:49:10,403] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.07 GB, percent = 6.8% +0: [2023-03-16 23:49:10,507] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 23:49:10,507] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: 
[2023-03-16 23:49:10,507] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:10,612] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 23:49:10,612] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:10,613] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:10,716] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 23:49:10,716] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:10,716] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:10,821] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 23:49:10,822] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:10,822] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:10,923] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 23:49:10,924] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:10,924] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:11,030] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 23:49:11,031] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:11,031] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:11,134] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 23:49:11,135] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 
+0: [2023-03-16 23:49:11,135] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.06 GB, percent = 6.8% +0: [2023-03-16 23:49:11,135] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 23:49:11,135] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 23:49:11,135] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 23:49:11,135] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 23:49:11,135] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 23:49:11,136] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 23:49:11,137] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 23:49:11,138] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 23:49:11,138] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 23:49:11,138] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 23:49:11,138] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004189014434814453 seconds +0: [2023-03-16 23:49:11,138] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 23:49:11,149] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+7: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:11,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:11,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:11,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:11,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:11,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:11,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:11,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:11,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:11,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:11,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:11,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:11,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:11,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:11,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:11,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:11,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:11,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:11,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:11,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:11,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:11,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:11,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:11,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:11,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:11,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:11,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:11,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:11,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:11,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:12,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:12,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:12,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:12,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:12,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:12,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:12,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:12,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:12,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:12,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:12,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:12,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:12,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:12,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:12,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:12,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:12,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:12,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:12,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:12,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:12,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:12,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:12,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:12,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:12,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:12,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:12,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:12,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:12,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 23:49:12,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 23:49:12,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-16 23:49:12,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:12,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:12,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:12,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:12,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:12,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:12,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
+3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:12,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:12,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 23:49:12,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:12,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:12,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:12,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:12,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-16 23:49:12,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +7: [2023-03-16 23:49:12,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,869] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-16 23:49:12,871] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +5: [2023-03-16 23:49:12,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,873] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +6: [2023-03-16 23:49:12,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:12,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +6: [2023-03-16 23:49:12,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +5: [2023-03-16 23:49:12,875] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +4: [2023-03-16 23:49:12,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,875] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +6: [2023-03-16 23:49:12,876] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +6: [2023-03-16 23:49:12,876] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +4: [2023-03-16 23:49:12,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,876] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +7: [2023-03-16 23:49:12,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:12,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +4: [2023-03-16 23:49:12,877] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +4: [2023-03-16 23:49:12,878] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +3: [2023-03-16 23:49:12,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,879] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +3: [2023-03-16 23:49:12,878] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +0: [2023-03-16 23:49:12,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:12,880] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +3: [2023-03-16 23:49:12,880] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +0: [2023-03-16 23:49:12,881] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +5: [2023-03-16 23:49:12,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:12,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 23:49:12,883] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +0: [2023-03-16 23:49:12,883] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-16 23:49:12,885] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +5: [2023-03-16 23:49:12,885] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +1: [2023-03-16 23:49:12,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:12,885] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +1: [2023-03-16 23:49:12,887] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +1: [2023-03-16 23:49:12,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:12,891] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +3: [2023-03-16 23:49:12,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:12,892] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-16 23:49:12,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 23:49:12,892] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +1: [2023-03-16 23:49:12,893] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +4: [2023-03-16 23:49:12,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,894] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +3: [2023-03-16 23:49:12,894] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +3: [2023-03-16 23:49:12,895] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +2: [2023-03-16 23:49:12,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,895] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +2: [2023-03-16 23:49:12,895] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +6: [2023-03-16 23:49:12,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,897] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +2: [2023-03-16 23:49:12,897] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +2: [2023-03-16 23:49:12,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:12,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +2: [2023-03-16 23:49:12,898] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +3: [2023-03-16 23:49:12,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:12,899] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +2: [2023-03-16 23:49:12,900] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +0: [2023-03-16 23:49:12,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,901] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +0: [2023-03-16 23:49:12,901] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +3: [2023-03-16 23:49:12,902] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +6: [2023-03-16 23:49:12,903] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +0: [2023-03-16 23:49:12,903] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +4: [2023-03-16 23:49:12,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:49:12,903] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-16 23:49:12,905] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +0: [2023-03-16 23:49:12,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:12,906] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-16 23:49:12,908] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +5: [2023-03-16 23:49:12,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,909] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-03-16 23:49:12,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,910] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +1: [2023-03-16 23:49:12,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 23:49:12,911] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +1: [2023-03-16 23:49:12,911] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +5: [2023-03-16 23:49:12,912] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +4: [2023-03-16 23:49:12,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,913] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +1: [2023-03-16 23:49:12,913] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +5: [2023-03-16 23:49:12,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,914] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +4: [2023-03-16 23:49:12,914] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-16 23:49:12,914] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +7: [2023-03-16 23:49:12,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:12,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,915] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +7: [2023-03-16 23:49:12,915] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +2: [2023-03-16 23:49:12,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:12,915] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +7: [2023-03-16 23:49:12,915] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +4: [2023-03-16 23:49:12,916] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +5: [2023-03-16 23:49:12,916] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +7: [2023-03-16 23:49:12,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +5: [2023-03-16 23:49:12,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:12,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +2: [2023-03-16 23:49:12,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +5: [2023-03-16 23:49:12,917] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +7: [2023-03-16 23:49:12,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +3: [2023-03-16 23:49:12,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:12,918] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +0: [2023-03-16 23:49:12,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,918] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +0: [2023-03-16 23:49:12,919] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +7: [2023-03-16 23:49:12,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,919] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +4: [2023-03-16 23:49:12,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:49:12,919] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +3: [2023-03-16 23:49:12,920] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +1: [2023-03-16 23:49:12,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:12,920] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +6: [2023-03-16 23:49:12,920] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +4: [2023-03-16 23:49:12,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +7: [2023-03-16 23:49:12,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +0: [2023-03-16 23:49:12,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +1: [2023-03-16 23:49:12,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +3: [2023-03-16 23:49:12,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:12,922] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +3: [2023-03-16 23:49:12,922] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-16 23:49:12,923] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +1: [2023-03-16 23:49:12,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:12,924] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +6: [2023-03-16 23:49:12,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,924] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +5: [2023-03-16 23:49:12,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:12,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +1: [2023-03-16 23:49:12,926] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +6: [2023-03-16 23:49:12,926] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +5: [2023-03-16 23:49:12,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +6: [2023-03-16 23:49:12,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:12,928] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +0: [2023-03-16 23:49:12,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:12,928] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +6: [2023-03-16 23:49:12,930] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +0: [2023-03-16 23:49:12,930] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +6: [2023-03-16 23:49:12,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:12,930] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +0: [2023-03-16 23:49:12,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:12,931] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +6: [2023-03-16 23:49:12,931] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +0: [2023-03-16 23:49:12,932] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +0: [2023-03-16 23:49:12,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 23:49:12,934] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-16 23:49:12,935] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +2: [2023-03-16 23:49:12,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:12,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:12,936] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-03-16 23:49:12,936] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +7: [2023-03-16 23:49:12,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,937] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +1: [2023-03-16 23:49:12,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:12,937] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +2: [2023-03-16 23:49:12,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:49:12,938] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +2: [2023-03-16 23:49:12,938] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +2: [2023-03-16 23:49:12,938] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-03-16 23:49:12,939] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +1: [2023-03-16 23:49:12,939] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +2: [2023-03-16 23:49:12,940] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +4: [2023-03-16 23:49:12,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:12,942] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +5: [2023-03-16 23:49:12,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:12,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:49:12,943] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +5: [2023-03-16 23:49:12,943] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +2: [2023-03-16 23:49:12,943] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +3: [2023-03-16 23:49:12,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:12,944] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +5: [2023-03-16 23:49:12,945] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +2: [2023-03-16 23:49:12,945] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +3: [2023-03-16 23:49:12,946] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +2: [2023-03-16 23:49:12,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:12,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:12,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +2: [2023-03-16 23:49:12,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +7: [2023-03-16 23:49:12,977] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +2: [2023-03-16 23:49:12,977] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +1: [2023-03-16 23:49:13,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,030] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-16 23:49:13,031] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +1: [2023-03-16 23:49:13,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,098] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-16 23:49:13,099] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +0: successfully loaded checkpoint from checkpoints_146m3b9100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 1953.50 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 23:49:13 +0: > building train, validation, and test datasets ... 
+0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.032957 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.026 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.026810 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.010 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 23:49:26 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 20137.58 | train/valid/test-data-iterators-setup: 11657.08 +0: [after training is done] datetime: 2023-03-16 23:49:26 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.871975E+00 | lm loss PPL: 4.803716E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3326914: Thu 16 Mar 2023 11:49:47 PM EET diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc41fbe62d27bf15474244b4a7e7ecd03530b62b --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:ca11821e1dd00f528d12dde478cfd127714595b5272e7aec92835fbb5ac3ebbd +size 27478295 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb79044e26db078f8d8bed30a26875667534e722 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:795c27886e4673d484bb9325c7834e0adc51a5e01da29d1a8d8d5cfc1c10711f +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..77251874149d018ba24cff5aff2d378389ed1665 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f7df71068a11a51eea668b587857666ec68a044e411140116580393af73b363 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db5131d7cd02c5f237e5b8f4a1c9984240ce26d0 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb06d617b8f562e60086aba748cd6a1a6615a8e6fc71a11836e506dda60bafd5 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a4c7bbc562bbbe6fbb6eba1031ce4a01b17a0ba8 --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:824cf07ffea03af1f22938b4684de1edd3f1611e9fbaa4951b938e1cc59793f6 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c38e96924344c4970154080a820ad17de073ec0 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5d699ba3fd60ad5d2642bf0a857d25f973f4de3f5b69a18b3de69a29732cb13 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7d7e58b43b5c955421cd57b30be47a6d4ac4a29 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee2b54fceb7442641f1228acbba7bb2c473c8cae25abe458ac11bf2fb76e3624 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc1eb78cc7d288d361df34fc015c5e5537bb2c89 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf40ba740269daa64037a0623cb7a12f2e614b4737c82dd9b97d1650fff418c4 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb94b254a22a1f945815761e9736d9493f73bef1 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9cd48b907038e2dbf774f62b15e985ebacba59790a2b22d5629488f7db396610 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a33fac40ab3a8a080edabe8598fb522115367b21 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66295b7b31516903881d145fa43a5369f03068d74878df05a5f90f65da66d9e4 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd9fbf562e6f385ef0e035260959bd3613426a51 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ef9a7f646c39c53b3d2a16b8712af9b08d84e933596affeafbc706cd091385d +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b6ca00ad59058b2217e44544ad9b4e05a44c6940 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:abdc09bf6f22af0579bd2eaf1837b05c008cb485583c95e535a093c056be89e0 +size 27478231 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5abf9d969d38146f1140c487782eb5b93d11f94b --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f94523e72eb754c0a0a6b6e9fe06b073c31309e696fc62b8acb0a3ddd8cbf4b +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1cdd71086c66565ec26d84ee717a4109f9fd7e8 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60d6d2e1eba1cb79f8150bc56e34e21a1f7c1b6736b0a593757b5c1edf6a948d +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5663626003ce5aef5d73218607b9ea85dc4a950a --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32b2eebeb8911a6566f21e5dc996686013f053b44f08cd46de7da742fa8f6cc9 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b4172abfb48bc911376521655f65312bcec0d6e --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c4733a8a3636ff7200849a05d80f09d93bac2ace23242de27b9487f5bc5b0f3 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ef77df55d079a086c27689aecd4b40a7c3a4a9a --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74a8a6ba2ca05cf31575b084fe7159b70b8153743a451b4b5b73a73f44b9564e +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..330cbb535b9a197e4eba29962020b4671a10df1c --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a58efa2b92447343d8de2d21fc96aa0942229eca4aa433c68b0e81a3e22618bf +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..de5c7fd90f0d2e68a4d9c98fe306417191fec930 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7056d0b6177fe4dfdca460f00cf0077f62bcc350d2672cb1540b0cb8b2ea552d +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f2eb32cb95799b22605df8bfd388924cc0c2e3d --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:018cde2514b8fb3978c96fe79bda01549b5721a0123f9725c522c8f900ccf7e6 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..11a33531c2258d7e42625ac74c7e99f915e6990b --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e2b81a0b34d5bfffdc64bd4e80f455f3a7e616a5ac8dafcba5185a3e63df422 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8996e7f58b1286c75cc8c91efdfd4505b0112a02 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb7fc32bb303dac3dd1321aec825c5e23fb7af36c4a542216283ce78a947d87d +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..097a5717361345768d0c8661fb6ec5e08e84fcfe --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:518b05bfa5d2d5380457e71bf74683b3249ae994763d7d5c9902e2c8c2dec968 +size 27478231 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f2c2dc609157e670eaf714819e120006b36b0d65 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ba5a848bd68b60ec343b4d4f3fec389ecbf78a0dd238619902e6ba9f0992c16 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..023960031b071acee4f8aeea0fb108a683261356 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5dd5f7582ba4f683645e86dabe5378c283dabffa026086bc648f03a4122a0ec0 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..852c2223e6cd2048df67d900a70baad3e7ab7a47 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3419868a1eb8a96eed25266d2f3d101efaa7e96b8bebed2e19548df50d900a9 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d3c8f6af3191d8e17f2b9d1e6eb3a2d4eed16761 --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:118bbc4c17d987be0ee8d8fe003dbb3b4349daa25f40c169122a3758376f34ed +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49254a1e9f615d4b827aaee8792aeeb3ea215607 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:45bf15f0b1fd4efeb2b6ae721d7a2afc7ddf16f343583ac3b732b6f71f64f403 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..019a0e26e123a7ca95a4f8eb2f7cd57815ecef75 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99e2e67dbcdaa5d68bb7fb7f8956ff5b550bfe1672ff10c5fec02bfa7abd0105 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68602b1063a0f814d1e9c3dfeeb852e907d65f7c --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1463de93cc5f5eba05f1089a19cd3de07b6a257c1476722249ea50be4c389ce0 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0f14218816345b731bdae6b95cb65c24ac1bef2 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4c301bb11f19a5681fa9f0bad498fbe25dccbb305c60f3cd6f199b000cc8dff +size 27478114 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..742e8e26bc03b02b4976f6fdfff5f87ea90c25d0 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:88bd38708c1f768f491a8f78ca3ff8c1a6196e3f0995723ba299b0b823c6f25c +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..70f32465f411ce87b2337c82da420231e0d380aa --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2009f0ab2d65cf0aaa60b286bf932d7453825b795d5f9e93fd8007bba5379487 +size 27478434 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf8581cf1be4af46584464bd8e4da2b852a4f691 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:0f25e464596ba525be5f39c932a98d64814c5b6e99b1a25fb14aa0a717196fe2 +size 27478167 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85ab683d0f7743e484126d22a72344aa5056b2ad --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99429382d6482912ad56ab8a788d20b6f21be21780431c496f7c2b3a4c92c0ab +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..78c0b5259b9487aed848e8b9f0c6f48bcb0f5c80 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a37214ea2cb3c52906287c75834e2593a5f5663b84053f4545a03ff8956d4ea3 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d98c28aea1cbf431f8a1c09047154739cc9a3a12 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0fa09b44fe9187b53ffa9b17e929d62dd5ba9c3c4c61304d6c3fadc47d2fa17 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b69d98395224c6f9f5087bee4f5704b49147d3cd --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:878d2464fa066b654751a10e9d1a99ecef747003c3d4560ee804ae52e3667abf +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b8d82a94aa85d7d697cb1de46589c50fc5def54 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51922f96ef9c488ef861a2233f94d695d534141d185f0513ecb52fb03a48f87b +size 27478434 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4dbc6d265780ce07d702202b7c227cf7175e832d --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:289e931b220fd4ab5411f1283e2b56c0bed50354a7df0a956105ddb855f40f83 +size 27478050 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c62cc5dfe6d8387cfbcc9da02ccb2b24dd16a466 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03db34ee0c4bc5c3c2cd0b92129d9cead6a2ad4fc60fc31c4a650887e7a5763b +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e75590e16c31b81afe3d3a1af4fe4e29cf59e23c --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7640d96f269f36a0c33b6ee052f2c59341bcd99d7d204b83a4f4d1da15db23fb +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d56885bf13f17eb4b081ee7e17100c91636ef8b2 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec198940820f90b86215ccdf05b8d9852ca9ad4a576824c7a5694894b04dd3e8 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40c0aa8825052626ca6c1eeea28a01f2fab39b0e --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48823a07a2d7b27f4dbdc136c81df95bc9e744fa0bdb0c8c4c6e208038cc64e1 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6c8107d023f43ab660a027a0200a498130597e0f --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:833c840d50cace3cddb9d0e9c56b6c8581eb34e0dd087c569b5b28a91831f920 +size 27478231 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc2a2b26bd9e8d023b332e8769dbfcbb77d8e052 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:012fc0deff526351db9fe491f675c350babbd7e7112f8b82b41b1e19bcbe7238 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e1b471083bd91c6f7c390446e17d549ce0e6b7c --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:625c486df13f5ae1c44932eb619d94bcd57f172a8b7e8f7f80b610a0c7dd5c30 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4cc198a68770392f7049c72569e330c88e063b56 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3e5511fccc7f7d006eb005ed820c216afe2c40d7e9835ecc042de8571fd4ac5 +size 27478434 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..651bdbe80cbe4aaa9cf7f49448717bdfe91e6575 --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:623f2cad02b5a25102837f2db5ee33c366ca3600b297216d25e9fb5081d40f52 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b8dbc3501a3b19a4188154d1e789efeb723da74f --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9990e084ca1fb2744c201b8954bbd1e80a4fae6090a259c9c5674cdc0f902103 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20fdbd664c7cd6391efb5fba41f17db905775420 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:532df141a6cad145d9ee85a80ed825203c5e4028e6739ba7ddf061943ebcca42 +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..434216aa45880cc33f8cd29584a7338df7f9e3e9 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c321ca762d6a5329440b8ee7e822b0f931574ce86b08f97e021cb1b53bd8aaeb +size 27478306 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9c7bc2c20210eae94e8c2b40c8beab3ec6c202b --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff0a09444099d0c6ab83f54d05dd2dde9344fcda0b59b20d4f479e648e619ed5 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b67cbdf04195d39466e1cf3a5c72fbf67f5c16b7 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ed7f8f64862054ef64d019e3294774c2afd60694507006d4ffd6e79f1450d54 +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aadbca037a8842b57dc8cf7d187eb020ec00713f --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1189c0548fddbdf20f44538cf60a18a8b3f29536ac4da12425c8b5518e8d4361 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1febee0dc889d17dd92d9e7a1a520f1727610c0f --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:5c1ecf6f74c84e0f3eae45a8bda8a3397c975785736bea4a3487ab829d9f92c6 +size 27478167 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d5b9207d040489440792bb1ef225bb6a157e323 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:242d2c02aa73a995be7e49478817e7962efa2eab71a6060410df970d65d632f9 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..764017284e5fad2ce540c84c4f0c35ea9c7a3792 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f3231228c83cc4467e02c11c5e565c20fbe079ded160c98791fd28838f55f85 +size 27478370 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5df02f8176ca23e12de793b899ce1e1786a7a063 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9141604902a0a2d98f849b4c69ea6ee5893cbc9cb14a5cae3e47717676e96286 +size 27478178 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f60411a72ea269c1856b10d770e776801529cc32 --- 
/dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5156d2d68febc1af31439fca1245b3fe651be9ab84e25c2dc94d9883db5ddc4c +size 27478242 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..870271ba7ce9c1b04d76e5bd6eb70eb2902fb22e --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e6793ce6bb36ad598ac1958c45c4f96954df8f700063df2a798ed372bf74701a +size 27478359 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ed1bb9b8a1837fc7aaf57e4ae0ca1bfa2a31172 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:668368740a11f3f160c5c97e68b0c744c1cce81689d864327be6a25789a0024b +size 27478103 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2cea4e96afdca2025e8045068afc6a6deb4f6117 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24dd64854b6d4f89f46f022393c50f6724d9910c5ea58e49af8339b02c2bba2f +size 27478359 diff --git a/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 
b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f28987319d28dda28f017361cb570de64acdd8e7 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5c7ce196819648ac88b46fb4116ef1b30c4decde283fcdaccbefc4ac8f127be +size 27478167 diff --git a/146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6c64f0b9c5d134d336f7c74c6216acc81beab33 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8dafcc6faa6ca6b9817262594fa2abd4c016df15db009430fdc0f5d3f3c37d2e +size 80413955 diff --git a/146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d32f990a4138a004a186af5872e3e438e3a1d40a --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b3ec8f09f15323e25ea2845f0ca6f465e0a0935b79548cd4a30d8f6eae3fdb6 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea1f1400961255732daeedde42467a45d5263578 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ee46a04b774fe2dc677adb677dc348d9a0c2365c00ed1d5b3df83707dabc44f +size 14180099 diff --git 
a/146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..21a35a1355f87362ee8d72061f55ca6f3b130145 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0ac6f5dcac409343d17cbe982b5e3e32dc2dbe31931a937fed5a858b934c4f0 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f82539f539e2d11e0123318becf2e54054cc7840 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8c7c536d6eaeeeec0efc95b1f8ed224d92e75408d0d7646a38884530efab790 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d9c6dfc517ae0af81d33913e1870cfc842347c6 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74f29e17000cbc7505b511e7a25d450eeea6dbaf69287dd69f18a017d0d8b252 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..38292b79221bcb8aee482dc259bef80624eecd0a --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0008d7bcba8621af98593c2fdf0de7239c2008a0d2d7c901b701083e214dd8d3 +size 14180099 diff --git 
a/146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e910e3ce133eded6f957289ede9fc53dbcbebb46 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33e23733afff7d4d4c86e3c7cdacfd22c40f50d80b8870328275faffa9bb040a +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..af97226a2dc08cc574e96c2b536982a8cb00eebd --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6db1711f00c245cd7cd62197d08a83f7abb0865cbc04a2ca556a50754efb8d7b +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b66fe95d381edd2613fdd68f9cb058ac88efd8f --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d764defe6f2fbd6433d8e2ef8b5a2f4a4a07466db894b021b204f12913f760c2 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..144ea99eb9eabaee8e3d996e21a64f061fe9bd6b --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ddfbf221f2852aec51b59a0a1b5afcac016c2fd67b620ee2be1efc06b981d32 +size 14180099 diff --git 
a/146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..04867fa12ca533ca5a47cd6b515ac551d3098ede --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78fb84c54b459f8b24c8dbc48a6a387ada6988d412f4952edc44985d6acbf4c1 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34e0d1316c70b256a4596f716c03218588265181 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e64f53f80d9fda84cc9ace583ca0c373e86d20f73df3f7bed08cb83040dc0fe +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f2e782ee45f7b06d47875b0eb044e6f568e54ed --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc77d22db0974a246970c8e47bef7cb8b6e02c648aa6e07b8794744c813b7cb4 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..edb40726a4d02282c128fc5c90747fec161c2b30 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b8372cbc2902d51570d5f0089652f649e6eb85f03251e597a897e4311f13aea +size 14180099 diff --git 
a/146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3a68af677557846d15ca0184b125b6425342754 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e8dba27b273a7ab7b214f3c35b19a7d4dd09070b3b1ded26a49813ce697c058 +size 14180099 diff --git a/146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt b/146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd286b39df95552f07f1a3f704fc774d77ec9fad --- /dev/null +++ b/146m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba18f7b13945a80f422013cb7f213f716468fb6a642e70180bf91392396a73b0 +size 4291 diff --git a/146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt b/146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dea3eb18762e92763feded0989eefc1ac6ae3fa4 --- /dev/null +++ b/146m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfeaad39262e78836a90aef7abdbe097affb239652adbd2207417aab7d0f2e84 +size 35443 diff --git a/146m3b9100mdedup/sbatch_146m3b9100mdedup.sh b/146m3b9100mdedup/sbatch_146m3b9100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..af10df5e90ae447f60a5f0468b2229a18df0941f --- /dev/null +++ b/146m3b9100mdedup/sbatch_146m3b9100mdedup.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread 
+#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m3b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=5000 + +# Tokens: 3936562000 +# -> Samples: 1_922_149 +TRAIN_SAMPLES=1_922_149 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 19_221 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + 
--kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m3b9100mdedup/sbatch_146m3b9100mdedupval.sh b/146m3b9100mdedup/sbatch_146m3b9100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..5728048eb01316e78ecb1b7a1331937afcd1a208 --- /dev/null +++
b/146m3b9100mdedup/sbatch_146m3b9100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m3b9100mdedupval +VARIANT_CKPT=146m3b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ 
+ --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " +
+echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m3b9100mdedup/tensorboard_146m3b9100mdedup/events.out.tfevents.1678999827.nid006541.47271.0 b/146m3b9100mdedup/tensorboard_146m3b9100mdedup/events.out.tfevents.1678999827.nid006541.47271.0 new file mode 100644 index 0000000000000000000000000000000000000000..d2b988bb2d9ae8b69863c0422c4464678bae9bca --- /dev/null +++ b/146m3b9100mdedup/tensorboard_146m3b9100mdedup/events.out.tfevents.1678999827.nid006541.47271.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05b5119c440196582bbae4b0da172791689e438a87a42db5580ec80ef95fb759 +size 13358500 diff --git a/146m3b9100mdedup/tensorboard_146m3b9100mdedupval/events.out.tfevents.1679003305.nid005161.65770.0 b/146m3b9100mdedup/tensorboard_146m3b9100mdedupval/events.out.tfevents.1679003305.nid005161.65770.0 new file mode 100644 index 0000000000000000000000000000000000000000..ac1496b4a5c1241b2bb33cd49a411d47697c4610 --- /dev/null +++ b/146m3b9100mdedup/tensorboard_146m3b9100mdedupval/events.out.tfevents.1679003305.nid005161.65770.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ed57983b2590d64d497787b2116a6a1dd291c469d77c027251340488971cc09 +size 980 diff --git a/146m5b9100mdedup/3326911.err b/146m5b9100mdedup/3326911.err new file mode 100644 index 0000000000000000000000000000000000000000..b9efe9baa317c78f945e7728f0bb5372718899c4 --- /dev/null +++ b/146m5b9100mdedup/3326911.err @@ -0,0 +1,1117 @@ +2: 2023-03-16 23:47:18.795299: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:47:18.795314: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:18.795290: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: 2023-03-16 23:47:18.795676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795688: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795700: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-16 23:47:18.795569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:47:18.795582: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:18.795564: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-16 23:47:18.795806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:18.795815: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:18.795826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-16 23:47:18.795753: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:47:18.795761: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:18.795765: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-16 23:47:18.795703: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795561: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795575: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:47:18.795292: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795707: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795695: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:18.795772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:47:18.795775: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:18.795786: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795612: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795616: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795616: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: 2023-03-16 23:47:18.795377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:47:18.795378: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795906: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795924: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:47:18.795926: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-16 23:47:18.795639: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:18.795635: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:47:18.795635: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:18.795844: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:18.795750: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:18.795782: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:47:18.795790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-16 23:47:18.795811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:47:18.795819: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:18.795368: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:18.795864: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:47:18.795882: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:47:18.795885: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:18.795786: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795607: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:47:18.795459: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:18.795727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:47:18.795831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:18.795834: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795640: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:47:18.795648: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:47:18.795736: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:47:18.795968: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:47:18.795862: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796875: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796875: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796904: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 23:47:18.796901: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796917: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796920: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796906: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:47:18.796919: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:47:22.152878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.152886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:22.153305: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:47:22.153155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 23:47:22.153307: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.153310: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.153313: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.153313: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.153315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:47:22.153318: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:47:22.153320: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153157: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; 
dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153169: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153176: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153170: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153161: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:22.153540: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153542: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153546: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153548: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153550: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:47:22.153554: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153556: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:47:22.153558: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.154172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154176: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 23:47:22.154180: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:47:22.154329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154619: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.154623: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 23:47:22.154629: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your 
machine. +1: 2023-03-16 23:47:22.154629: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.154631: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:22.154635: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.154639: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:47:22.154643: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 23:47:22.154339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154344: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:22.154537: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154538: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154537: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154540: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154541: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 23:47:22.154543: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154544: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:47:22.154544: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.154691: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154705: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154710: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154705: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154713: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154713: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.154716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:22.155062: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155069: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155072: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155073: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155075: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:47:22.155077: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155080: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:47:22.155081: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.158952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158955: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158955: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.158954: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:22.159329: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159328: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159334: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159341: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159343: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 23:47:22.159349: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159355: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:47:22.159356: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.205926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205918: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205918: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.205927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:22.206325: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206327: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206331: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 23:47:22.206334: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206335: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:47:22.206337: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220459: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220465: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220467: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220470: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220473: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:22.220840: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220838: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220843: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220844: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220848: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 23:47:22.220848: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220848: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:22.220852: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:47:37.236542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.236641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.238994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.238994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239002: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239012: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239012: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239014: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239015: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:47:37.239039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:47:37.239054: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.253593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253742: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:47:37.253851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253624: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253762: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:47:37.253878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253651: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:47:37.253901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253648: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:47:37.253914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253897: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253677: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253786: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.253922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253904: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253681: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253788: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.253930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253693: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253798: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.253940: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.253699: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:47:37.253804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.254045: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:47:37.253921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253959: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.253971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254667: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.254741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 23:47:37.254946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.254989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.255025: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.255031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255607: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255622: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-16 23:47:37.255617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255625: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.255622: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-16 23:47:37.255617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255623: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:47:37.255641: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.255643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.255644: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.255645: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:47:37.255646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256058: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256060: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:47:37.256076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256070: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256073: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256072: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256075: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256086: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:47:37.256087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:47:37.256439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256458: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:47:37.256525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256532: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-16 23:47:37.256642: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256539: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256542: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256547: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:47:37.256546: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:47:37.256645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:47:37.256548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-16 23:47:37.256646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256620: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:47:37.256620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:47:37.256636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 23:47:37.256811: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256813: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256814: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256817: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256829: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256829: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256830: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 23:47:37.256828: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:47:37.256833: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256834: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256835: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256838: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:47:37.256844: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 23:47:37.257651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257691: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:47:37.257682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257658: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257661: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257661: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257666: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:47:37.257694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257675: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257675: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:47:37.257682: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257684: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257684: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:47:37.257696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 23:47:37.257685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257687: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:47:37.257687: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:47:37.257714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257719: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:47:37.257722: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256648: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256648: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256658: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256659: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:47:37.256654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256670: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256670: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256670: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256673: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:47:37.256682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:47:37.256696: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. 
+3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +2: Building extension module utils... +2: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +2: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... 
+4: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: +2: Loading extension module utils...Loading extension module utils...Loading extension module utils... +2: +2: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +2: +2: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +0: Loading extension module utils... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... 
+6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m5b9100mdedup/3326911.out b/146m5b9100mdedup/3326911.out new file mode 100644 index 0000000000000000000000000000000000000000..93388ca16818dfb33c395027d0704a37d4ffc2d8 --- /dev/null +++ b/146m5b9100mdedup/3326911.out @@ -0,0 +1,5653 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m5b9100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 
--save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m5b9100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m5b9100mdedup --load checkpoints_146m5b9100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3326911.json --zero-stage 0 +START 3326911: Thu 16 Mar 2023 11:46:28 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 58.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 49.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 45.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 48.0c 78.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 49.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 47.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 50.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info 
================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 53.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 46.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 53.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 40.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 40.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 48.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 48.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 54.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 54.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 
5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 47.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 47.0c 79.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 42.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +7: Launching on nid006716 (7/8), master nid006709 port 9999, GPUs 8, CUDA: True +2: Launching on nid006711 (2/8), master nid006709 port 9999, GPUs 8, CUDA: True +1: Launching on nid006710 (1/8), master nid006709 port 9999, GPUs 8, CUDA: True +3: Launching on nid006712 (3/8), master nid006709 port 9999, GPUs 8, CUDA: True +0: Launching on nid006709 (0/8), master nid006709 port 9999, GPUs 8, CUDA: True +5: Launching on nid006714 (5/8), master nid006709 port 9999, GPUs 8, CUDA: True +4: Launching on nid006713 (4/8), master nid006709 port 9999, GPUs 8, CUDA: True +6: Launching on nid006715 (6/8), master nid006709 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. 
+0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... 
mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3326911.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 
768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m5b9100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m5b9100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 
0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ 
True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m5b9100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... 
False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m5b9100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. 
None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +7: > setting tensorboard ... +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... 
+0: [2023-03-16 23:48:25,092] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.095 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 102 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 28.357 seconds +0: time to initialize megatron (seconds): 72.264 +0: [after megatron is initialized] datetime: 2023-03-16 23:48:56 +0: building GPT model ... +0: [2023-03-16 23:48:56,401] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 23:48:56,402] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 23:48:56,402] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, 
data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 23:48:58,406] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: 
ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 23:48:58,622] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 23:48:58,623] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-16 23:48:58,623] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.45 GB, percent = 6.2% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-16 23:48:58,624] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 23:49:11,595] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 23:49:11,596] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 23:49:11,596] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 23:49:11,605] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 23:49:11,605] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 23:49:11,723] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 23:49:11,723] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 23:49:11,724] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.13 GB, percent = 6.4% +2: ninja: no work to do. 
+2: Time to load utils op: 0.27425217628479004 seconds +0: Time to load utils op: 0.21033740043640137 seconds +0: Time to load utils op: 0.20276451110839844 seconds +0: Time to load utils op: 0.20214056968688965 seconds +0: Time to load utils op: 0.2019209861755371 seconds +2: Time to load utils op: 0.20235586166381836 seconds +2: Time to load utils op: 0.20301413536071777 seconds +2: Time to load utils op: 0.20223355293273926 seconds +2: Time to load utils op: 0.20340585708618164 seconds +2: Time to load utils op: 0.20286011695861816 seconds +2: Time to load utils op: 0.20312213897705078 seconds +2: Time to load utils op: 0.202789306640625 seconds +0: Time to load utils op: 0.2019660472869873 seconds +0: Time to load utils op: 0.20204472541809082 seconds +0: Time to load utils op: 0.20194339752197266 seconds +0: Time to load utils op: 0.20217156410217285 seconds +3: Time to load utils op: 0.21116232872009277 secondsTime to load utils op: 0.21146750450134277 secondsTime to load utils op: 0.21144771575927734 seconds +3: +3: +3: Time to load utils op: 0.2116858959197998 seconds +3: Time to load utils op: 0.21111512184143066 secondsTime to load utils op: 0.21172213554382324 seconds +3: +3: Time to load utils op: 0.21211528778076172 seconds +3: Time to load utils op: 0.21206140518188477 seconds +1: Time to load utils op: 0.21025943756103516 seconds +1: Time to load utils op: 0.2102642059326172 seconds +1: Time to load utils op: 0.21028685569763184 seconds +1: Time to load utils op: 0.2102982997894287 seconds +1: Time to load utils op: 0.21030163764953613 secondsTime to load utils op: 0.21030712127685547 seconds +1: +1: Time to load utils op: 0.21032476425170898 seconds +1: Time to load utils op: 0.21031785011291504 seconds +6: Time to load utils op: 0.2112264633178711 seconds +6: Time to load utils op: 0.21123552322387695 seconds +6: Time to load utils op: 0.21091961860656738 seconds +6: Time to load utils op: 0.2110593318939209 seconds +6: Time to load utils op: 
0.2115011215209961 secondsTime to load utils op: 0.21097373962402344 secondsTime to load utils op: 0.21097493171691895 seconds +6: +6: +6: Time to load utils op: 0.21097826957702637 seconds +5: Time to load utils op: 0.21187567710876465 secondsTime to load utils op: 0.21187591552734375 seconds +5: +5: Time to load utils op: 0.21187973022460938 seconds +5: Time to load utils op: 0.21190452575683594 seconds +5: Time to load utils op: 0.21187305450439453 seconds +5: Time to load utils op: 0.21191167831420898 secondsTime to load utils op: 0.2119159698486328 seconds +5: +5: Time to load utils op: 0.21192574501037598 seconds +7: Time to load utils op: 0.21489810943603516 seconds +7: Time to load utils op: 0.2137007713317871 seconds +7: Time to load utils op: 0.2151038646697998 seconds +7: Time to load utils op: 0.21495270729064941 seconds +7: Time to load utils op: 0.21549391746520996 secondsTime to load utils op: 0.21523070335388184 seconds +7: Time to load utils op: 0.21531891822814941 seconds +7: +7: Time to load utils op: 0.21340656280517578 seconds +2: Time to load utils op: 0.0004780292510986328 secondsTime to load utils op: 0.0004913806915283203 secondsTime to load utils op: 0.00040078163146972656 seconds +2: +2: +2: Time to load utils op: 0.0005161762237548828 seconds +2: Time to load utils op: 0.0004963874816894531 seconds +2: Time to load utils op: 0.0005180835723876953 secondsTime to load utils op: 0.0005571842193603516 seconds +2: +2: Time to load utils op: 0.0004935264587402344 seconds +4: Time to load utils op: 0.23089385032653809 seconds +4: Time to load utils op: 0.23091936111450195 seconds +4: Time to load utils op: 0.23094820976257324 seconds +4: Time to load utils op: 0.2309577465057373 seconds +4: Time to load utils op: 0.23094463348388672 seconds +4: Time to load utils op: 0.23096323013305664 seconds +4: Time to load utils op: 0.23097634315490723 secondsTime to load utils op: 0.23097443580627441 seconds +4: +0: Time to load utils op: 
0.0005669593811035156 secondsTime to load utils op: 0.0006203651428222656 secondsTime to load utils op: 0.0005958080291748047 seconds +0: +0: +0: Time to load utils op: 0.0004494190216064453 secondsTime to load utils op: 0.0004923343658447266 seconds +0: +0: Time to load utils op: 0.0005676746368408203 seconds +0: Time to load utils op: 0.0006761550903320312 seconds +3: Time to load utils op: 0.0008296966552734375 seconds +3: Time to load utils op: 0.0008957386016845703 seconds +3: Time to load utils op: 0.0005941390991210938 seconds +3: Time to load utils op: 0.0009591579437255859 seconds +3: Time to load utils op: 0.0010440349578857422 seconds +3: Time to load utils op: 0.0010347366333007812 seconds +3: Time to load utils op: 0.0010256767272949219 seconds +3: Time to load utils op: 0.0009696483612060547 seconds +6: Time to load utils op: 0.0008084774017333984 seconds +6: Time to load utils op: 0.000985860824584961 seconds +6: Time to load utils op: 0.0013065338134765625 seconds +6: Time to load utils op: 0.0011932849884033203 seconds +6: Time to load utils op: 0.0011990070343017578 seconds +6: Time to load utils op: 0.0011954307556152344 seconds +6: Time to load utils op: 0.0011844635009765625 seconds +6: Time to load utils op: 0.0012390613555908203 seconds +7: Time to load utils op: 0.0005805492401123047 seconds +7: Time to load utils op: 0.0005557537078857422 seconds +7: Time to load utils op: 0.0005617141723632812 seconds +7: Time to load utils op: 0.0005917549133300781 seconds +7: Time to load utils op: 0.00041413307189941406 seconds +7: Time to load utils op: 0.000431060791015625 seconds +7: Time to load utils op: 0.00044655799865722656 seconds +7: Time to load utils op: 0.00042366981506347656 seconds +5: Time to load utils op: 0.0010521411895751953 seconds +5: Time to load utils op: 0.0013453960418701172 secondsTime to load utils op: 0.0013079643249511719 seconds +5: +5: Time to load utils op: 0.0013267993927001953 seconds +5: Time to load utils op: 
0.0013875961303710938 seconds +5: Time to load utils op: 0.0012853145599365234 seconds +5: Time to load utils op: 0.0013384819030761719 seconds +5: Time to load utils op: 0.0013663768768310547 seconds +4: Time to load utils op: 0.0010190010070800781 seconds +4: Time to load utils op: 0.0010085105895996094 seconds +4: Time to load utils op: 0.0011506080627441406 seconds +4: Time to load utils op: 0.001195669174194336 seconds +4: Time to load utils op: 0.0012638568878173828 secondsTime to load utils op: 0.001230001449584961 seconds +4: +4: Time to load utils op: 0.0012786388397216797 seconds +4: Time to load utils op: 0.0012392997741699219 seconds +1: Time to load utils op: 0.0008950233459472656 seconds +1: Time to load utils op: 0.0010151863098144531 secondsTime to load utils op: 0.0010273456573486328 seconds +1: +1: Time to load utils op: 0.00113677978515625 seconds +1: Time to load utils op: 0.0011229515075683594 seconds +1: Time to load utils op: 0.001093149185180664 seconds +1: Time to load utils op: 0.0009753704071044922 seconds +1: Time to load utils op: 0.0011737346649169922 seconds +0: [2023-03-16 23:49:12,055] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 23:49:12,056] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-16 23:49:12,056] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,178] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 23:49:12,178] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,179] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,281] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 23:49:12,282] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: 
[2023-03-16 23:49:12,282] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,385] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 23:49:12,386] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,386] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,487] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 23:49:12,488] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,488] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,592] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 23:49:12,592] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,592] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,693] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 23:49:12,694] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,694] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,801] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 23:49:12,802] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-16 23:49:12,802] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,904] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 23:49:12,905] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 
+0: [2023-03-16 23:49:12,905] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.28 GB, percent = 6.4% +0: [2023-03-16 23:49:12,905] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 23:49:12,905] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 23:49:12,905] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 23:49:12,905] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 23:49:12,906] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 23:49:12,907] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 23:49:12,908] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 23:49:12,908] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 23:49:12,908] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 23:49:12,908] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004241466522216797 seconds +0: [2023-03-16 23:49:12,908] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 23:49:12,985] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+3: [2023-03-16 23:49:12,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:12,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+0: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... 
+3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-16 23:49:13,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. 
+3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:49:13,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:49:13,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:49:13,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:49:13,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:49:13,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:49:13,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:49:13,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:49:13,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:49:13,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:49:13,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:49:13,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:49:13,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:49:13,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:49:13,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:49:13,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:49:13,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:49:13,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:49:13,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:49:13,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:49:13,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:49:13,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:49:13,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:49:13,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:49:13,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:49:13,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:49:13,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:49:13,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:49:13,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:49:13,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:49:13,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:49:13,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:49:13,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:49:13,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:49:13,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:49:13,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:49:13,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:49:13,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:49:13,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:49:13,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:49:13,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:49:13,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:49:13,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:49:13,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:49:13,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:49:13,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:49:13,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:49:13,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:49:13,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:49:13,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:49:13,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:49:13,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:49:13,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:49:13,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:49:13,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:49:13,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:49:13,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:49:13,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:49:13,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:49:13,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:49:13,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:49:13,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:49:13,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:49:13,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:49:13,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:49:13,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:49:13,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:49:13,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:49:13,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:49:13,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:49:13,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:49:13,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:49:13,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:49:13,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:49:13,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:49:13,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+5: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:49:13,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:49:13,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
+7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+7: [2023-03-16 23:49:13,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:49:13,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:49:13,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:49:13,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:49:13,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:49:13,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:49:13,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 23:49:13,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:49:13,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:49:13,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:49:13,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:49:13,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,754] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +3: [2023-03-16 23:49:13,754] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +3: [2023-03-16 23:49:13,755] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +7: [2023-03-16 23:49:13,756] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +2: [2023-03-16 23:49:13,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:49:13,758] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +2: [2023-03-16 23:49:13,760] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +5: [2023-03-16 23:49:13,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:13,761] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +2: [2023-03-16 23:49:13,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:13,763] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +5: [2023-03-16 23:49:13,763] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:49:13,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:49:13,765] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:49:13,769] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:49:13,771] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +1: [2023-03-16 23:49:13,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,775] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +6: [2023-03-16 23:49:13,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:49:13,777] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +6: [2023-03-16 23:49:13,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:49:13,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:49:13,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,781] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +5: [2023-03-16 23:49:13,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 23:49:13,782] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +3: [2023-03-16 23:49:13,783] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +5: [2023-03-16 23:49:13,783] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:49:13,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,785] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:49:13,785] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:49:13,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:49:13,786] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +1: [2023-03-16 23:49:13,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,787] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +5: [2023-03-16 23:49:13,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:49:13,787] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +5: [2023-03-16 23:49:13,787] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +3: [2023-03-16 23:49:13,787] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +5: [2023-03-16 23:49:13,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:13,789] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +1: [2023-03-16 23:49:13,789] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +3: [2023-03-16 23:49:13,789] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +5: [2023-03-16 23:49:13,789] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +4: [2023-03-16 23:49:13,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:49:13,790] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-16 23:49:13,790] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +5: [2023-03-16 23:49:13,791] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +4: [2023-03-16 23:49:13,792] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +4: [2023-03-16 23:49:13,792] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +4: [2023-03-16 23:49:13,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,793] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +3: [2023-03-16 23:49:13,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 23:49:13,793] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +4: [2023-03-16 23:49:13,794] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-16 23:49:13,795] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-03-16 23:49:13,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:13,795] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +3: [2023-03-16 23:49:13,795] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +4: [2023-03-16 23:49:13,795] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +1: [2023-03-16 23:49:13,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,795] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +5: [2023-03-16 23:49:13,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:13,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +4: [2023-03-16 23:49:13,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:13,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-16 23:49:13,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +4: [2023-03-16 23:49:13,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +5: [2023-03-16 23:49:13,796] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-16 23:49:13,797] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +5: [2023-03-16 23:49:13,797] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +4: [2023-03-16 23:49:13,798] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +7: [2023-03-16 23:49:13,798] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +7: [2023-03-16 23:49:13,798] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +7: [2023-03-16 23:49:13,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:13,798] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +7: [2023-03-16 23:49:13,800] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +7: [2023-03-16 23:49:13,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-03-16 23:49:13,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,801] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +1: [2023-03-16 23:49:13,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:49:13,801] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +1: [2023-03-16 23:49:13,801] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +7: [2023-03-16 23:49:13,802] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +7: [2023-03-16 23:49:13,803] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +1: [2023-03-16 23:49:13,804] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
+6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:49:13,804] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +2: [2023-03-16 23:49:13,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:13,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +1: [2023-03-16 23:49:13,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:49:13,806] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +2: [2023-03-16 23:49:13,806] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +7: [2023-03-16 23:49:13,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:49:13,806] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +1: [2023-03-16 23:49:13,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:49:13,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-16 23:49:13,808] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +7: [2023-03-16 23:49:13,808] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +1: [2023-03-16 23:49:13,809] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +3: [2023-03-16 23:49:13,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,811] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-16 23:49:13,813] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +3: [2023-03-16 23:49:13,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,819] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +3: [2023-03-16 23:49:13,821] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +2: [2023-03-16 23:49:13,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:13,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +7: [2023-03-16 23:49:13,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-16 23:49:13,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +2: [2023-03-16 23:49:13,824] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +7: [2023-03-16 23:49:13,824] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +0: [2023-03-16 23:49:13,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,824] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-16 23:49:13,826] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +4: [2023-03-16 23:49:13,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,828] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-16 23:49:13,830] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +5: [2023-03-16 23:49:13,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:49:13,832] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +4: [2023-03-16 23:49:13,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:49:13,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +0: [2023-03-16 23:49:13,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,834] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +5: [2023-03-16 23:49:13,834] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +0: [2023-03-16 23:49:13,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,834] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +0: [2023-03-16 23:49:13,834] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +2: [2023-03-16 23:49:13,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:13,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +0: [2023-03-16 23:49:13,835] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +2: [2023-03-16 23:49:13,836] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +0: [2023-03-16 23:49:13,836] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +5: [2023-03-16 23:49:13,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 23:49:13,838] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-03-16 23:49:13,839] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +0: [2023-03-16 23:49:13,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,843] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-16 23:49:13,845] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +4: [2023-03-16 23:49:13,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:49:13,848] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +0: [2023-03-16 23:49:13,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,848] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +4: [2023-03-16 23:49:13,850] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +0: [2023-03-16 23:49:13,850] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +6: [2023-03-16 23:49:13,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:13,850] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +0: [2023-03-16 23:49:13,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,851] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +6: [2023-03-16 23:49:13,852] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +0: [2023-03-16 23:49:13,852] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +3: [2023-03-16 23:49:13,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:49:13,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +3: [2023-03-16 23:49:13,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +6: [2023-03-16 23:49:13,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,870] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +6: [2023-03-16 23:49:13,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:13,871] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-16 23:49:13,872] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +0: [2023-03-16 23:49:13,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,873] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +6: [2023-03-16 23:49:13,873] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +0: [2023-03-16 23:49:13,874] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +6: [2023-03-16 23:49:13,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,880] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +6: [2023-03-16 23:49:13,881] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +6: [2023-03-16 23:49:13,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,883] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-16 23:49:13,884] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +6: [2023-03-16 23:49:13,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:49:13,885] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +0: [2023-03-16 23:49:13,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:49:13,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +6: [2023-03-16 23:49:13,887] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +0: [2023-03-16 23:49:13,888] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +6: [2023-03-16 23:49:13,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-16 23:49:13,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +6: [2023-03-16 23:49:13,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:49:13,936] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-03-16 23:49:13,938] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +2: [2023-03-16 23:49:14,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:49:14,008] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +2: [2023-03-16 23:49:14,009] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +2: [2023-03-16 23:49:14,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:49:14,098] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-16 23:49:14,099] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +0: successfully loaded checkpoint from checkpoints_146m5b9100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 1118.69 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 23:49:14 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.011613 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.099 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.046235 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.073 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 23:49:26 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 18152.43 | train/valid/test-data-iterators-setup: 10895.17 +0: [after training is done] datetime: 2023-03-16 23:49:26 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.860607E+00 | lm loss PPL: 4.749416E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3326911: Thu 16 Mar 2023 11:49:49 PM EET diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..763186dd5cacfd522c62d56f93fad68c7516e918 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3bdc67561835519a9e1e7ac733cca8b4ce395cb89bea04cc265ee62ef16c1a7a +size 27478295 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b177a49f2eed1ee98dd290b2acc8d8f096fb337a --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cb0cd1e936a692e998feaa675cc968dcbdb7032feeebe618bfe78d6ada8335f +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d8ffe405de4749011e345f66ca30542b9fdc46a --- /dev/null +++ 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f26bc665ae8e7fe1b927d3cc3bd0afdf809c2ae2f79d6d30fed301700fa18cb +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80c0a3c69aa18a25654f0d0a4e2596614d42e591 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82da749c972378565e6eeab3ef6d2be9684206d3936eafd905e210320bf81535 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..17bcbc7747177121ee668e89d016846b281d3690 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d90d974a2906b1b17f2577518ed7793537dbc8aabf3ee614b61603950139f2b +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..747549f98917599902cbec423f438227a31cec8c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4debc6945e9067455ed6792d0fab7d6052424cf5fe476606991225b83102bd65 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc293bf61fc3799748fbc86ffc2a62c4c50c9de8 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a280dc980bb97d951b85bd0b356abeac9105d641ff44ae3922fba297aef6c4b7 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8b50ac88416946a0e31d9260cb216f874138946f --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00e8d0a0ee3c9903ffc5bf332eb4e5adf8a0a7cd7e4ee79a5bf99f0a8e4ad162 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7881bd0ff882aef4905ec4c92f66b0204e90a41a --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48f5d1d8903459b71b21d456d56896d0932b2a0bfc10ecbaf09c2410dec5dff8 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..185b9537638686b0054b6b4cc08a10e532ab42e2 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:3125d137f7ffdc2fe2746336f3f049767436fefaf071329629e91c389f684e87 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf41dc28c3115849fa55ffc112e85211a1b2c5cb --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46f024ae8dab861749ea88d67250348a5e110e9ce29f6756a3182fae1b90f061 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa59c237fd145ffb4b5064cdb199b82b854dced6 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:babed38cbbea446e258532b40aff64dcb9439fcfdc29a69eaaf44da6cc96c739 +size 27478231 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b334f0f2391645268ed909a05fe27eace24ec96 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e407d90e586441241df404bacc2dfa9319c762cf7eba178b2026c9573ac0eee7 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..05a1fc1ae75e751322437715dfcafb790a014fb2 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3fe6a12a58b91913a06dba13425e993b54e196a48705edfbef73e1f5ec31f3c9 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e72b04921fdfd4c6691e675d1a59f07c39ef163 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89f29fe478535641992c607a77b3f3b7a8e38659009f7eac167ed8f5f6e0bcf0 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7025eb21b1d553b83d0d9f82eac4fc14c3f2631c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63db64acfc977dc8363344a214acd9e46e1dbf97b89ce680d4813289c22a91ee +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc6e191f0af0dd20e3f87ab173eecc58f4305319 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f1fcef7467d9ed3f584cb79e87e629975a21e231d612113de610f1e0ba9d0ba +size 27478178 diff --git 
a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b7198047a16227d87a86e78681ef9bd51192acd1 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:71e5caee48cd81b33321889228640c4c82fcfd590a5eba495160da9cfaac3388 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1eccd09e2fb81d8b13e799fbb8901e6f79699362 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:754030f663ad4ba6f1ae97ea2ccf6de0abbd2c4266dc1bf73cb30c5b0ea0af64 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6207427ec86541a45d33031718f5b9b333a6bfba --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffe1fee966b9433ca29d95f8b71fddb872abd0137a7a292b1c5d2049b8ea35b3 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dcc8ad65f7287b453321f29fa61585655c0f7fc1 --- /dev/null +++ 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:31031ceb4e658d3411eee0871823ed63a695f63b9b62305d36e8396580d4fe35 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb67aae0af3f879dd16acf823d271280e7379e0c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00cc09236d3158bfc980a7046abb27ae65dc5ba0b6c5f5d338c07fe0b7e6e001 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..311f4213df14e2797088c553b153184d5406d99f --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:136cae610ad24a2a4a23fb3d3d93250deeee1170e355819d496dbf062f1d21c0 +size 27478231 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30dc0ec183c059cbc25886596f75f780184d1025 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:baca6803fe49936352c3a989cc40006b0d66cd7419a8f9a02a512f78b0db9951 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6c891e86f339bd353604a66e27da064b542e816 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e57b91b00eaef9591d07db3bd27d27ab8280bf5ef6ffe347b56d8496b035295 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b14726ef6f0b01bbfa43e8a5ed841c8275342dcc --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2da83db79d7cfe91c06b2392c870605244964802d60677bc1f4ebf9c5c87cc4 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3e5c46e52145b3facb242843d71cb655f27c04c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1378f160c4ef60d92933b2993d2e4660f8a9b8e80190370e308fa63c05863aae +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b85e17bff93c4cd601fe770bdb01149db577a9c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:38ad5a19455d43f4e1be6675f2ebdeee647c7ce0a9c626253451168cb662a84c +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6aaf13f1685a458099c752044e90971a31893b9c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b7344d62404a765017074bd1cf9313f53e7c63010d45e8ad6beb5e3b6080636 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8be51f43aec40be46aef92b02244e70ec5736c04 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48957c84e68b1b6f24e1e8e62c9b585cf8b7b7412edbdf2b1c6184d6f3f4831b +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..516bd18a138fd25dd134850be48103c105cc442d --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:633bd8a7e60958734d39f8107c08303a04261ab0167dcd0acd4c102be741dd31 +size 27478114 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..bb1c2f61c254db7d553ddd46ab2f0d00fd1dd578 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e30ab986313fec95762e677acb38856d0ed7ba22de97ec7898baa607380188b +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cd2c8c6e2a940c25cba25375da96e529f98c5271 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:860c994d0da972b8c147c9936981abf069564947d3d081c21d86130eb68efbb1 +size 27478434 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53b5ca9d0829cf2b28cd7d0fa1fa91ebafc61117 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9cdec7698c5b82d0bff63fc891957df843d766d50a92f3c418d145b679f358a +size 27478167 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0e394a2e5c54795b7cc8bf25ed93008eecc3427 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8086edab10a601d776102abbb64f3e041cea7e3e05c4cef2a58ca1951190773 +size 27478242 diff --git 
a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b17c55ae39fc2cc07f4560fb536cd31b41834592 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4cb48cd905790e9d8808ee40d904d710a9d162b3df64fc51e51e633dd7f37fa +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1b87f47c472e8d9b8c69b84dd8b82e78c4a0e25 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8eaf1ff9de49145952210511a7e7eda3ce3b81b2e1f8efccdc2cb4b2cb579547 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a07696dee0aaae5fd47d8a342314dcec524bd9d --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93782d07291d3db95ec13646310c5ff9f32ed7ad7c2755533eac345307e38f62 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..457c5dc8cf4b23554bd2bf818dd489cc3d4f300f --- /dev/null +++ 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aed13d1975f3d9bb4099608420d2e67ace7c5b85a4e5a16d419218fbe1275551 +size 27478434 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..75cddf3e254af017e056b4e1f61a419ee48438b9 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:71d20e6fdcfc6c61f978ddfc4164c83532253925b01bf97a2e173fa7e0cbb1d2 +size 27478050 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6bdecd176ba90d5367fc00292a7df58619b4a309 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19b0de62031576463bc72a2b0e5fe957f554ad32eb2cde66f5cff41fd3f48c33 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ec876b1220f928f8b594801bcd8a2e6a7e2f9bf8 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:539309d7c1847ea30d998fe22bb2b57263575dec61be9d717e789512c5759a2c +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c2d331e6b911f568fc43a54e8083a64f65c00d0c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ec089f73088b1f28ab76c596c26b0304c1acbbe7872991c1c1390eae6d6b395 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80f05238523fed13909e6966331a908f90a425bb --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2db43cd2d6a1f84a969b537570dfba29cea0ee087526f963fa6f3f8302cac043 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a444688b91f72e05cfe5994b7665eb5356cf13b8 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:325307a674763dbc592590988ff59d45c93c08622b6dfb5973d5ef6f767786c5 +size 27478231 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..105a41c1a34c6da8196b883ddc3becf4b5c44ac7 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:a6d18bc9a061118c163ac58e584eb72ec65434ea21b1bd6b5b2ab607de061646 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d212f7b926956199b559828ea2808ca21b0e9499 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41e451295d4d9355f74a11753ea685c45084ae1d9760848b2ad40ca4ea9785ff +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b7794873425ef7ca4721470cfec76771aa3786c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc163f2a63a514fdd2ae2a1b033bef2d1977f0049f85c38a270a2c4814be745c +size 27478434 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7479485ffb8cdaa202eb921cce766d7907cedd0c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8774c87f56e2ea4528e127199f8d257ccadba3e787c99c35b77adcae41be127e +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..47706fe4cab96e78c416bf24a52a80d10e8980f6 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd8cf41a69d7f7c7b14dc8c9e6a67e5215b8cc9b3c7bd5a5e134524641926b52 +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ac212d39a6a5e0d17bafc1780c7273e1f1d5026 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34fc83befb0843b82280d04ad1912e926bfde045362bd7d11609ab113beb7a5c +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5405b0783724832448bf8bef84e2bb955f855893 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a198f776e983a5c04f52ec29dfc7047c5526bd1854243ef4e7b193cb634851ff +size 27478306 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9e5895db55de46b8e886618f221c7503501d330e --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb8fe9e4ab876456c5ecf0f4e8fd15e7a2c6b1d443f1c31e150f42fcb5ed14cd +size 27478370 diff --git 
a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa61a8b0a8f915929788e8d6b3fcfb9e65ec8bc3 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0986a5fc977b82dbb459c4b0f8a33485bbb570e3ced89934ff9ec5d912af06fd +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4d6835f6227e97a3e72f53312421f4a671c31efc --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97f505f97137e5bc42029f0c168bbba9a95be333af63641fc26d2cf8bc06ef0c +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd02b5b901bcfeff87f42e7e60261475c64fc96d --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b092161bf2ad6de710c15ad9130a662122bf15686db620958b780e3a6a919d71 +size 27478167 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..59377e157a8835b82f1891b4a3124956c94709d6 --- /dev/null +++ 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:320f7fb39e7bfeb04b214f99edb8a7051e7d33f4b0cf2da70ad19b5f03e17da1 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdfafd57ab464efee1a20e060999f87f11794643 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abd653de580bca339aa5dcd650c4d7a730fd3105da07f40d6a86f8a30aaafa43 +size 27478370 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7a761d715d97299bcb32a6123e223a49271f15d --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10b9f8493c99b14f039a3bc5323648b963654dbb5779ebaffff0a0d45d17adc2 +size 27478178 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0427329955e5ad5f1e89f073cc816ddb363cd2bb --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2c61e3586d8c448ba11d2a8f2048ffc9a5c7b42832c2c24caa6dd14b7468108 +size 27478242 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 
b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..895abb2b595f41226c8445dfe95dd928c3541bfb --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a98ab64027c3680f5cdef3c3843002f3f436624e043d0e98d82ca124e443bd61 +size 27478359 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bea37aee6504d41a82617d2b8a07fd7deb78f32c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af3c19f045e34a732ffa30da23abe3e67d4e87799085575435ce4a6bfe297af9 +size 27478103 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1effe5fb0f0d3052e712e5e89265f3711b10b8f3 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:375c06981e67c13c2c6d9da432dde9d5039929a989bb3c5683a85b293acdb036 +size 27478359 diff --git a/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e487fbd27b9878596f4c0109dc5a3eb87fe7468 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:a27c63ad956a6d4b6a6e9d72496fdfa592f685e552ef88626a70522d6f5fd120 +size 27478167 diff --git a/146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d8e52e8d1b8aa90bb398ae5f90b39433d3e85c57 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c07aa81669294c538fc56426fc3754cfe06879bce242f5520773107378d1d8bc +size 80413955 diff --git a/146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6d7b93be54fce75f618b1ca825b4efba6511366d --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b449c6e246acae779b795d20c59574e9bb79fa6b5474149b5c289b4ca8e366e +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5670dcdb49fedc593b2e841b683369fd8793f41a --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e68f31473bab2e50dd612cf250531a67d756fb5e97f5082e9e2988c4af3ca7da +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4cf83a1d72af0f1f0ca8c96a5b0bc8a62c9ff395 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:b907f98e362aec545fcfaac774c43ea5eb30661d3de02f95147cd21a08c1ff4f +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74a53b5f56f56e566ba5a50e61566f527b6241dc --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8fb4fc15312006b038a4a94a1aefefa13a4fad4c2d40dbd1db7ca0a0826dbb6 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..572375fe2b678d5966d9f5223810bf4e716c7855 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19a6c61375087dc9066a3e644d3678dc02903cad77e6f962d220a9102013eee9 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4f945d7f194ffbdd78a8f822daf80d59c85918c --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60ff833e802d6874a53932b5dbf3ef6d9e597dc42cf3e7e64cb860122d61b1d7 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..513c836e03a4a659c43139ce128e74ee8b87848e --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:63653e80e850991a9319bd77ea830b578036c7a8f3f54e3d0d7d8ae25992f8eb +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1ca2524b9b9f620fc89a02bb5d7c5ecde2b6528 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daaac689aa6f4ff31296c47b4448f7a730bdf7a7954a046a38863e6a7bab674b +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3f219dc391978b57a76a508d9898132582426c8 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6a2582b5a442b2cfb094db5ea47a4cded28144aad511d994140899eb2f4698d +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a777d678bab2cc520527af9443391a5d0783ee84 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74013aebd56013e7a6a5573bfd9d94c760a9eb032f074b63940884d49f3295a7 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4fe0419e6c016761038adf897fbbcf8ebf810b4f --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:2b4508428674c5d59c013825d06c10bcaa38183c1ec239aebe808b05b85a3fa8 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..930a57fb7fe4112b9d168ec0ea83c6f1fd297833 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e18f791f759ff4776a11cdfa27b06a547e764b3b9d5a9ab88fc58d5f374f4b22 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..857ec7f3a4ba4388a669822ad785a1096f71ce4f --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2fe9c4b398e5a6e0e7152515ccf55c08598321e292a96768ae87a913467b69c2 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..056acd43b50c333855b48b00ce2cedd16b61caa9 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98e1d9581f8cec6db5abf155e089aa3b5063ecf5e2b779acba5ba9a4796e8955 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d118f857ab97c785c40080faa97df1a282972807 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_17-model_00-model_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:239cda064bd2b17c6f0c6a1a7e8162675ae48eb51819b100f679e223fee85ce5 +size 14180099 diff --git a/146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt b/146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..306c7881dd94660d9cdfa98e4f14936e78112a6e --- /dev/null +++ b/146m5b9100mdedup/global_step11269/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bdc1ccf2207b0e185b1e788ab4758038245ceb904bcfc972f9d5bf0049e16669 +size 4291 diff --git a/146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt b/146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37b79654f90cdca602d8905c4589a8dc93fb5281 --- /dev/null +++ b/146m5b9100mdedup/global_step11269/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:170393fc151e05b7928fdffdc8a1c1f3948ced6be19736e7ec5b0b013c0c7c38 +size 35443 diff --git a/146m5b9100mdedup/sbatch_146m5b9100mdedup.sh b/146m5b9100mdedup/sbatch_146m5b9100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..418005d6500e06ce2265a532977194d00de07a38 --- /dev/null +++ b/146m5b9100mdedup/sbatch_146m5b9100mdedup.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --exclusive=user +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m5b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + 
sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 5908231000 +# -> Samples: 2884878 +TRAIN_SAMPLES=2_884_878 +#2_884_878 +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 28_849 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size 
$GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m5b9100mdedup/sbatch_146m5b9100mdedupval.sh b/146m5b9100mdedup/sbatch_146m5b9100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..0a564a4d1a38f9e3120e9c97b4abc551c4d1b2d9 --- /dev/null +++ b/146m5b9100mdedup/sbatch_146m5b9100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 
+#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m5b9100mdedupval +VARIANT_CKPT=146m5b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + 
--no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git 
a/146m5b9100mdedup/tensorboard_146m5b9100mdedup/events.out.tfevents.1678999628.nid006716.84128.0 b/146m5b9100mdedup/tensorboard_146m5b9100mdedup/events.out.tfevents.1678999628.nid006716.84128.0 new file mode 100644 index 0000000000000000000000000000000000000000..f067eb2ff88e7513b61fb50153711c9081ef8874 --- /dev/null +++ b/146m5b9100mdedup/tensorboard_146m5b9100mdedup/events.out.tfevents.1678999628.nid006716.84128.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:391c6d4afc559f0c21ea4553ec980cb02a98a088e27144c107a5e20a905498cd +size 20061784 diff --git a/146m5b9100mdedup/tensorboard_146m5b9100mdedupval/events.out.tfevents.1679003305.nid006716.97072.0 b/146m5b9100mdedup/tensorboard_146m5b9100mdedupval/events.out.tfevents.1679003305.nid006716.97072.0 new file mode 100644 index 0000000000000000000000000000000000000000..84e814836c7f9d120d06321aea893dec42d8c7e0 --- /dev/null +++ b/146m5b9100mdedup/tensorboard_146m5b9100mdedupval/events.out.tfevents.1679003305.nid006716.97072.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1801dc5d284343a69188cbde15c6d259b6154d371238939a3f7db6dd043f50fa +size 980 diff --git a/146m60b100m/3328570.err b/146m60b100m/3328570.err new file mode 100644 index 0000000000000000000000000000000000000000..d58f127c16099d354ccea96609feb4f1878aef7d --- /dev/null +++ b/146m60b100m/3328570.err @@ -0,0 +1,1124 @@ +4: 2023-03-17 09:42:42.685936: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 09:42:42.685933: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:42:42.685951: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-17 09:42:42.685886: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685883: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685895: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-17 09:42:42.685657: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 09:42:42.685664: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:42:42.685665: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685893: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685895: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:42:42.685939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 09:42:42.685947: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:42:42.685667: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:42:42.685673: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685906: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685883: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 09:42:42.685931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:42:42.685948: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:42:42.685956: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-17 09:42:42.685679: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:42:42.685902: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 09:42:42.685682: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:42:42.685689: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697784: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697787: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697798: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-17 09:42:42.697744: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 09:42:42.697751: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:42:42.697760: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697799: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697803: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 09:42:42.697793: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:42:42.697799: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:42:42.697762: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:42:42.697756: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:42:42.697761: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 09:42:42.697769: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698144: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698152: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-17 09:42:42.697772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 09:42:42.698160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698157: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698159: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698169: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:42:42.698170: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 09:42:42.706594: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706600: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706601: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706596: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706607: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 09:42:42.706591: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706588: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:42:42.706616: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739720: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 09:42:42.739726: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739734: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739732: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739731: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:42:42.739724: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 09:42:42.739740: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:42:44.326830: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326831: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.326854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:44.327307: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327311: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327317: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327318: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-17 09:42:44.327321: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327321: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:42:44.327323: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.366633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366635: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366642: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.366637: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:44.367065: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367069: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367074: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367074: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367077: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 09:42:44.367077: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367078: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:44.367082: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367429: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:44.367846: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367848: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367853: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367857: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367857: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 09:42:44.367857: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367862: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:42:44.367865: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.384593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384602: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384609: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384625: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384607: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.384608: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-17 09:42:44.384622: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:44.385025: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385030: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385007: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385032: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385036: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385037: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385015: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385015: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 09:42:44.385019: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385039: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385041: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:42:44.385044: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385023: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385023: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385027: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:42:44.385031: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 09:42:44.387175: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:44.387583: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387589: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387595: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387596: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 09:42:44.387598: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387600: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:42:44.387604: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.391854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391861: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391868: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391864: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.391863: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:44.392268: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392274: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392275: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392279: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392282: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 09:42:44.392282: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392286: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:42:44.392287: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426113: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426116: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426127: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:44.426559: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426564: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426566: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426569: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426570: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 09:42:44.426572: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426574: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:42:44.426575: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:42:49.594872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594880: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 09:42:49.595199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.594896: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595205: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595208: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595217: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595212: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.595236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595550: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595554: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.595559: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596401: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596401: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596362: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596408: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596362: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596408: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.596644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 09:42:49.596365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596371: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596411: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.596373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 09:42:49.596413: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 09:42:49.596739: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596656: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.596655: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596665: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.596782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 09:42:49.596752: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596667: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596756: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 09:42:49.596784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:42:49.596899: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: 
cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596747: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:42:49.596671: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 09:42:49.596790: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596748: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 09:42:49.596791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:42:49.596898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 09:42:49.596796: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:42:49.596902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: 
cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.596751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 09:42:49.596797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:42:49.596906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: 
cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.596794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.596794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:42:49.596906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.596908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.596909: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.596916: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596916: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596919: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596922: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:42:49.596928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:42:49.596943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597610: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 09:42:49.597609: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:42:49.597614: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597618: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:42:49.597624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:42:49.598376: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 
09:42:49.598422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.598384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 09:42:49.598471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.598426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598386: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.598391: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 09:42:49.598478: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598383: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.598436: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598391: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.598439: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:42:49.598446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.598397: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598401: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:42:49.598405: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:42:49.598409: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598485: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.598409: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598481: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598417: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598491: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598498: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.598448: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:42:49.598454: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598502: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:42:49.598457: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:42:49.598458: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-17 09:42:49.598424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 09:42:49.598522: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:42:49.598480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:42:49.598436: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:42:49.598495: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 09:42:49.598733: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:42:49.598441: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 09:42:49.598526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:42:49.598537: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:42:49.598541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 09:42:49.598844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598743: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:42:49.598750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:42:49.598750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 09:42:49.598757: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:42:49.598763: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 09:42:49.598848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 09:42:49.598763: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:42:49.598769: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:42:49.598776: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.598849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.598850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.598851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.598861: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598863: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598862: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598867: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598871: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:42:49.598888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:42:49.598902: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597148: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597151: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597150: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597153: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597161: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597162: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597163: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 09:42:49.597167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:42:49.597169: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597170: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597171: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597174: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:42:49.597179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
+0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +4: Successfully preprocessed all matching files. +4: Successfully preprocessed all matching files. +4: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank 
instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils...Loading extension module utils...Loading extension module utils... +0: +0: +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils...Loading extension module utils...Loading extension module utils... +4: +4: Loading extension module utils...Loading extension module utils... +4: +4: +4: Loading extension module utils...Loading extension module utils... +4: +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... 
+6: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils...Loading extension module utils... +7: +7: Loading extension module utils...Loading extension module utils...Loading extension module utils... +7: +7: +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... 
+0: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils...Loading extension module utils... +7: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: +2: +2: +2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... +2: +2: +2: +2: +2: +2: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... 
+3: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +6: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: +6: Loading extension module utils...Loading extension module utils... +6: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+3: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +3: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: +5: Loading extension module utils... +3: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m60b100m/3328570.out b/146m60b100m/3328570.out new file mode 100644 index 0000000000000000000000000000000000000000..6e7d1e4b0f7fbd5d02fbe2fbdfe4d54a82d22d83 --- /dev/null +++ b/146m60b100m/3328570.out @@ -0,0 +1,5665 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m60b100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m60b100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m60b100m --load checkpoints_146m60b100m --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3328570.json --zero-stage 0 +START 3328570: Fri 17 Mar 2023 09:42:20 AM EET +0: +0: +0: 
======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 49.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 39.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 47.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 44.0c 79.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 49.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 44.0c 86.0W 800Mhz 
1600Mhz 0% auto 560.0W 0% 0% +5: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 47.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 38.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 50.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 39.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 43.0c 77.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 
================================================================================ +3: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 47.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 50.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 48.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 46.0c 78.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 50.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +5: Launching on nid006701 (5/8), master nid006696 port 9999, GPUs 8, CUDA: True +7: Launching on nid006703 (7/8), master nid006696 port 9999, GPUs 8, CUDA: True +4: Launching on nid006700 (4/8), master nid006696 port 9999, GPUs 8, CUDA: True +1: Launching on nid006697 (1/8), master nid006696 port 9999, GPUs 8, CUDA: True +0: Launching on nid006696 (0/8), master nid006696 port 9999, GPUs 8, CUDA: True +2: Launching on nid006698 (2/8), master nid006696 port 9999, GPUs 8, CUDA: True +6: Launching on nid006702 (6/8), master nid006696 port 9999, GPUs 8, CUDA: True +3: Launching on nid006699 (3/8), master nid006696 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 
1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. 
False +0: deepspeed_config ................................ ds_configs/3328570.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... 
False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m60b100mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m60b100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... 
True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 
0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m60b100m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m60b100mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... 
None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 
0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +7: > setting tensorboard ... +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 09:43:08,145] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. 
+0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.109 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ 
scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 22.964 seconds +0: time to initialize megatron (seconds): 38.325 +0: [after megatron is initialized] datetime: 2023-03-17 09:43:34 +0: building GPT model ... 
+0: [2023-03-17 09:43:34,460] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 09:43:34,461] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 09:43:34,461] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.39 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, 
model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 09:43:36,458] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 
20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 09:43:36,720] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 09:43:36,720] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 09:43:36,721] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.41 GB, percent = 6.2% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-17 09:43:36,722] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 09:43:50,021] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 09:43:50,021] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 09:43:50,021] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 09:43:50,026] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 09:43:50,026] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 09:43:50,143] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 09:43:50,144] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 09:43:50,144] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.09 GB, percent = 6.4% +0: ninja: no work to do. 
+0: Time to load utils op: 0.20661282539367676 seconds +4: Time to load utils op: 0.209242582321167 seconds +0: Time to load utils op: 0.10257673263549805 seconds +7: Time to load utils op: 0.20963048934936523 seconds +0: [2023-03-17 09:43:50,369] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 09:43:50,369] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 09:43:50,370] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.09 GB, percent = 6.4% +0: ninja: no work to do. +0: Time to load utils op: 0.1374208927154541 seconds +0: Time to load utils op: 0.0006983280181884766 seconds +4: Time to load utils op: 0.0004246234893798828 seconds +7: Time to load utils op: 0.0005936622619628906 seconds +0: Time to load utils op: 0.0004894733428955078 seconds +0: Time to load utils op: 0.2037677764892578 secondsTime to load utils op: 0.20337724685668945 seconds +0: +0: Time to load utils op: 0.20352816581726074 secondsTime to load utils op: 0.20375776290893555 seconds +0: +0: Time to load utils op: 0.20334219932556152 seconds +0: [2023-03-17 09:43:50,495] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 09:43:50,495] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 09:43:50,495] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.14 GB, percent = 6.4% +4: Time to load utils op: 0.2037951946258545 seconds +4: Time to load utils op: 0.2040722370147705 seconds +4: Time to load utils op: 0.20318913459777832 seconds +4: Time to load utils op: 0.2034318447113037 secondsTime to load utils op: 0.2041025161743164 secondsTime to load utils op: 0.20409679412841797 secondsTime to load utils op: 0.20432257652282715 seconds +4: +4: +4: +1: Time to load utils op: 0.21182465553283691 secondsTime to load utils op: 0.20936846733093262 seconds +1: Time to load utils op: 0.2091825008392334 
secondsTime to load utils op: 0.20756125450134277 seconds +1: +1: +1: Time to load utils op: 0.2078554630279541 secondsTime to load utils op: 0.20737457275390625 seconds +1: Time to load utils op: 0.20811104774475098 seconds +1: +1: Time to load utils op: 0.20853066444396973 seconds +2: Time to load utils op: 0.2109975814819336 secondsTime to load utils op: 0.21101045608520508 seconds +2: Time to load utils op: 0.21100831031799316 seconds +2: +2: Time to load utils op: 0.21102356910705566 secondsTime to load utils op: 0.21106529235839844 secondsTime to load utils op: 0.21105265617370605 seconds +2: +2: +2: Time to load utils op: 0.21106767654418945 secondsTime to load utils op: 0.21106314659118652 seconds +2: +7: Time to load utils op: 0.20364069938659668 seconds +7: Time to load utils op: 0.20336508750915527 seconds +7: Time to load utils op: 0.20371699333190918 seconds +7: Time to load utils op: 0.2033708095550537 seconds +7: Time to load utils op: 0.20375561714172363 secondsTime to load utils op: 0.20397162437438965 seconds +7: +7: Time to load utils op: 0.20361757278442383 seconds +3: Time to load utils op: 0.2117290496826172 seconds +3: Time to load utils op: 0.21173691749572754 seconds +3: Time to load utils op: 0.21107792854309082 seconds +3: Time to load utils op: 0.2110602855682373 seconds +3: Time to load utils op: 0.2102673053741455 secondsTime to load utils op: 0.21064162254333496 seconds +3: +3: Time to load utils op: 0.21179795265197754 seconds +3: Time to load utils op: 0.20786523818969727 seconds +0: Time to load utils op: 0.0004425048828125 seconds +0: Time to load utils op: 0.00041484832763671875 secondsTime to load utils op: 0.0003790855407714844 seconds +0: +0: Time to load utils op: 0.00036978721618652344 seconds +0: Time to load utils op: 0.00037932395935058594 seconds +5: Time to load utils op: 0.2115039825439453 secondsTime to load utils op: 0.2115323543548584 seconds +5: +5: Time to load utils op: 0.21153879165649414 seconds +5: Time to 
load utils op: 0.2115461826324463 seconds +5: Time to load utils op: 0.21157431602478027 secondsTime to load utils op: 0.21160173416137695 seconds +5: +5: Time to load utils op: 0.21161150932312012 seconds +5: Time to load utils op: 0.21161103248596191 seconds +6: Time to load utils op: 0.21138930320739746 seconds +6: Time to load utils op: 0.21140027046203613 seconds +6: Time to load utils op: 0.2114102840423584 seconds +6: Time to load utils op: 0.21144556999206543 seconds +6: Time to load utils op: 0.21144628524780273 seconds +6: Time to load utils op: 0.21144485473632812 secondsTime to load utils op: 0.2114264965057373 seconds +6: +6: Time to load utils op: 0.21143841743469238 seconds +4: Time to load utils op: 0.0003218650817871094 seconds +4: Time to load utils op: 0.0003457069396972656 seconds +4: Time to load utils op: 0.0003223419189453125 seconds +4: Time to load utils op: 0.0003619194030761719 seconds +4: Time to load utils op: 0.0003445148468017578 seconds +4: Time to load utils op: 0.0003447532653808594 seconds +4: Time to load utils op: 0.00034236907958984375 seconds +7: Time to load utils op: 0.00044655799865722656 seconds +7: Time to load utils op: 0.0004429817199707031 seconds +7: Time to load utils op: 0.0005023479461669922 seconds +7: Time to load utils op: 0.0004811286926269531 seconds +7: Time to load utils op: 0.0004820823669433594 seconds +7: Time to load utils op: 0.0004801750183105469 secondsTime to load utils op: 0.0004696846008300781 seconds +7: +1: Time to load utils op: 0.0004596710205078125 seconds +1: Time to load utils op: 0.0003845691680908203 seconds +1: Time to load utils op: 0.0005052089691162109 seconds +1: Time to load utils op: 0.000476837158203125 seconds +1: Time to load utils op: 0.0005192756652832031 seconds +1: Time to load utils op: 0.0006809234619140625 seconds +1: Time to load utils op: 0.0006735324859619141 seconds +1: Time to load utils op: 0.0006706714630126953 seconds +2: Time to load utils op: 
0.0005972385406494141 seconds +3: Time to load utils op: 0.0005142688751220703 seconds +2: Time to load utils op: 0.0004923343658447266 secondsTime to load utils op: 0.0004875659942626953 secondsTime to load utils op: 0.0004949569702148438 secondsTime to load utils op: 0.0004982948303222656 seconds +2: +2: +2: +2: Time to load utils op: 0.0004875659942626953 secondsTime to load utils op: 0.0004971027374267578 seconds +2: +2: Time to load utils op: 0.0006501674652099609 seconds +6: Time to load utils op: 0.001100301742553711 seconds +3: Time to load utils op: 0.0003955364227294922 seconds +6: Time to load utils op: 0.0011413097381591797 seconds +6: Time to load utils op: 0.0011017322540283203 seconds +3: Time to load utils op: 0.0003998279571533203 seconds +5: Time to load utils op: 0.0009305477142333984 seconds +3: Time to load utils op: 0.0003905296325683594 seconds +3: Time to load utils op: 0.0003771781921386719 seconds +5: Time to load utils op: 0.0009307861328125 seconds +6: Time to load utils op: 0.0012705326080322266 secondsTime to load utils op: 0.0012793540954589844 seconds +6: +6: Time to load utils op: 0.0012788772583007812 seconds +5: Time to load utils op: 0.0011131763458251953 seconds +6: Time to load utils op: 0.0013341903686523438 seconds +6: Time to load utils op: 0.0012938976287841797 seconds +5: Time to load utils op: 0.0012125968933105469 seconds +3: Time to load utils op: 0.0004525184631347656 seconds +3: Time to load utils op: 0.0004646778106689453 seconds +5: Time to load utils op: 0.0011637210845947266 secondsTime to load utils op: 0.0012004375457763672 seconds +5: Time to load utils op: 0.0012083053588867188 seconds +3: Time to load utils op: 0.0004608631134033203 seconds +5: +5: Time to load utils op: 0.0012369155883789062 seconds +0: [2023-03-17 09:43:50,612] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 09:43:50,612] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 
GB +0: [2023-03-17 09:43:50,613] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:50,720] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 09:43:50,720] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:43:50,721] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:50,824] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 09:43:50,824] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:43:50,824] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:50,931] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 09:43:50,931] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:43:50,932] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:51,034] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 09:43:51,035] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:43:51,035] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:51,143] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 09:43:51,144] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:43:51,144] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:51,247] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 09:43:51,247] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 
1 GB +0: [2023-03-17 09:43:51,248] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.24 GB, percent = 6.4% +0: [2023-03-17 09:43:51,248] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 09:43:51,248] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 09:43:51,248] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 09:43:51,248] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 09:43:51,248] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 09:43:51,249] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 09:43:51,250] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 09:43:51,251] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004286766052246094 seconds +0: [2023-03-17 09:43:51,251] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 09:43:51,262] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+5: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+6: [2023-03-17 09:43:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+1: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+5: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+2: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 
+7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:43:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:43:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:43:51,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:43:51,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:43:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:43:51,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:43:51,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:43:51,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:43:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:43:51,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:43:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:43:51,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:43:51,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:43:51,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:43:51,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:43:51,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:43:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:43:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 09:43:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:43:51,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 09:43:51,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:51,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:43:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:43:52,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:43:52,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:43:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:43:52,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:43:52,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:43:52,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:43:52,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:43:52,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:43:52,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:43:52,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:43:52,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:43:52,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:43:52,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:43:52,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:43:52,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:43:52,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:43:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:43:52,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:43:52,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:43:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:43:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:43:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:43:52,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:43:52,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:43:52,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 09:43:52,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:43:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:43:52,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 09:43:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +6: [2023-03-17 09:43:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 09:43:52,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:43:52,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:43:53,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 09:43:53,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +0: [2023-03-17 09:43:53,003] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +3: [2023-03-17 09:43:53,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,007] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +1: [2023-03-17 09:43:53,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,007] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +6: [2023-03-17 09:43:53,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,008] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +1: [2023-03-17 09:43:53,009] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +6: [2023-03-17 09:43:53,009] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-17 09:43:53,011] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +7: [2023-03-17 09:43:53,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:43:53,013] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +4: [2023-03-17 09:43:53,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,013] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +2: [2023-03-17 09:43:53,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,014] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +7: [2023-03-17 09:43:53,014] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +4: [2023-03-17 09:43:53,015] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +2: [2023-03-17 09:43:53,016] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +3: [2023-03-17 09:43:53,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,022] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +2: [2023-03-17 09:43:53,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 09:43:53,023] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +3: [2023-03-17 09:43:53,023] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +2: [2023-03-17 09:43:53,025] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +2: [2023-03-17 09:43:53,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,029] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +2: [2023-03-17 09:43:53,031] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +4: [2023-03-17 09:43:53,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,031] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +4: [2023-03-17 09:43:53,033] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-03-17 09:43:53,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:43:53,034] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +5: [2023-03-17 09:43:53,036] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +1: [2023-03-17 09:43:53,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:43:53,036] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-17 09:43:53,038] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +0: [2023-03-17 09:43:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:43:53,046] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +6: [2023-03-17 09:43:53,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:43:53,047] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +7: [2023-03-17 09:43:53,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,047] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +0: [2023-03-17 09:43:53,048] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +5: [2023-03-17 09:43:53,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 09:43:53,048] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +6: [2023-03-17 09:43:53,049] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +7: [2023-03-17 09:43:53,049] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +7: [2023-03-17 09:43:53,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,049] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +4: [2023-03-17 09:43:53,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,049] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +5: [2023-03-17 09:43:53,050] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +7: [2023-03-17 09:43:53,051] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +4: [2023-03-17 09:43:53,051] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +0: [2023-03-17 09:43:53,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:43:53,051] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-17 09:43:53,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 09:43:53,052] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-17 09:43:53,053] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +0: [2023-03-17 09:43:53,054] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +5: [2023-03-17 09:43:53,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:43:53,055] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +4: [2023-03-17 09:43:53,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,057] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +5: [2023-03-17 09:43:53,057] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +6: [2023-03-17 09:43:53,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:43:53,057] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +3: [2023-03-17 09:43:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 09:43:53,058] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +3: [2023-03-17 09:43:53,058] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +6: [2023-03-17 09:43:53,059] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +2: [2023-03-17 09:43:53,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,059] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +3: [2023-03-17 09:43:53,060] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +2: [2023-03-17 09:43:53,061] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +4: [2023-03-17 09:43:53,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,063] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +3: [2023-03-17 09:43:53,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,065] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +4: [2023-03-17 09:43:53,065] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +5: [2023-03-17 09:43:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 09:43:53,066] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +1: [2023-03-17 09:43:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,066] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +1: [2023-03-17 09:43:53,066] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +2: [2023-03-17 09:43:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:43:53,067] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +2: [2023-03-17 09:43:53,068] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +1: [2023-03-17 09:43:53,068] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +4: [2023-03-17 09:43:53,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,068] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +2: [2023-03-17 09:43:53,070] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +4: [2023-03-17 09:43:53,070] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +6: [2023-03-17 09:43:53,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 09:43:53,072] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +3: [2023-03-17 09:43:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,073] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +6: [2023-03-17 09:43:53,074] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +3: [2023-03-17 09:43:53,075] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +2: [2023-03-17 09:43:53,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,075] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-17 09:43:53,077] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +0: [2023-03-17 09:43:53,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:43:53,078] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +7: [2023-03-17 09:43:53,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:43:53,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-17 09:43:53,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +0: [2023-03-17 09:43:53,080] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +7: [2023-03-17 09:43:53,080] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +7: [2023-03-17 09:43:53,080] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +7: [2023-03-17 09:43:53,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,084] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +6: [2023-03-17 09:43:53,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:43:53,085] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +1: [2023-03-17 09:43:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,086] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +5: [2023-03-17 09:43:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:43:53,086] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +5: [2023-03-17 09:43:53,086] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +6: [2023-03-17 09:43:53,087] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +7: [2023-03-17 09:43:53,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,087] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +5: [2023-03-17 09:43:53,088] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +1: [2023-03-17 09:43:53,088] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +7: [2023-03-17 09:43:53,088] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +6: [2023-03-17 09:43:53,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:43:53,090] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +6: [2023-03-17 09:43:53,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 09:43:53,091] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +4: [2023-03-17 09:43:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:43:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,091] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +4: [2023-03-17 09:43:53,091] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +5: [2023-03-17 09:43:53,091] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +2: [2023-03-17 09:43:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,091] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +6: [2023-03-17 09:43:53,091] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +0: [2023-03-17 09:43:53,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 09:43:53,092] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +6: [2023-03-17 09:43:53,092] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +1: [2023-03-17 09:43:53,092] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +5: [2023-03-17 09:43:53,093] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +4: [2023-03-17 09:43:53,093] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-03-17 09:43:53,093] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +0: [2023-03-17 09:43:53,093] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +0: [2023-03-17 09:43:53,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:43:53,095] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +5: [2023-03-17 09:43:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:43:53,095] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +1: [2023-03-17 09:43:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:43:53,095] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +4: [2023-03-17 09:43:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:43:53,096] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +0: [2023-03-17 09:43:53,096] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +5: [2023-03-17 09:43:53,097] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +0: [2023-03-17 09:43:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,097] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +0: [2023-03-17 09:43:53,097] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +4: [2023-03-17 09:43:53,098] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +6: [2023-03-17 09:43:53,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:43:53,098] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +0: [2023-03-17 09:43:53,099] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +5: [2023-03-17 09:43:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 09:43:53,100] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +6: [2023-03-17 09:43:53,100] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +5: [2023-03-17 09:43:53,101] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-17 09:43:53,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:43:53,101] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +1: [2023-03-17 09:43:53,103] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +3: [2023-03-17 09:43:53,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,103] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +3: [2023-03-17 09:43:53,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:43:53,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-17 09:43:53,105] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +3: [2023-03-17 09:43:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:43:53,107] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-17 09:43:53,109] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +3: [2023-03-17 09:43:53,109] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +7: [2023-03-17 09:43:53,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:43:53,110] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-03-17 09:43:53,112] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +2: [2023-03-17 09:43:53,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:43:53,121] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +2: [2023-03-17 09:43:53,123] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +1: [2023-03-17 09:43:53,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:43:53,190] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +1: [2023-03-17 09:43:53,192] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +0: successfully loaded checkpoint from checkpoints_146m60b100m at iteration 0 +7: time (ms) | load-checkpoint: 1932.45 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 09:43:53 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.032065 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.086 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.031975 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.020 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 09:44:08 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 19324.23 | train/valid/test-data-iterators-setup: 13629.41 +0: [after training is done] datetime: 2023-03-17 09:44:08 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.846421E+00 | lm loss PPL: 4.682518E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3328570: Fri 17 Mar 2023 09:44:30 AM EET diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5ae1836a24fc3898b8c6a77aa1d2a41ad772452 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:ba71901704e31ef7cad1e78910d8e54090f39e0fa121db0f7e8bf6471e19c97e +size 27478295 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d0976bd2f4d0dffff56ad82bd1ecc1626b743fc --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dffa844facbb9a9e8b9c1edbe54e8a709f11e0d2d18396191b624ac4244f2b90 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cd56659b5a8ef24b60007b917a128875f881915f --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9056c599f7c1f3e06dd628afba9321589154972e4e7dc4f98121233eab8cfb64 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b5bffc80c824a73c61fa0c7070e168e0698bb600 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3db843b06bff4c69efbe9886fadc06b3f30b7abeb6086d8d25bda570c15cfe5c +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2e74dfa098865c04c8407dd87a7643b35c2dadda --- /dev/null +++ 
b/146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa3476047660bd7f2e73bca2a3c7bacbd88ead44468e505e443c7350965410cf +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ae9f38dda3cf4c724f8c818a32f2ca6867db36a2 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30cbb0bb67b69e84a88f3b9669e759ce56a7d8bd90179b73e43297c6875f53b3 +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3f19900fb506f4d38b2619be09737827aae7774a --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbe5e381e009de6aaeed6f7bf578d7395fde3e9c4ca72c4f59a4453276ce9498 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..18c1b0a8d23a559dba061ddfbea614e7f280bd44 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd095f962663bd1da4ec30470c180dbacb8fcfb368baffaca2064386e8a2c0b0 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..7f8b630c28a92279b06bebb2110fd4ce95eac588 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e79d5f3c7fca2595e47e69fbf47c2426071e833fbf09b29079b332a9923bca4 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6731c274eedf616e020d7f41da4539f8dd53bacc --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5f67c84cad624d7c9d1c4a52d4cff43867e9f9ea863d5a13ca2065c91b4b3e6 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e068cf34b7ff3672ac0f56d476ac5e42f0caf3de --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:252c3902a041718a74f3690179ea5068f289d3a370e64a07ae338f2f59a84315 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0594b5ffe88cb911d26d3117c9063a83d047943c --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a4f8fcd35d5803964e98e5b66321fb1aa23c1f65cc67999f8018eb04786e12c +size 27478231 diff --git 
a/146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7c915912a04d3de915d186bab35f4524f80429f0 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17e9651e95f0558083e353cf97f3a439423357414abb2f5f1f3ef0f467947625 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3562807618bd4675f326dfc939f2a01dfb257b44 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:780af1e178b0dec7381c1be8582eb26fe549496cd62b1e746dfaf9c2dda2c266 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c607a4fe74d217c64a680045357f5d58418924aa --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb4a4db8eea18a98498a12b4a9b9fbc9e925c856ec1cc0f713f8268634a572e8 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40ad77e7ed90924b648d158863b52522d3bb30e3 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:61baef97bafe3a2e2db7e3c36c0630f1fb597a09e45e751cc84692f503317d14 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bd1d984234dff777792bb3a9a8943e3a4d5ca39 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:643caca477bf87e03dc315008ab63edc70ff5202dc30e421a7cc91672118129f +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bb58cf652510480cf8e53fe26aa1b9656dce5ae --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a06a429e8756ce63b446c605c7e2e3fabfa277691c2c25933b59fa95b2e39820 +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f45228ab90b3564969f3d81701437b265666bd5 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0e19dff4dce043d1648eab2903990aad29dd9c5627bc6753524252ea72f36c5 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..a20e5bf48b747f5c08ca97667218492a4fee4ec3 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc94323f58c6a6d80e094f8077094da839d9ca3b373af0b4a248d59bcf44a2de +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..878d5ab1808918841f477d15fe3223040a807364 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92e7038055a8c2980c41c53da0376bd9151a6fa1a409e819da005a8128df0e42 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab86781e04a25327b22f20e3ef2440e5f20b57d2 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c2a114e874f31d23f8a0a203762fd5b6b2aff979fdda6b1f72b8bd813b2cdab +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e38ae2851d5b077546a6bd9e81640c0cc779ee8 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0eeba5bc55089ab688ce2030e25173fa97af13ea66103ca1a17f13751c3332ec +size 27478231 diff --git 
a/146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..494fbe0e80cbb940fcd8d103933642636d0ee4c0 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f56172529ec5f19fb04a7c23d669a8a1f9c72ed4b9c11c3cd46ffa70c5d81062 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa201fac1a8e3f41169c19b24e5cf32d9a07b9f1 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e81f503b93fa1383a54871e1eda897c23d56ada6b14948674c59ee1fd083c89e +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d52e500f3716a7f3abc2a2e732d705431f9a389 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d47eb9a7bae88238f34a5f2dc35daeccefbb4be1882f787018401d5b864265a +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb541df7b97bf6b87a51734d4708e26256f39594 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ea9e557b03321d663f68e9733cd15e3880845f02028e783544fab34044e14bd6 +size 27478434 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1ac28463f4af86057e34e862484735b0178ba99 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22167f23d706e0c7cc6af073b6a1d94d723415b65f1830a9f930179f268abd40 +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13234fb6be986296c90d27646b50e93a2d9c83aa --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bc2cc9b2c151b059223cbef52a1e212d0d2499a4e12cbbfcee9858aef38154c +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c861720532bd863efb67331a550a9425dd1b855d --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9491396f9a79563596edbe31b85539f32b29e1d91730d6a656493bd0fd6c437 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4010b328bd2647ddba5c1271fb7e0281b57b6d32 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea7eac415f3def4984882ef65a776a8288908e4d10181ca03a7a090b361769fc +size 27478114 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..628acf5c45a136fc5114e125a3e7ed5406a09d0e --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:736c4527b256a206e0b8c53d8ddd401226fffdec3bf6099c7943fe9104505d8b +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be04de86fb798afe9f5aa5c9b4f2d7022e70c803 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56ee3dbeba71eb5a562750c94cf2dce7d7c187e7b60dbce82197f4dfc6fce233 +size 27478434 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..09e82acf9fc8ebc6dce39e7abb153f5046b98238 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c65b591295fa0359fbbb458cbec135ecdfaccfd09235771803f284ba2e1fa990 +size 27478231 diff --git 
a/146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34b4f91d2f1cfc41b085590e6f05f1efe4d5c8b1 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1c49fc536c2a58042153fca2856aa1b4a09e633d4929906f5dbf6a9299b1a09 +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..710761b3e08579b3bd5ca3a3f81ef4112c8aa32d --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:037d6432a655eecf7ce361ce2a97a119d527a0d0550389c85019730be9be8fdd +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f398686d36795bb8d70c4a2f32d51f6321f81be3 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4345b4b57ac99f57d681bdee8066d5b4a7fbc6c7f650abf3957d396fef983a2c +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da48b814357410f70bcb9e2630bbc877988f081a --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:779a086200b8f2de78886b13f48f243d368e93acd62f35898d91891b28a53be0 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85b6b116a8e86bf78601429ac6b6a729ac06c9a7 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b803307cc8350b557e72c809077488615c5e41126d882095a46832df7da21a0f +size 27478434 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f37b75e75e1143a9d83455a8fee975373eca519 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:307a0bc8d5cdf8a20d0545df1d3f9b463f122d83e01c2c7d87ca77bb04a79af3 +size 27478114 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..444899795619c70f01f9033baacd626470ff9664 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a4d7369b025a378a75f960cdf41fd7dea5a34543b2d1a8cbc88f2c26d223697 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..2045e6cf8a9ba7f8d63b232cabad52c0b858f39c --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57948a9c213db34eff916ceee663e4f434cfbd835585e01bd99f63016f5ba29e +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5661c2c8d333d55a43ee1384bcdc9226650a40fd --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72ff7a3da30af84b3071234ebb19c4537e6017b0a3983f68b8f30619af872282 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc64dc668d344875331d6dbcf266026099f7375d --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7580d04e01a87c65628d7a1968993c48b554990b9f5d20a97780e229b3c647e +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f22948b67ab10e4346994f371f9279b41a92557 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33caa8c929d74a156b97db50ecbe0b7d94d2f6125d51dc189855566ef928d9c1 +size 27478231 diff --git 
a/146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9cccce1905471d976d2c25ee56b9223fd009f005 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07cf8c947a3adc9638361cb89b48b06e18ee8e7097011cc01bfb097a673e481f +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ea9c4932dd94f92712263a02353902677fc66d1 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34f2cdb5c716c4ba62ee7a4ff35abd068afd5a7fe349207ff52965b28fba1d0d +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7553c3e1aee9600c88eac0a68db0af01f1fc6f3d --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b1e936f657a7cb4963a65aec9e7c7650780d0b79c7e4143b07aa2edf64628f4 +size 27478434 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..310a626ea1213d6086921acbfbe38874cf416c8e --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ae84a4c25a5e7f7217d663b49ecce699e1400f9ed738f17e809e190360a6e693 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ef9d1d1f62549594a67c0bbd223b64b9b048f811 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bac9b060e7307eb606344a615c886ed5592761322c68ac723b4eb1041bc33fb4 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1105e5f807eb7c9d07b2fe1562b28f10d2667cd0 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6cdf739c25cb3a89de6356948e36d37ff18f7d5b337d9bd62a6f75f014f318d8 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15af8d10321b8569169376a287ac819b79dab7b2 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83afd4e9da01932b408cb3b4c56164feb02cd84b0f3199462ae6264409765665 +size 27478306 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..aadce6e1c7e3e73416b80cd773eb59433af7a5a2 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb0ff43243367efa876132ffe71fec516daac4fb8d680c2a9e6eaaa1e0f414fd +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c17b4e2c3dc24218c3ef5439db08a786f4d826b --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a32ec54a2981e277d76e66420f7807548a843c9d8531bce12bcee369af5936c +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..70b3489b6c09acd713de9a2d54aebfd389d41208 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fb29c30735d600f6bdce18a10148bee7bdad8a50b05ba0908ad83e3820accf9 +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1851363dae646e2c83f8e22d87830dd7b2305b0 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d4dcc32f2a9316ef54b80d6ce3b8e064ff94a2114d18b1d2e5d132bd858110d +size 27478167 diff --git 
a/146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b821148feba0c7f8e6c1e2b711260771d909e33 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c9ac694f55c804fe0c841a7c3181169314fc06b47cb83abf79b203bfa32bc53 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60c898f9af50aba8b0f91a9f9d46ed0ba694f1f5 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7d57e1132a45c7cb78d7f6c9610b3a54d5938279250a991cef38cbe53164471 +size 27478370 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf20f2fcf92ef72788f0f9ec18eeedd908009672 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:559b39f898b9ba2a4d4328c04472225335fd27b1dd7cfd7f902d922f2ddb9933 +size 27478178 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..491b11d7ebd6634405e745f68f935e03ce7bc350 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:910217a5ce2479098dc84ffebee59fb7f661f6337873429181464eed6bd18bbc +size 27478242 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0660c43e7bd5ae7cec0d766b0c9244660255104a --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a0c1ae387e5eb43dbd7798fd6f6b139d307c5a517d1e1cf789db1b151664924 +size 27478359 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d7c88ce49dc8df62bf8bd3da5015310eba21391 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85b4c200b70d8d69e177c4e67e0fff0ab37711134d455188eef558e8706f1286 +size 27478103 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff9e5258c9d3489f8405170ddb44cd887e450371 --- /dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84df7606c8385a7034a70e579917d7091f8b2a464e7fdd80ef1ce91b754e5329 +size 27478359 diff --git a/146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a5c88000b4543c8e964393f24fc54af21084563 --- 
/dev/null +++ b/146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b7f37eee3515e9ffd8f3dad882911557ee92f8c1b5d6ca6e118f1cdae240424 +size 27478167 diff --git a/146m60b100m/global_step115203/layer_01-model_00-model_states.pt b/146m60b100m/global_step115203/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3786c885edcc4e3d2a76b448555d02181118d935 --- /dev/null +++ b/146m60b100m/global_step115203/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37bad8dca8b88bea629e03ff9d7a9bb4699b03ae96c6fb0df14ed99378e02595 +size 80413955 diff --git a/146m60b100m/global_step115203/layer_03-model_00-model_states.pt b/146m60b100m/global_step115203/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8b50d574c00eebdd2cfdacb1f9fa59013f13cb24 --- /dev/null +++ b/146m60b100m/global_step115203/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7db6fc2a460f6dcb3373751d0052fd13abb879f67a8ad662ddf2a010b41c1467 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_04-model_00-model_states.pt b/146m60b100m/global_step115203/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..36dc140abff76977d5a8ac1bf2a66f181479683d --- /dev/null +++ b/146m60b100m/global_step115203/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd9329eacea1bc10eb175c0f6bc95b4bb3bd3db08578c2fe58b71d9a8f33f07a +size 14180099 diff --git a/146m60b100m/global_step115203/layer_05-model_00-model_states.pt b/146m60b100m/global_step115203/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6033ac0995e24147de44c0db48c4546d385260de --- /dev/null +++ 
b/146m60b100m/global_step115203/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4ed327930302d9bc0264fe0807a13657ed8bcd4267a5af27009679efb53d809 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_06-model_00-model_states.pt b/146m60b100m/global_step115203/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d85ec94de6aa537ebcea5930f322913659ab4b3 --- /dev/null +++ b/146m60b100m/global_step115203/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:036294b4b71ae9ad5155085bbdb9676ea1e641523ff0dc3c1e8fba29b294463b +size 14180099 diff --git a/146m60b100m/global_step115203/layer_07-model_00-model_states.pt b/146m60b100m/global_step115203/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..345e1f0f15f0bfa7e9efbe9790dd3340383df2e6 --- /dev/null +++ b/146m60b100m/global_step115203/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b685daec0ff63a7e92226f3003c837fd97e882a11415c18c6b1076475d89784 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_08-model_00-model_states.pt b/146m60b100m/global_step115203/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68959a3a8766f813ad8f1a5d836418a7b3cd6e1f --- /dev/null +++ b/146m60b100m/global_step115203/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03b31fd0d506bd51758de1c7c344c1583df1cbab8f9d1c65a55d8cd4198cf3d7 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_09-model_00-model_states.pt b/146m60b100m/global_step115203/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2199fb0987da10fbe31853f0438a9787ba3fe33c --- /dev/null +++ 
b/146m60b100m/global_step115203/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d78d43d595ec4736e2e26deaae549c41b0184601e654285493515732546089fb +size 14180099 diff --git a/146m60b100m/global_step115203/layer_10-model_00-model_states.pt b/146m60b100m/global_step115203/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a44891ad4a7420b284072ff2c6cc6041c4336cc0 --- /dev/null +++ b/146m60b100m/global_step115203/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ffe7dd9942ee8c65d6f30b5d96c7581e7c85ef646c1e95c86fcd3c2094181be +size 14180099 diff --git a/146m60b100m/global_step115203/layer_11-model_00-model_states.pt b/146m60b100m/global_step115203/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ff9e419e95f9b5db58574c4063b2d5fff707386 --- /dev/null +++ b/146m60b100m/global_step115203/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d63b492bd10e080051a57669724fad56e1fe5cd59e2c0caf79a420135bd2aa65 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_12-model_00-model_states.pt b/146m60b100m/global_step115203/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5313d28e94092c16899723e3a79f19d7fd509528 --- /dev/null +++ b/146m60b100m/global_step115203/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:765256c26712196c933b6832ea290c108bdb0437b0ae9ccd58c6b4e0df45f30b +size 14180099 diff --git a/146m60b100m/global_step115203/layer_13-model_00-model_states.pt b/146m60b100m/global_step115203/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca65be79c8ca559f451926c7877fdbba852b6a3b --- /dev/null +++ 
b/146m60b100m/global_step115203/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:155adf86baea49f0ba2d7860498ee714a93b1523d1e77df57c823728b556ea2d +size 14180099 diff --git a/146m60b100m/global_step115203/layer_14-model_00-model_states.pt b/146m60b100m/global_step115203/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d145bd57d507f6f1cd2c654f70fc075e1ca99d8e --- /dev/null +++ b/146m60b100m/global_step115203/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5704056664f943c63a68a16feb3201ee4ce5d69e34dd1188d9e9296ccad3c63b +size 14180099 diff --git a/146m60b100m/global_step115203/layer_15-model_00-model_states.pt b/146m60b100m/global_step115203/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4537ffe2c385aadfd1ab3aca5abb8918332c9d99 --- /dev/null +++ b/146m60b100m/global_step115203/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da56924903ecee6f3c731ee76d536d834a8d114f1b0375ec10fbed838fc65ae9 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_16-model_00-model_states.pt b/146m60b100m/global_step115203/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee879504c8407608394645ec6624784bad3789a2 --- /dev/null +++ b/146m60b100m/global_step115203/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78abdb17843bcbebf18f62a4ba4de63c667cd811b4d4d602763c725d18831c07 +size 14180099 diff --git a/146m60b100m/global_step115203/layer_17-model_00-model_states.pt b/146m60b100m/global_step115203/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85eb7600ae3eb8c0db0d86bfd4b0ed694441c6ad --- /dev/null +++ 
b/146m60b100m/global_step115203/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57451869f289a08cb85201637229d7b6f5142d697a40e24f5152c21bea803fce +size 14180099 diff --git a/146m60b100m/global_step115203/layer_19-model_00-model_states.pt b/146m60b100m/global_step115203/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0c26bfed1954ec55e75f1bd614e1a6c2d810226 --- /dev/null +++ b/146m60b100m/global_step115203/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1967fcbfd60eac88475d5f0f5954fc6f27b7cb5d5a15498cee5faa0fe0b00b29 +size 4291 diff --git a/146m60b100m/global_step115203/mp_rank_00_model_states.pt b/146m60b100m/global_step115203/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fdf1b6fa71425256ea0fb96c4883922b41104f9 --- /dev/null +++ b/146m60b100m/global_step115203/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c429b7c677a7213f754a2871562e558a5a33772f228f99b3cc129aa2fa032d2a +size 35443 diff --git a/146m60b100m/sbatch_146m60b100m.sh b/146m60b100m/sbatch_146m60b100m.sh new file mode 100644 index 0000000000000000000000000000000000000000..79d4e4e4e7adfb42c03cc34430b0f84b7308d052 --- /dev/null +++ b/146m60b100m/sbatch_146m60b100m.sh @@ -0,0 +1,174 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH 
--exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m60b100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 19873180000 +# -> Samples: 9703701 +# TRAIN_SAMPLES=9_703_701 +# Tokens: 31633480000 +# -> Samples: 15446035 +# TRAIN_SAMPLES=15_446_035 +# Tokens: 60400000000 +# -> Samples: 29492188 +TRAIN_SAMPLES=29_492_188 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + 
--lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 294_922 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 100 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD +
+echo "END $SLURM_JOBID: $(date)" diff --git a/146m60b100m/sbatch_146m60b100mval.sh b/146m60b100m/sbatch_146m60b100mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..0d4491b4f270ee49ea688a91d53f2e99c83f2748 --- /dev/null +++ b/146m60b100m/sbatch_146m60b100mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m60b100mval +VARIANT_CKPT=146m60b100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model 
$NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ +
$GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m60b100m/tensorboard_146m60b100m/events.out.tfevents.1678985650.nid006724.52958.0 b/146m60b100m/tensorboard_146m60b100m/events.out.tfevents.1678985650.nid006724.52958.0 new file mode 100644 index 0000000000000000000000000000000000000000..527628f11e167836a5daef9ce077b068c8f648ca --- /dev/null +++ b/146m60b100m/tensorboard_146m60b100m/events.out.tfevents.1678985650.nid006724.52958.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f034bf3054a76b220933f4f76a114e33f078ed8669ad5b2c11f548faf97c190d +size 198352922 diff --git a/146m60b100m/tensorboard_146m60b100mval/events.out.tfevents.1679038988.nid006703.18642.0 b/146m60b100m/tensorboard_146m60b100mval/events.out.tfevents.1679038988.nid006703.18642.0 new file mode 100644 index 0000000000000000000000000000000000000000..6f5dd4e82e979e7a1b28f058405fe8f93ac43a60 --- /dev/null +++ b/146m60b100m/tensorboard_146m60b100mval/events.out.tfevents.1679038988.nid006703.18642.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c5f4a724a0cee7baf6a89b73f8cf5ad5a3eee7d54a0ecb1fd8ab49c35d3fb43 +size 980 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..63579cf234f23db728580415ce59016cbc921fcf --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:6f6faec12b54dbf569f00f6c3c009e1d57a1ca9f4354675ce2895bd5ae18f29e +size 27478295 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a62092223ac9d9fc9d4b94bdad2514954df00cfd --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ccde6e717b8893c3fd929d18759a714822e472290744e88a59b43d6ff0eed872 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9762e62d3f1678cfcc8d58cceffa7dc1f0a534f6 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d45389d57a2b4bd2ecb6ff0cace76141858b0bbdaf809a8ec9a6dcc5f771d73 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73d055ea82cb8f833ad5a7422f05d6a780b6abb9 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:45f34d4d85229189a332dd14f0c0ea5f2fac3d60dddd1995da14f8a27d4a83d2 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba9278894f997204e8c72719c7999589f76b904f --- /dev/null +++ 
b/146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a55bd9766044435434fd465fc7ab2d2469933135ba124f41d1a14f0d673f6e0 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9c66424211639adfafa78163f5c0e9d6c5a4c93 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a49c45a3cbf50e5e89f8139e143714ce5e59b405831ad247f53e7fdb25bfde80 +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbd15cd4f3802398917bfe4b733f94b26147f3af --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4335fee699eb5bd52fac5b3aaacace69de31c700696b5d2b1e6eb134507a8b7b +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..99ff5d54911d9d98f12cd20999a7ba0c59db5bfc --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d5d562cf10f0336f5be074e6d13a7ef7f63d9813252f9cde6b4d25d04c8925f +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..49e626f161bee10f7d0e4c2f2e13835e05a8a76d --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9422a7dca27c358455b14546e78c38117b97d84e3e633b772ceefbc6453c0f7 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df32a52999bda0b371fc695d9d89b0a75d1f283c --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6fdc538fbc5c4718e4fb629951a05c1f84cbc4576e7b000f8e6d426a545748c5 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b35c7214b37f8f2cbbfcfeb930b2a93dd9498bc --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c9d3fc47edbca2d2e5bfd36475f45a5de968cf78b5e580280b67f02ec1d5327 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3d1e9187c8543e351cb0ad249f04d200b455b93 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2227fc50cb532b4b99582813714144fedf5bd77169305170f1d53c62ccfe1b4b +size 27478231 diff --git 
a/146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9e4b7108ac8fc3b1558173894edff643f58e77b --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17d4e5f7b880f6a17dd6fa3f0f61d3a0283fd5a2db3687efe82c5c842ef48f99 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e05d0081e7c56e3468f7c35c260e9eb82d90d8d9 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2182d03904831d9e048d5d0c1cce6d4d7ce0db8feafff8e7c8f42a21a64cfb5d +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7c23693b545031a2201d7bd76c3439d7e3b17a3c --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:773c7979daf3b3da5f3b1c620188cad58272e50f3241e0ac80f1491585c2833c +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..386137516590e637915b39268dd4596e305f76bf --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:719c4504331d324892c3e0200a6e2132312f2aa1ae36a5500300461c20d1d1aa +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e0f7715d0ab3cff4df5d57b3988e5e3b0397767 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eeb99b0bf74bcded1235efa91e451dfb7f63bdfb74d53f13c5eec63ec9accbe4 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5196d7c192a4e93af50f4c1bc8ef7e502f67853c --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1a4a86853d2b1cfd1a77aa1f9df6c34b23c39ee3854f766b9ae1438921b968b +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c2f1b00bcb4141426270cf534312abb6fdc6525 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b35dc8bd60384ac2498bf221bf860efa9b54ecb5c6b0d6e854c9bbf5601ab142 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9c275f92d0a8cae2f1f0231c21fc1ba29a6d6c57 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad578655757b18abcb9883f4bd679ca1efadbfe41bedf28066e49de283a1d4ae +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3b2ee10695e3e598ee3f164a7da17707966016f --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:410a862afb8ae7bb38f4c5dd97082d506e4349dd6d2382eba7c1c80b24796811 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fcece116920d74b2cb9d434e8c67c492c47dabee --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dfbb1abf9d9b6cd6a80bdff543c99b41c4309c5d58a92073b3eccb9fa46e6ad +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..00f8f2d7acb9d14c2640a6aff75b18993a278653 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4997735b9eae524c19ac6a39012b837ce65c7da954f921c8d01f5ee2dd2c37c +size 27478231 diff --git 
a/146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60ae200f79d8c72797f0dc0c4d9305fd405159b7 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91cbd7cb83685a389d38a532ad829a5bce8613f126f89de2716a970a6f9e004c +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..741d8b6373bf6135a6b6169d55ceaa959f67e991 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de124360ba4fb984890710d7f661510ffccf9b9147cd3b9ed00a249c4189953d +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5cca850bed977cf04acdaca09a30599f51257c4 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:81df207fda19374d4b31c47a732d5d3761f06bea24cd9dcc5d658083c3e8ea16 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d80261a62d3a49bba43b96c2f1b3bd24782eb85a --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:05bbb5e5526678196959fcea1fd8c44c6c03035c6ab67bb949f189503a008df4 +size 27478434 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..567d5e98d0f7cf9c937ba6a083b8b15a2a7dd64a --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf775f2f36b50393316f2f9a1faa7c0e4776a46831ed226260d1c2a437abde79 +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c029ec72ee5a5bea8ca94cc36887f378cfb47e61 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:392888130be57b64a44fe3af9c4897b8ce2f1d836e2a796bf6ebefee28501b7f +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6c81ac5a53d862604df27aa2f240220788a794d --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd736ca9224869b570f66a01ce4a3f5764443d3bac9df3cecdd74ed5566b5333 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..a09f1d76ef9c3c3ab4618a028c8a0911ef4bcf31 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35f5b1ac5c8ba93fd82fe223b0d971545c234e12a524cc5e3061f23793605dea +size 27478114 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..65f18426286b9ad413099ad665f0e7c8ed0ffcb2 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51a15fe47637369034c3065261f066a13cf57073d5ae3a2880c733763a129c61 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1da298ffb174116691ac19daed13240d5d28638 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b350863b94c57b1dce470a3a76e5ea44687a0e1093b2308d00a0766415ebb6f +size 27478434 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbb2f7952050737482b3f2595eb4cfbd07e1a253 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01dcad0ee8681a8573f6c6cc3abff75f5d32dd43ec8704faf5f88fd1fff51665 +size 27478231 diff --git 
a/146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..193c1fb1ac29f13a8a3acd312ee7ff8eafd9da4c --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40bed9b8b197489d771c49cff0be95e39acf82e48963a4586553b3b977574bbd +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f6e4916e70da8f81238ede0498fde696d8739be --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50b1e0273d877bfc359c46cc759572d69d4f524121338fd1d9ebdc03bf5cff1f +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f2855aab39deea68e2ea8f58f2bff15978466c9 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f637ff923632dd3413f55f0498e946922e6abc2f8b31a6a574c569738f586a6 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0bb51ac5c4c7389e52c8801df34c8d844a9a3469 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:5c8048453abe466bbb89a57affa05c0c751d256ee2a7becfd0adf9109dd8a3a0 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80f63fc78f5fda6acc44e05ff6cd983925c48c32 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4bdb556acd7fcb624bd04c85658d2e2fa1c46fa8cbf62ff2b7d3d407ff1e42c +size 27478434 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b71bc249b1eb86f9bac384eb88bd8af22e41df9 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f61568a37a8a69870fa4872dc1776a0070cda5d0238caef1a8100bbf1127761 +size 27478114 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c9e92517608eeb36bd0e6e0ce02b8887ab46cf1 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5cb12bc09f5c0b7cde5bcb7bc9dcecea8a0f7530cdf1e9715b855409dcd24f4 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..bcf41674fcb4f695208e469b70e60917ae338ffd --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57a785266fbe11a1f8457cd385c762e415134fc8ee7c1eda435def9a6db9e72f +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0555bf41f6652043d73d52374f6618dff0f79a9f --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36ece907df959342b1c3e9d3dea165c0c49cc46215639917f5731565d00a0606 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ae3467c973ec1b44dc18ea8e06295c3948b00128 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:344464b76be21b8ad4421880b3adb55caa820c85e4c7f0a230798ba620804866 +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..332f300315b20613008d02f424479b28246e7c3c --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40d08939d04ffbb266691839cf29ef19e071a3e082fa126eac14048dfa9a70ac +size 27478231 diff --git 
a/146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc596a1f82c5ef9979b6844f3945ed9782c7f1ba --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce93dc547e92d12f691b45b089b74d7110b7c3a3190907c6b5023218c6cd5136 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df8454ca14749db01839cd6436ad00ec5e2d3353 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:27e991404654a1b0796d5e3f47245ebc2c1cb211d3989c0008cfaa36e83bc92f +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..91a69693a0411f71f8852ec67149ad1e4c4e319a --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2dd6e5c699c3beb499e062040e09e900cb7d179d9ec7031ea9022a3e1ed57b22 +size 27478434 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94532678284531a70bdf48f1065ed064cdb899fe --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:82e8bc9de713d05842003d556b4c737c5e09976696b03f3367ee3404233ffd9b +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..de020a9b2bf5472d40b7058c38e915e5841d74ab --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f34c373817a5ca863adc446fd6878183f592b823adefbaaf056c0207d610e3c3 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..955c6b15155be471eec73569df8fa8a22686a22a --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bba3d6b77a9e26f0c94be10e54eb0be0f8d8dec913f2f9305dc0b5e19f2a8fb0 +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1700585d1d54d6b039060ee514d8784a27da962 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80f96d8ae38cd028cc0c012066b8fd062e9dfef2e7a0758588987fd23099a37c +size 27478306 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6863abc4ed8c5354abe47f8e64eab87af27a30c8 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7ed5b86e959082238cfe2111617a383724a86046b3af0607c7935c05c86ffeb +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e0a816a8272dd07c1122a5e82a5d180ca9802262 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06c86d2eb614dfbba12eb612683bb5b27e76a4f67daf8fe27bae6c18cc0d6eed +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee6ce3c2b6d28bc4f0b26c6cafc41c22d3bc734f --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e37d88cb4c2ef988d9a77d887e05c99089d0facc624202be485b0cdea9cf8fc3 +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a8706f484b48f02272e8d5100d26ebb161769909 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0b33750c1769a88c4758588dfc53909e2d662fcdf70653d96f2d50a79642ebc +size 27478167 diff --git 
a/146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6de3b1dacfeca7203199592fecd9f75bdfa293f --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a965b44da5802e4eb3a0bf7a757458e133ed35b160367318b7b5d85ff3c4aff6 +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..002e97c9755c0bb5f1141f44e934912b2cca72b9 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:351a33f8e67f78daddd05aff7d048b881ae81ed5a7c0b408936cab6905d3e900 +size 27478370 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba088aa84a511a3e4ec8662e93c575ad5a886977 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c948c4b319e88b8264181ab894a0168ebd0d56ae9511b3c4f71dfb22dffc380d +size 27478178 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..42acadd60c19e5c2024cd332237f613ce467b3db --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:236d976e9842be8958de613bcfd00070dcdb49a9d81c832c11e0bdf00185dc03 +size 27478242 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97cf8b2ba47f149e210fca70e61d4c6dd44aa117 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:847da39c77425b269074130616d47b530e8a5de42710e0a71563c54b0059cb6c +size 27478359 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..17c4719b023c64b292c6111fe7e8227d152e1f22 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f786fed37a8da491f9ab908dea151e5f277e179094cd5e3a3ee77e4112ed915d +size 27478103 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0bf27106e3f5e0aeb89961147ca98310f273b105 --- /dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec03e54d6c46c763228733bde75240755b700a342d0c1d4bf1d55afce3a5df43 +size 27478359 diff --git a/146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7fd82d10b8d601f963a51b21a8f78b014a992a43 --- 
/dev/null +++ b/146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a22897df7930e5b6c2f184cd95192e65e3d52d5a46a244f96758589940dbc0c5 +size 27478167 diff --git a/146m60b400m/global_step115203/layer_01-model_00-model_states.pt b/146m60b400m/global_step115203/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..053422d71589bd33302224cdc6249004122945dc --- /dev/null +++ b/146m60b400m/global_step115203/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94ac0f46acb2cf98c2721e16cf47affe54ffd3fc337b82c144dba88c4d1a2a5b +size 80413955 diff --git a/146m60b400m/global_step115203/layer_03-model_00-model_states.pt b/146m60b400m/global_step115203/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15b6382c34251b7f683cfd52cfa1cff11cfd0d66 --- /dev/null +++ b/146m60b400m/global_step115203/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3ef05e307122d75f09ef10bb67d2bc3e654e0ae4a894bcb4bfdd2c525840a7d +size 14180099 diff --git a/146m60b400m/global_step115203/layer_04-model_00-model_states.pt b/146m60b400m/global_step115203/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8557e032cd4c4fb032ec50a841cba06ce94bd408 --- /dev/null +++ b/146m60b400m/global_step115203/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34811049ac30f62de8e9ce47612aeaebc37a3bef9e6417eca09003ea4f8057a6 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_05-model_00-model_states.pt b/146m60b400m/global_step115203/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57e28d9f6ca5044ddd7d5eed00ba898cb0b60996 --- /dev/null +++ 
b/146m60b400m/global_step115203/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7aa165159813537a492b63cf2e49e6287cb2e239d0c6903979ff201dfdd728a0 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_06-model_00-model_states.pt b/146m60b400m/global_step115203/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7572c2d1f0dcb0a54f7f1a79e2a499cc42548c4e --- /dev/null +++ b/146m60b400m/global_step115203/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c81ccf9fc0c3cf8fda1681c14a863a0a2f0601e302b0966da67641240882ba3 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_07-model_00-model_states.pt b/146m60b400m/global_step115203/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc7771c742d31579aaa0f96069589b06e2dc5713 --- /dev/null +++ b/146m60b400m/global_step115203/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:367d9a836e72079e0748c8d65f0c503a7cfbdad8e04d43f3185003481d9143bc +size 14180099 diff --git a/146m60b400m/global_step115203/layer_08-model_00-model_states.pt b/146m60b400m/global_step115203/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ca2e1b72d89092d79fb54cddb5a580b726edb01 --- /dev/null +++ b/146m60b400m/global_step115203/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb92b4a56d0f32a3f748d9e28be8aeacd1fb802e8b274fc546ac1784a0ac5a55 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_09-model_00-model_states.pt b/146m60b400m/global_step115203/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..82abf3f9809b719d4ddd9d8d346d38bfea6537a2 --- /dev/null +++ 
b/146m60b400m/global_step115203/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b0ad005547700cffdaa7b9ba568fef8f5cb31cd43c1d9ab9078249ca7703360 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_10-model_00-model_states.pt b/146m60b400m/global_step115203/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb6b114713ce12ae013e0158009ca6809a4c51ad --- /dev/null +++ b/146m60b400m/global_step115203/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:361ea273cfad377c0f81a3b3d50740620bf535088f05d3d9924bb915a17d35a6 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_11-model_00-model_states.pt b/146m60b400m/global_step115203/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..718ce6825756b2c0fe8279f0faf69e819c171b71 --- /dev/null +++ b/146m60b400m/global_step115203/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:935667f14116e16c5cb036c66fe2d0bb3de403096dbb0bae98218b1d7aabf96e +size 14180099 diff --git a/146m60b400m/global_step115203/layer_12-model_00-model_states.pt b/146m60b400m/global_step115203/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff71f34089188bf64f85facec70751ae5bf8c3da --- /dev/null +++ b/146m60b400m/global_step115203/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c02a3f5fee1519cb4098eb08340fac75bb2433bebde889ef10de94bee2fed94 +size 14180099 diff --git a/146m60b400m/global_step115203/layer_13-model_00-model_states.pt b/146m60b400m/global_step115203/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c321d5b017372576cea1283f651b04745100f052 --- /dev/null +++ 
b/146m60b400m/global_step115203/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8af3c3e8db0d8fd8c4829d2a2d53f68f878e7a546bad422fe3309d66be232fcb +size 14180099 diff --git a/146m60b400m/global_step115203/layer_14-model_00-model_states.pt b/146m60b400m/global_step115203/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7372f6ce1db0aa94686bb547366c81c3ca1dc920 --- /dev/null +++ b/146m60b400m/global_step115203/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5264fd946b306449b853d6bae6a80b037cace676795c5227ce71acead2f7eeb +size 14180099 diff --git a/146m60b400m/global_step115203/layer_15-model_00-model_states.pt b/146m60b400m/global_step115203/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6df5e7295d51d3a3764cafaf156cc6593e8dba6d --- /dev/null +++ b/146m60b400m/global_step115203/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6ecfbc72cd54e485ed7f83c38963232b346c70676159723ada2e9df28159def +size 14180099 diff --git a/146m60b400m/global_step115203/layer_16-model_00-model_states.pt b/146m60b400m/global_step115203/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f215616af631079a61b97f84544446f4fb65dd89 --- /dev/null +++ b/146m60b400m/global_step115203/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0650848320cbef32c957764e5d7198f20ebfe192dd9d7899a96151b6cdc251a +size 14180099 diff --git a/146m60b400m/global_step115203/layer_17-model_00-model_states.pt b/146m60b400m/global_step115203/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96c192386059f61d0ba83a27782a113f08679325 --- /dev/null +++ 
b/146m60b400m/global_step115203/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0312ad22eb4ff8c7511c8dccffc2d459fd0bf9946360aab8d69ed59df2c6b4df +size 14180099 diff --git a/146m60b400m/global_step115203/layer_19-model_00-model_states.pt b/146m60b400m/global_step115203/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f19b6ca5d357fe9271884fc8b3d6f225016998f1 --- /dev/null +++ b/146m60b400m/global_step115203/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd8d1646cc27582e73a9e2be854019165ce7a557af69c0f63df84731daa021a6 +size 4291 diff --git a/146m60b400m/global_step115203/mp_rank_00_model_states.pt b/146m60b400m/global_step115203/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1f8e978732f97c2083642527756ef27643de002 --- /dev/null +++ b/146m60b400m/global_step115203/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fda644301bcdfdb6b7fafc5eaed30dee59ca958d0bfc05e8add117bc628879d +size 35443 diff --git a/146m60b400m/sbatch_146m60b400m.sh b/146m60b400m/sbatch_146m60b400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..be7739435c57f18ebc7a711615b57862ef1f8483 --- /dev/null +++ b/146m60b400m/sbatch_146m60b400m.sh @@ -0,0 +1,174 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH 
--exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m60b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 19873180000 +# -> Samples: 9703701 +# TRAIN_SAMPLES=9_703_701 +# Tokens: 31633480000 +# -> Samples: 15446035 +# TRAIN_SAMPLES=15_446_035 +# Tokens: 60400000000 +# -> Samples: 29492188 +TRAIN_SAMPLES=29_492_188 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + 
--lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 294_922 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 100 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + 
+echo "END $SLURM_JOBID: $(date)" diff --git a/146m60b400m/sbatch_146m60b400mval.sh b/146m60b400m/sbatch_146m60b400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..c44af7e3159f4bb90af2ade10de1b48d33c363bb --- /dev/null +++ b/146m60b400m/sbatch_146m60b400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m60b400mval +VARIANT_CKPT=146m60b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model 
$NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + 
$GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m60b400m/tensorboard_146m60b400m/events.out.tfevents.1679001790.nid005617.89911.0 b/146m60b400m/tensorboard_146m60b400m/events.out.tfevents.1679001790.nid005617.89911.0 new file mode 100644 index 0000000000000000000000000000000000000000..4a57593ce64ca3ecca4f5cebdee15061a48752cc --- /dev/null +++ b/146m60b400m/tensorboard_146m60b400m/events.out.tfevents.1679001790.nid005617.89911.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e34e948765d12ef46e24fc684ab923fbd96f9e562fdcac3c55051f97b4680d5 +size 198352922 diff --git a/146m60b400m/tensorboard_146m60b400mval/events.out.tfevents.1679047372.nid005617.93514.0 b/146m60b400m/tensorboard_146m60b400mval/events.out.tfevents.1679047372.nid005617.93514.0 new file mode 100644 index 0000000000000000000000000000000000000000..6e4d905956426d34b6c22a1f5f0fae4a480585e5 --- /dev/null +++ b/146m60b400m/tensorboard_146m60b400mval/events.out.tfevents.1679047372.nid005617.93514.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:415108f774154e23866456ed5e5188f8e80bc630c163a617b8bafab5ddbdbbf6 +size 980 diff --git a/146m7b5100mdedup/3327350.err b/146m7b5100mdedup/3327350.err new file mode 100644 index 0000000000000000000000000000000000000000..ab5c2c3d55503219d1e76e6b4fe89a0e1fc59087 --- /dev/null +++ b/146m7b5100mdedup/3327350.err @@ -0,0 +1,1121 @@ +4: 2023-03-17 00:49:54.472231: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:49:54.472240: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:49:54.472241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-17 00:49:54.472041: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472043: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472045: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:49:54.472241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:49:54.472252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472064: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472062: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:49:54.472261: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:49:54.472265: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472072: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:49:54.472256: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472084: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:49:54.472089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 00:49:54.473179: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473174: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473184: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473185: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 00:49:54.473182: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473199: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:49:54.473195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473915: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 00:49:54.473927: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473926: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473924: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:49:54.473932: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 00:49:54.473927: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520370: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520372: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520376: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520380: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 00:49:54.520386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:49:54.520386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589453: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 00:49:54.589460: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589458: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589457: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589472: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 00:49:54.589474: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:49:54.589480: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590681: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590687: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590690: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 00:49:54.590692: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590696: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590687: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590699: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:49:54.590679: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 00:49:54.766861: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766857: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766893: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766948: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766953: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 00:49:54.766950: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:49:54.766964: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:49:56.264006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:49:56.264582: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264589: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264594: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264599: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264598: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-17 00:49:56.264602: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264603: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:49:56.264605: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264315: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264315: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264319: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264316: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:49:56.264700: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264703: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264707: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264707: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264710: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 00:49:56.264712: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264715: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:49:56.264718: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.271803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271809: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.271833: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:49:56.272245: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272252: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272255: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272257: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272256: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 00:49:56.272260: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272261: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:49:56.272265: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:49:56.326928: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326930: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326936: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326939: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326942: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 00:49:56.326943: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326945: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:49:56.326949: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.353995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354014: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:49:56.354416: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354419: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354425: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354425: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354429: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 00:49:56.354427: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354431: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:49:56.354433: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357224: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:49:56.357571: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357570: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357576: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357578: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357583: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 00:49:56.357591: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:49:56.357594: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361221: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361230: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361221: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361239: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:49:56.361502: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361508: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361510: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361511: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361511: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 00:49:56.361513: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361515: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:49:56.361517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.603670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603660: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603673: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603675: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.603675: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:49:56.604111: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604112: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604117: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604119: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604122: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 00:49:56.604123: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604126: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:49:56.604129: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:50:00.550463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 00:50:00.550445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.550472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550451: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550634: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 00:50:00.550620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550557: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550475: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550478: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550565: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550452: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550568: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550658: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550459: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550634: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550563: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550661: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550459: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:50:00.550487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550637: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550569: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.550454: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.550647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.550571: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550667: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:50:00.550647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.550638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.550668: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552336: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:50:00.552338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:50:00.552336: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:50:00.552340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552343: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552355: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:50:00.552356: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 00:50:00.552523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:50:00.552356: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:50:00.552358: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:50:00.552359: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552360: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:50:00.552362: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:50:00.552523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:50:00.552369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:50:00.552713: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:50:00.552384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:50:00.552716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:50:00.552735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:50:00.552757: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:50:00.552715: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552533: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:50:00.552536: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.552747: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.552536: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:50:00.552542: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:50:00.552541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552757: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-17 00:50:00.552845: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:50:00.552720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:50:00.552543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.552743: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:50:00.552569: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:50:00.552759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: 
cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:50:00.552582: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:50:00.552767: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:50:00.552846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.552745: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.552763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552722: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.552752: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.552764: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:50:00.552730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:50:00.552759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.552771: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552775: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552774: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552732: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:50:00.552738: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:50:00.552739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:50:00.552779: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552779: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:50:00.552852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:50:00.552738: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:50:00.552741: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:50:00.552759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.552798: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:50:00.552789: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:50:00.552765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 00:50:00.552803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552862: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552867: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:50:00.552870: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:50:00.552873: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552811: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:50:00.552816: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:50:00.552873: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:50:00.552870: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:50:00.552911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:50:00.552931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:50:00.552933: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:50:00.554626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554642: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554642: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554645: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554649: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554647: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554648: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:50:00.554673: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:50:00.554685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.573604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.573621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575735: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:50:00.575793: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575812: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:50:00.575813: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.575820: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.575821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.575826: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.575870: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:50:00.575881: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:50:00.575886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:50:00.552452: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552453: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552466: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:50:00.552468: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:50:00.552534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:50:00.552557: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:50:00.552551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:50:00.552618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:50:00.552632: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:50:00.552633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:50:00.552638: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Building extension module utils... +5: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... 
+2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...Loading extension module utils... 
+6: +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...Loading extension module utils... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +3: Loading extension module utils... +3: +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... 
+7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m7b5100mdedup/3327350.out b/146m7b5100mdedup/3327350.out new file mode 100644 index 0000000000000000000000000000000000000000..6deb323ce6b5739f6a06183f8eb982d31cd2c0f7 --- /dev/null +++ b/146m7b5100mdedup/3327350.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m7b5100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m7b5100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save 
checkpoints_146m7b5100mdedup --load checkpoints_146m7b5100mdedup --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3327350.json --zero-stage 0 +START 3327350: Fri 17 Mar 2023 12:49:32 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 47.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface 
======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 47.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 45.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +4: 4 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 45.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 39.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 48.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 45.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm 
SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 39.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 46.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 42.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +0: Launching on nid006861 (0/8), master nid006861 port 9999, GPUs 8, CUDA: True +5: Launching on nid006866 (5/8), master nid006861 port 9999, GPUs 8, CUDA: True +3: Launching on nid006864 (3/8), master nid006861 port 9999, GPUs 8, CUDA: True +2: Launching on nid006863 (2/8), master nid006861 port 9999, GPUs 8, CUDA: True +4: Launching on nid006865 (4/8), master nid006861 port 9999, GPUs 8, CUDA: True +6: Launching on nid006867 (6/8), master nid006861 port 9999, GPUs 8, CUDA: True +1: Launching on nid006862 (1/8), master nid006861 port 9999, GPUs 8, CUDA: True +7: Launching on nid006868 (7/8), master nid006861 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 
0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ 
local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3327350.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 
224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m7b5100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m7b5100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. 
None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... 
False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m7b5100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. 
tensorboard_146m7b5100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... 
[['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 00:50:19,244] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... 
+0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.106 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python 
-L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of 
unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: 
Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 25.176 seconds +0: time to initialize megatron (seconds): 71.441 +0: [after megatron is initialized] datetime: 2023-03-17 00:50:47 +0: building GPT model ... +0: [2023-03-17 00:50:47,577] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 00:50:47,578] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 00:50:47,578] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.2 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, 
ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 00:50:49,557] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 
layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 00:50:49,809] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 00:50:49,809] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 00:50:49,810] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-17 00:50:49,811] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 00:51:03,280] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 00:51:03,281] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 00:51:03,281] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 00:51:03,285] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 00:51:03,285] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 00:51:03,401] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 00:51:03,401] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 00:51:03,401] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.91 GB, percent = 6.3% +5: ninja: no work to do. 
+5: Time to load utils op: 0.17251133918762207 seconds +0: Time to load utils op: 0.10943937301635742 seconds +0: Time to load utils op: 0.10249066352844238 seconds +0: Time to load utils op: 0.10177779197692871 seconds +0: Time to load utils op: 0.10187625885009766 seconds +0: Time to load utils op: 0.1019141674041748 seconds +5: Time to load utils op: 0.10236406326293945 secondsTime to load utils op: 0.1021881103515625 seconds +5: +5: Time to load utils op: 0.1027829647064209 seconds +5: Time to load utils op: 0.10249447822570801 seconds +5: Time to load utils op: 0.10177874565124512 seconds +5: Time to load utils op: 0.10201382637023926 seconds +5: Time to load utils op: 0.10212206840515137 seconds +0: Time to load utils op: 0.10223555564880371 seconds +0: Time to load utils op: 0.10274291038513184 seconds +0: Time to load utils op: 0.10235762596130371 seconds +1: Time to load utils op: 0.11081695556640625 seconds +1: Time to load utils op: 0.11139225959777832 seconds +1: Time to load utils op: 0.11128997802734375 secondsTime to load utils op: 0.11085295677185059 seconds +1: +1: Time to load utils op: 0.11082768440246582 seconds +1: Time to load utils op: 0.11084222793579102 seconds +1: Time to load utils op: 0.11085128784179688 seconds +1: Time to load utils op: 0.11095023155212402 seconds +6: Time to load utils op: 0.10995173454284668 seconds +6: Time to load utils op: 0.11031746864318848 seconds +6: Time to load utils op: 0.11069297790527344 seconds +6: Time to load utils op: 0.11000776290893555 seconds +6: Time to load utils op: 0.10953283309936523 seconds +6: Time to load utils op: 0.11069869995117188 seconds +6: Time to load utils op: 0.10931849479675293 seconds +6: Time to load utils op: 0.10964512825012207 seconds +3: Time to load utils op: 0.11096882820129395 seconds +3: Time to load utils op: 0.11097574234008789 seconds +3: Time to load utils op: 0.11098694801330566 secondsTime to load utils op: 0.11101961135864258 secondsTime to load utils op: 
0.11102557182312012 seconds +3: +3: +3: Time to load utils op: 0.11103200912475586 seconds +3: Time to load utils op: 0.11100363731384277 seconds +3: Time to load utils op: 0.11104583740234375 seconds +2: Time to load utils op: 0.11161208152770996 secondsTime to load utils op: 0.11161518096923828 secondsTime to load utils op: 0.11157894134521484 seconds +2: +2: +2: Time to load utils op: 0.11162471771240234 seconds +2: Time to load utils op: 0.11160492897033691 seconds +2: Time to load utils op: 0.1116325855255127 secondsTime to load utils op: 0.11161231994628906 seconds +2: +2: Time to load utils op: 0.11164188385009766 seconds +4: Time to load utils op: 0.1106557846069336 seconds +4: Time to load utils op: 0.11066079139709473 secondsTime to load utils op: 0.1106569766998291 seconds +4: Time to load utils op: 0.11066794395446777 seconds +4: +4: Time to load utils op: 0.110687255859375 seconds +4: Time to load utils op: 0.11069107055664062 seconds +4: Time to load utils op: 0.11070442199707031 seconds +4: Time to load utils op: 0.11041140556335449 seconds +7: Time to load utils op: 0.11036396026611328 seconds +7: Time to load utils op: 0.11037111282348633 seconds +7: Time to load utils op: 0.11040043830871582 seconds +7: Time to load utils op: 0.11040949821472168 seconds +7: Time to load utils op: 0.11042189598083496 seconds +7: Time to load utils op: 0.11043119430541992 seconds +7: Time to load utils op: 0.11043095588684082 seconds +7: Time to load utils op: 0.1103522777557373 seconds +5: Time to load utils op: 0.0005464553833007812 seconds +5: Time to load utils op: 0.0005042552947998047 seconds +5: Time to load utils op: 0.0005230903625488281 seconds +5: Time to load utils op: 0.0005474090576171875 seconds +5: Time to load utils op: 0.0005705356597900391 seconds +5: Time to load utils op: 0.0005953311920166016 secondsTime to load utils op: 0.0005786418914794922 seconds +5: +5: Time to load utils op: 0.0004928112030029297 seconds +0: Time to load utils op: 
0.0005650520324707031 secondsTime to load utils op: 0.0004143714904785156 seconds +0: +0: Time to load utils op: 0.00060272216796875 seconds +0: Time to load utils op: 0.0004355907440185547 seconds +0: Time to load utils op: 0.0004291534423828125 seconds +0: Time to load utils op: 0.00044226646423339844 seconds +0: Time to load utils op: 0.0005173683166503906 seconds +1: Time to load utils op: 0.0009410381317138672 seconds +1: Time to load utils op: 0.001233816146850586 seconds +1: Time to load utils op: 0.001277923583984375 secondsTime to load utils op: 0.0011916160583496094 seconds +1: +1: Time to load utils op: 0.0012807846069335938 seconds +1: Time to load utils op: 0.001316070556640625 seconds +1: Time to load utils op: 0.0011997222900390625 seconds +1: Time to load utils op: 0.0012919902801513672 seconds +6: Time to load utils op: 0.0009381771087646484 seconds +6: Time to load utils op: 0.0008814334869384766 seconds +6: Time to load utils op: 0.001024007797241211 seconds +6: Time to load utils op: 0.0011775493621826172 seconds +6: Time to load utils op: 0.0011017322540283203 seconds +6: Time to load utils op: 0.001096963882446289 secondsTime to load utils op: 0.0010645389556884766 seconds +6: +6: Time to load utils op: 0.0012099742889404297 seconds +3: Time to load utils op: 0.0009431838989257812 seconds +3: Time to load utils op: 0.001005411148071289 seconds +3: Time to load utils op: 0.0009517669677734375 seconds +3: Time to load utils op: 0.0010340213775634766 seconds +3: Time to load utils op: 0.0011794567108154297 secondsTime to load utils op: 0.0011005401611328125 seconds +3: +3: Time to load utils op: 0.0011446475982666016 seconds +3: Time to load utils op: 0.0012865066528320312 seconds +4: Time to load utils op: 0.0007886886596679688 seconds +4: Time to load utils op: 0.0010972023010253906 secondsTime to load utils op: 0.0012645721435546875 seconds +4: +4: Time to load utils op: 0.0011942386627197266 seconds +4: Time to load utils op: 
0.0010917186737060547 seconds +4: Time to load utils op: 0.001142263412475586 seconds +4: Time to load utils op: 0.0010988712310791016 seconds +4: Time to load utils op: 0.0011150836944580078 seconds +2: Time to load utils op: 0.0006873607635498047 seconds +2: Time to load utils op: 0.0008585453033447266 seconds +2: Time to load utils op: 0.0011589527130126953 seconds +2: Time to load utils op: 0.001068115234375 seconds +2: Time to load utils op: 0.001077413558959961 seconds +2: Time to load utils op: 0.0010509490966796875 seconds +2: Time to load utils op: 0.0011038780212402344 seconds +2: Time to load utils op: 0.0011332035064697266 seconds +7: Time to load utils op: 0.00091552734375 seconds +7: Time to load utils op: 0.0008945465087890625 seconds +7: Time to load utils op: 0.0008785724639892578 seconds +7: Time to load utils op: 0.001207113265991211 seconds +7: Time to load utils op: 0.0012691020965576172 seconds +7: Time to load utils op: 0.0012552738189697266 seconds +7: Time to load utils op: 0.0011949539184570312 seconds +7: Time to load utils op: 0.0012378692626953125 seconds +0: [2023-03-17 00:51:03,623] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 00:51:03,624] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 00:51:03,624] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:03,735] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 00:51:03,735] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 00:51:03,735] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:03,836] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 00:51:03,836] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: 
[2023-03-17 00:51:03,836] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:03,938] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 00:51:03,938] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 00:51:03,939] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,038] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 00:51:04,039] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 00:51:04,039] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,141] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 00:51:04,141] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 00:51:04,141] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,241] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 00:51:04,241] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 00:51:04,241] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,347] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 00:51:04,347] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 00:51:04,347] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,447] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 00:51:04,448] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 
+0: [2023-03-17 00:51:04,448] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.06 GB, percent = 6.4% +0: [2023-03-17 00:51:04,448] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 00:51:04,448] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 00:51:04,448] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 00:51:04,448] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 00:51:04,448] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 00:51:04,449] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 00:51:04,450] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 00:51:04,451] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 00:51:04,451] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 00:51:04,451] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 00:51:04,451] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.000415802001953125 seconds +0: [2023-03-17 00:51:04,451] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 00:51:04,461] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+4: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+0: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. 
+5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 00:51:04,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 00:51:04,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:51:04,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 00:51:04,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 00:51:04,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:51:04,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:51:04,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:51:04,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 00:51:04,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:51:04,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:51:04,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:51:04,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 00:51:04,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 00:51:04,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 00:51:04,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:51:04,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:51:04,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:51:04,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:51:04,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:51:04,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:51:04,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:51:04,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 00:51:04,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:51:04,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:51:04,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:51:04,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:51:04,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 00:51:04,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:51:04,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+5: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:04,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 00:51:04,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:51:04,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 00:51:04,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:04,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:04,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:04,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:04,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 00:51:04,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:51:05,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:51:05,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:51:05,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:51:05,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:51:05,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:51:05,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:51:05,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:51:05,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:51:05,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:51:05,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:51:05,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:51:05,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:51:05,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:51:05,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:51:05,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:51:05,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:51:05,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:51:05,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:51:05,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:51:05,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:51:05,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:51:05,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:51:05,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:51:05,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:51:05,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:51:05,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:51:05,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:51:05,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:51:05,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:51:05,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:51:05,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:51:05,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:51:05,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:51:05,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:51:05,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:51:05,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:51:05,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+1: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:05,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 00:51:05,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:05,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:05,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:51:06,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:51:06,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:51:06,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 00:51:06,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:51:06,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:51:06,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:51:06,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:51:06,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:51:06,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:51:06,190] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +7: [2023-03-17 00:51:06,192] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +5: [2023-03-17 00:51:06,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,199] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +5: [2023-03-17 00:51:06,201] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +2: [2023-03-17 00:51:06,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,205] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +6: [2023-03-17 00:51:06,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:51:06,206] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +2: [2023-03-17 00:51:06,207] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +6: [2023-03-17 00:51:06,208] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +3: [2023-03-17 00:51:06,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:51:06,212] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-17 00:51:06,214] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +1: [2023-03-17 00:51:06,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,215] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +1: [2023-03-17 00:51:06,217] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +0: [2023-03-17 00:51:06,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,221] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +4: [2023-03-17 00:51:06,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,222] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +0: [2023-03-17 00:51:06,223] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +4: [2023-03-17 00:51:06,223] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +6: [2023-03-17 00:51:06,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:51:06,226] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +0: [2023-03-17 00:51:06,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,226] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +5: [2023-03-17 00:51:06,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,228] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +0: [2023-03-17 00:51:06,228] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +6: [2023-03-17 00:51:06,228] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +5: [2023-03-17 00:51:06,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,229] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +3: [2023-03-17 00:51:06,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 00:51:06,229] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +3: [2023-03-17 00:51:06,229] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +5: [2023-03-17 00:51:06,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,231] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +4: [2023-03-17 00:51:06,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,231] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +5: [2023-03-17 00:51:06,231] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +3: [2023-03-17 00:51:06,231] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +7: [2023-03-17 00:51:06,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:51:06,232] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +5: [2023-03-17 00:51:06,232] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +4: [2023-03-17 00:51:06,233] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +7: [2023-03-17 00:51:06,233] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +7: [2023-03-17 00:51:06,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:51:06,236] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +7: [2023-03-17 00:51:06,238] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +7: [2023-03-17 00:51:06,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:51:06,247] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-17 00:51:06,248] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +6: [2023-03-17 00:51:06,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:51:06,250] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +6: [2023-03-17 00:51:06,252] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +2: [2023-03-17 00:51:06,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:51:06,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:51:06,254] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +2: [2023-03-17 00:51:06,254] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +2: [2023-03-17 00:51:06,255] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +6: [2023-03-17 00:51:06,255] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +3: [2023-03-17 00:51:06,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:51:06,256] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +3: [2023-03-17 00:51:06,258] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +1: [2023-03-17 00:51:06,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 00:51:06,258] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +6: [2023-03-17 00:51:06,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:51:06,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +1: [2023-03-17 00:51:06,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +2: [2023-03-17 00:51:06,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +1: [2023-03-17 00:51:06,260] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +6: [2023-03-17 00:51:06,260] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +1: [2023-03-17 00:51:06,261] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +2: [2023-03-17 00:51:06,261] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +6: [2023-03-17 00:51:06,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:51:06,264] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +0: [2023-03-17 00:51:06,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,264] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +6: [2023-03-17 00:51:06,265] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +0: [2023-03-17 00:51:06,266] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +2: [2023-03-17 00:51:06,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,269] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +1: [2023-03-17 00:51:06,269] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-17 00:51:06,271] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +3: [2023-03-17 00:51:06,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 00:51:06,271] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +3: [2023-03-17 00:51:06,271] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +0: [2023-03-17 00:51:06,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,272] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +3: [2023-03-17 00:51:06,273] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +5: [2023-03-17 00:51:06,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,273] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +0: [2023-03-17 00:51:06,274] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +0: [2023-03-17 00:51:06,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,274] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +5: [2023-03-17 00:51:06,275] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +0: [2023-03-17 00:51:06,276] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +7: [2023-03-17 00:51:06,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:51:06,276] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-03-17 00:51:06,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:51:06,276] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +5: [2023-03-17 00:51:06,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,277] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +7: [2023-03-17 00:51:06,278] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +7: [2023-03-17 00:51:06,278] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +5: [2023-03-17 00:51:06,278] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +0: [2023-03-17 00:51:06,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,279] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +4: [2023-03-17 00:51:06,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 00:51:06,280] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +0: [2023-03-17 00:51:06,281] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +4: [2023-03-17 00:51:06,282] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-03-17 00:51:06,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:51:06,283] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +4: [2023-03-17 00:51:06,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,283] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +5: [2023-03-17 00:51:06,284] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +4: [2023-03-17 00:51:06,285] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +4: [2023-03-17 00:51:06,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:51:06,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 00:51:06,285] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +3: [2023-03-17 00:51:06,285] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +4: [2023-03-17 00:51:06,287] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +3: [2023-03-17 00:51:06,287] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +4: [2023-03-17 00:51:06,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,288] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +1: [2023-03-17 00:51:06,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 00:51:06,289] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +2: [2023-03-17 00:51:06,289] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +4: [2023-03-17 00:51:06,290] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +1: [2023-03-17 00:51:06,291] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +2: [2023-03-17 00:51:06,291] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +3: [2023-03-17 00:51:06,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:51:06,291] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-17 00:51:06,293] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +1: [2023-03-17 00:51:06,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,295] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +6: [2023-03-17 00:51:06,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:51:06,296] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +1: [2023-03-17 00:51:06,296] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +1: [2023-03-17 00:51:06,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,297] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +6: [2023-03-17 00:51:06,298] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +1: [2023-03-17 00:51:06,299] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +6: [2023-03-17 00:51:06,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:51:06,300] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +2: [2023-03-17 00:51:06,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,302] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +6: [2023-03-17 00:51:06,302] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +3: [2023-03-17 00:51:06,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:51:06,302] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +0: [2023-03-17 00:51:06,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,303] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +2: [2023-03-17 00:51:06,303] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +3: [2023-03-17 00:51:06,304] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +0: [2023-03-17 00:51:06,304] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +0: [2023-03-17 00:51:06,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:51:06,306] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-17 00:51:06,307] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +1: [2023-03-17 00:51:06,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:51:06,309] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +1: [2023-03-17 00:51:06,310] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +7: [2023-03-17 00:51:06,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:51:06,312] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +2: [2023-03-17 00:51:06,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:51:06,313] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +4: [2023-03-17 00:51:06,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,313] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +2: [2023-03-17 00:51:06,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:51:06,314] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +2: [2023-03-17 00:51:06,314] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +2: [2023-03-17 00:51:06,315] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +4: [2023-03-17 00:51:06,315] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-03-17 00:51:06,316] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +3: [2023-03-17 00:51:06,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:51:06,319] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +4: [2023-03-17 00:51:06,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:51:06,320] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +3: [2023-03-17 00:51:06,320] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +4: [2023-03-17 00:51:06,322] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +5: [2023-03-17 00:51:06,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 00:51:06,360] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-03-17 00:51:06,362] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +7: [2023-03-17 00:51:06,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:51:06,446] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-17 00:51:06,447] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +0: successfully loaded checkpoint from checkpoints_146m7b5100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 1986.71 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 00:51:06 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.038411 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.092 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.033805 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.077 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 00:51:20 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 19300.29 | train/valid/test-data-iterators-setup: 13085.95 +0: [after training is done] datetime: 2023-03-17 00:51:20 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.883653E+00 | lm loss PPL: 4.860145E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3327350: Fri 17 Mar 2023 12:51:42 AM EET diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3cc5936a06b173bc8f9bb8ac757acbdbedfa21f --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b9aafcce3614b537ec02beb67b45a18ec61ee0bf634160f6a5865a221f885cc +size 27478295 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5532437db203fe78bd2e6c8136f4eb30b860e27 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:81b20148ebd4bddc891bb87d09996943aaaad24ad546fd1f29e034b1d3b9838f +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f1d2ec198448b75115c142cb90b35ca969fc92a1 --- /dev/null +++ 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20ac445e4e34af3f89a8dcc7c3896684bc79b65f01319e2702d5fa9d86d80692 +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..260ab00c5dc6481e5971c826224f7ccd77fbaed6 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5f7b9f63e6808c99333aa0fe124b7bb51fc98019265f31da3e48ed21cca714d +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b902620835adbbba5300b8dd4e582d2f3ba77774 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0268c7d16ca369f2d69e8ad03d6e2820db415b4928f899b81e037d51c2f13b67 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7eb83a5e2f2773831f819b34b7704f5d6d345ffa --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dc8fffbbc495ceb3161b59ddc63203eded94d4f043435ca3a51563a3e860429 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b0dbd42d1b0ee22663043083a05371ed63f8b97 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02bfed04f4bf2a03e71c38350414719cf817a7d19b2519ce95197a491550b916 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..966b6fc8a77aebb563f447dedbd749a1b91a239a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4bd2d9226fc3c640b3d64f9c131361d9b04a9dd3e90c676450d40658523f92a0 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..288bb7f33e89282ab30388bd042c625ba7e0366a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80f5b857604d31c98df28997214f623173217e7aa432fe963789776dfad262be +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7d285467b822d1eee2df6ffb9ed2362cea92d1a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:4b85e23bdbe7869c5785a7c540af891e29f775d093b59a83e2113272cbee7eef +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..999df5354280fc6b362e45cad2720d66ad489da7 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9990d69d413bc9b20c76b21063fa6a1a4af492244833c72d29cac32f1f4ff0a2 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d91b70c3c569bbf82ba4675106bb1acad3eef2e --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26c7755ab5b31b94c2dbdb2c7ff10634cebfe70f7bef55d0bf7ca1fe92065dcb +size 27478231 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d54178e3d6015c9242c28ab8b975fa5c0913da29 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:548da1822c9d0fcd247d338865cb4c238c3d9d1021b35051d8fd6019c2e73e9d +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..3aa3346761bef4526c6fb00ff543bc650f0d17d4 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:221eec0a235d2691d35d591e5064aeab9028c2d546f743ef0662028595d0a5cf +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2641448ec3e0b05210ff0e2b29bcf7ce8c501c8c --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c406395e3e14851637e8e729fec36c493d7d4fcba138a61bfc78019227115d6b +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..92057f701fddeefd34e331785c0c70a84ec9121e --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:648c99f726ce9cec9928d45dc292449a28ea1bbd2d836df47f35f13d2c3d6de4 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a98cbc5b15804e6baa97f43f20f3fde519d2ce8a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5b9aaf3ef6c67b73878221aaf3737a37125a95dd6f57af44c1c70aea5ff95c5 +size 27478178 diff --git 
a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..957953699efcc64521194f3e11f02ef829266475 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:64f0e2ee2141f06ed3607200e37ca5f9e15eb648dd01c3194a910f77a16b0bf8 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5f80758d60428fb130831d4562f41e648d78e9e --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0822db07a865e72fc7ef2798176394583dcb4e5ab97ca67cc4d9fe772b381762 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b6a3d7c988c88dab7468be27b9a0d184a31eaaa --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3aaaa66b13609fbf169bd57eccb01588fc95ae7c311166f940d3dd7b885905a5 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d267471bd97ceade720d2f9b044586d9ff24272 --- /dev/null +++ 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db23b9c911ec0cbb573dc98224b07807c98e08ee2bc68c9d034ffcfac29db41f +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e581906735e5887b2698140b87444e68d4a44a18 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60e9975cf63df458c7d4d89b65f320a0489710ac82c48fa65aca4130ece6a677 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76fd2b3e0fbfa8176f376fcde12222468757ac9b --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19dc5c4e3da91b17caee0ec5e49486a613ceb85ae351129506385c771b747f62 +size 27478231 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a2879f2067ba2b7c70f250a3ee1b8dc731066faa --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69d1ec94e82e12ad841a92822a41b5311b20fefa5a542e2991caff0be3ccb613 +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1a20b08d9ad017135a4bba7e4709d5e3de16b658 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0baa3e4315daf911df7a3d5aa3e42e3f6301488f3a6b57d75e52fb07bc4ded28 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03828270f869f0bf22ab59db724eb9b57d6d2680 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b64723a3e31f68490a773e54ed006eecb0f7535f683e42f4bb2e00ebb94beef7 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c28c57300a5983c1ea975516da8ff45534dab84b --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7fc39bc5f6862520e37359564f145c412b85714f9a65d10bdcb01d0cf9a8d5a +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4461de366de871c6b95d633e74a8c52a77397e4d --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:977357f02b5ed47cbf1e6f4bbac0d11b459cb82595d3ebc2a3a6f099ea8aab58 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5322a3276c91628bc164475c33503c636b499e0 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd119bbbbe9850004f65f71e40a5a1e8d904f3781206c553b1b2a4e06dc245af +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aab58750546fbfd11e9852b87ca66ef4d8408aac --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:116dd7916ea80f9c34a5f546f248acf3f9b2e690a58196e542409f675212c74c +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d4b04b062655c340ea1b9c524d48773d5d1b0125 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7658b7ac9092e141226efae77d51813c62f01b0a0ca592763fc96dc3ee2c6734 +size 27478114 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..936504a6fbb4eef81661a7640293911ae603c699 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a50608e5bb83be3a66ebc942d47bab5ca35cda4936252fcb788252f7dcd2991 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..597a226b1ab4f36a2395a44bde8529e760118653 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce64ce0a00d4aa828090d7793adcde123db589d6ae288d20a802f3e1c535e772 +size 27478434 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ebaf5f5c73409f33a46ee3ac79f4e3846c93b277 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91b8bf66a25e5c037a7567a6f2ae8d5563caf1d08d897fffcf5e63b6d440cea0 +size 27478167 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da42778e1b4eecf5cd1efa4f63f88557f3859957 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ae38de6f77a6f234cac0469a12647913966ca964bde81d898cfc5c4c311fb96 +size 27478242 diff --git 
a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..78da50a9ca90682ad62c36141666b726b3365596 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:44c3d4592f2df5224d744e756d7ea45b10cb81e79e72d408cd12011313c23cf7 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cb61c0261d901e3d00e1b54511e11de2d97c8de --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b607313fe3e556eb188e7ee303e6aa3b7bc0f8539c0d0fd7c7511c6825b29077 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1756c43a9655d494c7e888d399137fa1cb184d97 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c77e8eacf56c24398a384857a2395adeaf94e6c3aae6f305520b285f3122fb0 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..89004f5fc90c2438acccbc0b123855fad8564eba --- /dev/null +++ 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd032b6f6a34ec36d2a760bf26e174e4c954cd9c68c864911a500627381c5e25 +size 27478434 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..33bb119bd24d4cc6703a92bd626b67ce532e80c5 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f174196cc60db5446e3535e9b0dce2c674331af14b91bc608ff4171e168e0711 +size 27478050 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4fb236c1d3d0337c4c7890a056bde5351d2fa827 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1319d4472576715aee3fff94d43e468bd6a4ab314fad99dc4b20435cb33eac67 +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1550cce7834ccf08bdbf9f5d6dfc42763971c097 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a532fcb1077ff7d6157bf47cb425a83ceb005a09248aafe34955752e8d15fb8d +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..091650147c47b93c1b3ad4c20d6cbc363e1b5a78 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16ff6839b1688d34f57c104236eb740ab6dc792a792dd5df5b716dcbdedb7ccc +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3210f72561473913d2007b6b00d1a54e1677fdb --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:980204012b23073a07445ae8aebd14f0e0147f249b248d992bafa03e00b27884 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d643d61a293b42a0a53fd50d1630e9b4aef2d71 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a508ba4a7cd3b58f533f5af2682933326455b014e6c35aea4623e447ce270f72 +size 27478231 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d4a6a39cd16f0ab30bac5d67fa81e2d6978eb1b5 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:c00e407d68c488180f23e08f3ad452b8fd1074b7539431ab12451150a273e1ca +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13ab69d447ccf82263999c608f4c8d8bfb230bdc --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04921d669c1cc1b116fec0c229c9295803d03275a7108684575ee96c0075e07a +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..344217054af43e3a10b6554fce99dc76f4939952 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a076406b23dabbabca2e9ec4ce12733d7954e8403237857cad4b4f9fc1dac9f +size 27478434 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..234a147869f57c9692cd08076447cad3bada8b7c --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65177a60302649ebbc4376d3b53d680f6fdc224b36769ef52ac83f3a7695724a +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..98dc386b58a84acd184b1c1aab203643d98ddfc0 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fa86150515d1ac15a53288a6997d73afa7fe2267ab80c763aa111e4f8937f5c +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94a8e77f327003416a642f9c30a23633118160e2 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1634eec0426b274f970e5b62fc6670a325dd305378e1de5912adb59e6389db7d +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e67ebae4df18e4120032f0ccc79b6bf1fb41337 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91206ea59d853be1b8a76b02b92162b83347c10eaef049d3baa5bc6b504509fd +size 27478306 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc314b4d17d0656cba32a41726affb6b19e46bbc --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:014c60165c4465ca8846676d20718b5df5f517aa8152e475870bf724a6588ef0 +size 27478370 diff --git 
a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ee3c9aa0737d7d788574e4513778561025c9950 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:671275caa849dc7a5b11f5ea19ea61498708194f01911cbedfc895fded4bc1fc +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6c2811e94c2cf4aa2f33ced9b13eccd564ef50d2 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4d8b751e625ff2e541cdd40930843b348aec4029eba31e5386df9fc414513f89 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ecac8d4efa4c892c28813a1bc0594fbb5991baa --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3659e6ae9f40a571bcf6048f2f90e1ed3c821cf5ef25379f44c4926357bddde7 +size 27478167 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..599fa23281a8da4f771456bf32d28093028ff247 --- /dev/null +++ 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:940dfb8e4b9a3ebbd8ab8768a46b61c6ffc8dbdb23fb34c3b24fc83604deef00 +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..443e621248811709185542551c7ff97bedc05678 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4be6aa7fb9202ffade8e3c63f7e5c9a07d832c8e7605ca9b0f097f9e92f086c9 +size 27478370 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cddeca1fe3051c59304cb4b55c4a5204f6ea1e79 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d9b437a580216709b767d9c9251ea11a7b4203e74347519eaaaee79571caddc +size 27478178 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94b2f0e21cfe0ef579fb58fc625e44e1a02f8686 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e18dbf99d409b41050789eaee53aa9fdcce9d2c6d281279b79894995ef8315fb +size 27478242 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 
b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dde22777cbc339335fc1eaf01123544e0722a37a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:295f4d888f37232141248a382c5989eefd25df1f5601e63e1e5c449232800354 +size 27478359 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03371cf288aea44180e1cba47b9ea227a97191d4 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9cb241e0c751b49f624a072ab44d75fd45801a590444687ed392b6d1edcdd12c +size 27478103 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..326f4641dc42f4f30454ef86e537e416e2aa5e34 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:136611ead743dfb913c9cd45c5af7879fce2415a2ce36f75262d80db2eeb823c +size 27478359 diff --git a/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ffc8aa441374951865c97577dd4f5f5e94ccf00 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:cb02ef5c7b5b1122646c5956265f8513732da9a042286b7526c45bb7d6454fad +size 27478167 diff --git a/146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ac805177d9dc8383c4b1c94d475b4b0f8fbf5864 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9f0f9d8e44ba3bcb626b73656b65459ec819d243f30d8360d654ed9d66f462d +size 80413955 diff --git a/146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f78e3e25a02e9cfc036a6df64f915cc70414a44 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9968d3452c2a3860f6643ad7444a0f19931918e4353395d4d960634dbe1ce9f4 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2bb091dcae788f665a77859f4dba5f05d77e7e5d --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb6e9932bade5238cada611c46850f7746dfaa0fa72eaa49df6f7a592962230e +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e509c2004833ebd8133f56f8e49386241d885a0a --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:f4ed187671b4bd904db6194f8550a40ce5684cdab75a37e4aa19ce037f2b2cea +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03e91c8ab9baa7be94e8237c3a785cb5d8d4ec28 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97dc401409a6c634872e5ded635913be919b35394545096efc3130db7361c09b +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e399fab8a0d5cad8902298135d2a53ef9c1fb03e --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fae5f88e2344d254fc6b8b74097c3908bcb8f7a2ffc51cbe8eb28cd8d61675e2 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a8116598d9cb25aa2c9944fcf54018acb5c700f --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ad7bb102e4c5b723fcdcc18a1e5f875bbe5239ce8fae797afe8cb0b099776a1 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c2f46511e6f4b905d50d441edbeb2d6e6c96ed7 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:0a406f8327273c91dc201f41ffca0deecc817b475aa65fe9346a10e7644f7097 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..33210c7de27b85771ea144d4eb39bd15a7bd7460 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37df9437bda57163c0b35e4b15345e5c874a0dc2f5b37dfc0f169bc86d3eaf60 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d97384719e6c68fa6a7b24c30f5bf9ebda64cc3 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa6b1149df6d2d940a42de2ce9950644e02ab84fdb5a65686c78799cd2f59b3e +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c77ddcba91337df97f16ec3fefd8e572b34ce346 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07c18efed12cc8afc05cc37a39300c6074f830ef97de4ed9d61ab986b7cb8aec +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61369f26187c149f9b48fdaeb1853ad0d94ae323 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:30e80d892bf815afc919f36ea2017289ddbbdbab83cc6402b0b7d8c4751f09d4 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..815eaa061aecab572964ad506cb0251e4a89ce75 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72c25ee8b26be6ee4fc4f20c65f0a3d0392f00357a3923f7e957f696023c21a9 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..46e5b833734dcc31c79fb688752a06839bea328f --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9af55079f5c66da82404ed578505be43deaed207b8847c4a8b36db50f9ca109d +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..500711e1eedb1a06619725b8b429c3b52bac110b --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:859e57395d8ad78034961a05722fa00960d203fd190f51e7f0699409fa1d9680 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c2ef1418d93b5745c3fd2e99b68759858cce207 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_17-model_00-model_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21d63969cf946e353a032f5ac80728d714a0a1825920e92c56bbcb8f4960d3d4 +size 14180099 diff --git a/146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt b/146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4eda27960f35b71308b261795345aaf84d6a03af --- /dev/null +++ b/146m7b5100mdedup/global_step14324/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff8f9adc0710fed1a24544f6886063a00e219896546f64d8fe6f1fe9849b11f8 +size 4291 diff --git a/146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt b/146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73cdd25728355a9978879982b792391cab8fad96 --- /dev/null +++ b/146m7b5100mdedup/global_step14324/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17cdf8ed707b0d96d082e6dc521c37b177e507c9588e2e7107f143e6b23a496b +size 35443 diff --git a/146m7b5100mdedup/sbatch_146m7b5100mdedup.sh b/146m7b5100mdedup/sbatch_146m7b5100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..a1468ef18127fdcdea318b385e03392554b1d64e --- /dev/null +++ b/146m7b5100mdedup/sbatch_146m7b5100mdedup.sh @@ -0,0 +1,162 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 24:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m7b5100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln 
-f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 7510000000 +# -> Samples: 3_666_992 +TRAIN_SAMPLES=3_666_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 36_670 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path 
$KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m7b5100mdedup/sbatch_146m7b5100mdedupval.sh b/146m7b5100mdedup/sbatch_146m7b5100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..c66f72396eba37c12c515ee692212099c7229b11 --- /dev/null +++ b/146m7b5100mdedup/sbatch_146m7b5100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out 
+#SBATCH -e logs/%j.err + +VARIANT=146m7b5100mdedupval +VARIANT_CKPT=146m7b5100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + 
--num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m7b5100mdedup/tensorboard_146m7b5100mdedup/events.out.tfevents.1678999827.nid006236.103043.0 b/146m7b5100mdedup/tensorboard_146m7b5100mdedup/events.out.tfevents.1678999827.nid006236.103043.0 new 
file mode 100644 index 0000000000000000000000000000000000000000..28f5329669b0a8be378f6be98d5be7a89e7b2f4d --- /dev/null +++ b/146m7b5100mdedup/tensorboard_146m7b5100mdedup/events.out.tfevents.1678999827.nid006236.103043.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89a4e31db0bed2c0db084ac204524db97aee02b0db1378e18ee11342b4ae95a6 +size 25510338 diff --git a/146m7b5100mdedup/tensorboard_146m7b5100mdedupval/events.out.tfevents.1679007019.nid006868.127374.0 b/146m7b5100mdedup/tensorboard_146m7b5100mdedupval/events.out.tfevents.1679007019.nid006868.127374.0 new file mode 100644 index 0000000000000000000000000000000000000000..e9517ea50842805681a2261969047f008d9178ef --- /dev/null +++ b/146m7b5100mdedup/tensorboard_146m7b5100mdedupval/events.out.tfevents.1679007019.nid006868.127374.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b4f465faa93218ac45068d1a8f9aa7a878b59a7ff1aedac4b41f4a0f607356e +size 980 diff --git a/146m91b100m/3328571.err b/146m91b100m/3328571.err new file mode 100644 index 0000000000000000000000000000000000000000..d9fabb507da58c0ec5464ecc6a07e9f1637cf74b --- /dev/null +++ b/146m91b100m/3328571.err @@ -0,0 +1,1121 @@ +2: 2023-03-17 09:43:16.092795: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:43:16.092790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 09:43:16.092809: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:43:16.092807: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:43:16.092817: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:43:16.092814: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 09:43:16.092817: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 09:43:16.092831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093366: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093369: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093373: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 09:43:16.093369: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093385: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093378: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 09:43:16.093387: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093532: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 09:43:16.093533: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093531: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093540: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093535: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 09:43:16.093546: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 09:43:16.093530: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093679: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093683: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093689: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 09:43:16.093698: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093696: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093678: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093701: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 09:43:16.093704: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 09:43:16.093899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:43:16.093903: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:43:16.093909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-17 09:43:16.093900: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093912: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093910: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 09:43:16.093913: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:43:16.093916: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093920: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093923: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:43:16.093909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 09:43:16.093913: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 09:43:16.093911: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-17 09:43:16.093928: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093929: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:16.093938: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 09:43:16.131119: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131131: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131121: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131125: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131137: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 09:43:16.131139: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131126: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 09:43:16.131139: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.135999: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.136007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 09:43:16.136007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.135999: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.136013: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.136009: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 09:43:16.136006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 09:43:16.136017: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 09:43:17.801237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801245: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801246: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801248: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801245: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801254: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801249: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801250: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:17.801621: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801621: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801627: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801627: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801629: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 09:43:17.801633: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801635: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 09:43:17.801637: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.801919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.801923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:17.802332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802331: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802336: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802340: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802342: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 09:43:17.802343: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802345: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 09:43:17.802349: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830401: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830413: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830410: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830406: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:17.830809: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830811: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830814: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830816: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830817: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 09:43:17.830818: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830819: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 09:43:17.830820: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842552: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842552: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842561: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842555: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842555: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:17.842958: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842957: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842964: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842965: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 09:43:17.842968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 09:43:17.842971: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:17.866933: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866939: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866941: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866941: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866943: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-17 09:43:17.866946: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866950: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:17.866951: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878708: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:17.878892: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878894: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878895: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878896: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878897: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 09:43:17.878899: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878902: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 09:43:17.878905: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.920711: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920731: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920727: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.920738: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:17.921155: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921156: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921162: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921162: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921164: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 09:43:17.921169: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921173: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 09:43:17.921175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922180: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922187: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922190: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:17.922572: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922577: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922577: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922576: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922582: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 09:43:17.922584: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 09:43:17.922596: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 09:43:23.279608: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 09:43:23.279674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 09:43:23.279617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; 
dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 09:43:23.279615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 09:43:23.279803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-17 09:43:23.279761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279683: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 09:43:23.279619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-17 09:43:23.279625: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279810: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.279624: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 09:43:23.279873: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279887: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.279628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 09:43:23.279871: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 09:43:23.280018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.279627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 09:43:23.279879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.279689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 09:43:23.280024: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.279885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.279773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279897: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.279881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280035: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.279883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.279890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.279904: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.279889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.280030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280778: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280781: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.280785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:43:23.281570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 09:43:23.281508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281576: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 09:43:23.281513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281518: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:43:23.281518: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-17 09:43:23.281575: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 09:43:23.281521: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:43:23.281519: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:43:23.281525: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281527: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 09:43:23.281525: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:43:23.281572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 09:43:23.281546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 09:43:23.281559: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 09:43:23.281745: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:43:23.281578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281585: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:43:23.281588: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 09:43:23.281749: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281591: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:43:23.281593: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 09:43:23.281594: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:43:23.281596: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:43:23.281749: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:43:23.281592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 09:43:23.281602: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 09:43:23.281608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281752: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281758: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 09:43:23.281756: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 09:43:23.281758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-17 09:43:23.281761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 09:43:23.281769: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:43:23.281768: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:43:23.281771: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:43:23.281775: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 09:43:23.281778: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:43:23.282054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282086: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 09:43:23.282069: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:43:23.282056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:43:23.282055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282087: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282072: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282073: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282074: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282074: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282100: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282100: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 09:43:23.282056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282081: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 09:43:23.282097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282073: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:43:23.282074: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 09:43:23.282080: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:43:23.282075: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:43:23.282075: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 09:43:23.282085: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282085: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282096: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 09:43:23.282077: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 09:43:23.282080: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282089: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282090: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282107: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282109: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 09:43:23.282099: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 09:43:23.282092: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282091: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 09:43:23.282096: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282113: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282116: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 09:43:23.282116: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 09:43:23.282112: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-17 09:43:23.282127: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 09:43:23.282142: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 09:43:23.282960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282978: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282978: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282979: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282981: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282982: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282983: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.282995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.282995: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 09:43:23.283010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 09:43:23.283010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.279824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281731: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281732: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281731: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281734: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281737: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281738: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281748: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281752: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281751: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 09:43:23.281759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 09:43:23.281773: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +6: Successfully preprocessed all matching files. +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... +0: Loading extension module utils... +0: +0: +0: +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... 
+3: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +7: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: +1: +1: Loading extension module utils...Loading extension module utils... +1: Loading extension module utils... +1: +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: +1: Loading extension module utils...Loading extension module utils... +1: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... 
+3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... +4: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: +7: No modifications detected for re-loaded extension module utils, skipping build step... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils...Loading extension module utils... 
+5: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/146m91b100m/3328571.out b/146m91b100m/3328571.out new file mode 100644 index 0000000000000000000000000000000000000000..b3e7159e68db8d8877d69e89bf82ccaae4bc7253 --- /dev/null +++ b/146m91b100m/3328571.out @@ -0,0 +1,5664 @@ +Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-146m91b100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 
2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_146m91b100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m91b100m --load checkpoints_146m91b100m --train-weighted-split-paths-path train14b.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3328571.json --zero-stage 0 +START 3328571: Fri 17 Mar 2023 09:42:54 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 49.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 41.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 39.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 41.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 40.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 41.0c 
89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 44.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 40.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 38.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 49.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 47.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 41.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log 
============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 45.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 39.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 43.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 34.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 39.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +4: 2 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 36.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 47.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 43.0c 79.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +1: Launching on nid006546 (1/8), master nid006545 port 9999, GPUs 8, CUDA: True +6: Launching on nid006551 (6/8), master nid006545 port 9999, GPUs 8, CUDA: True +7: Launching on nid006552 (7/8), master nid006545 port 9999, GPUs 8, CUDA: True +3: Launching on nid006548 (3/8), master nid006545 port 9999, GPUs 8, CUDA: True +4: Launching on nid006549 (4/8), master nid006545 port 9999, GPUs 8, CUDA: True +5: Launching on nid006550 (5/8), master nid006545 port 9999, GPUs 8, CUDA: True +2: Launching on nid006547 (2/8), master nid006545 port 9999, GPUs 8, CUDA: True +0: Launching on nid006545 (0/8), master nid006545 port 9999, GPUs 8, CUDA: True +7: > setting tensorboard ... 
+0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... 
False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3328571.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3072 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 
256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 768 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-146m91b100mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_146m91b100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... 
None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 12 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 15 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... 
False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_146m91b100m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... 
None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_146m91b100mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... 
[['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... 
+0: [2023-03-17 09:43:41,815] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.095 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ scaled_masked_softmax_hip.o 
scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 24.916 seconds +0: time to initialize megatron (seconds): 73.507 +0: [after megatron is initialized] datetime: 2023-03-17 09:44:09 +0: building GPT model ... +0: [2023-03-17 09:44:09,643] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 09:44:09,644] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 09:44:09,644] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.71 GB, percent = 6.1% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, 
data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 09:44:11,654] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=22 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: 
ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: undo +0: 19: MixedFusedLayerNorm +0: 20: EmbeddingPipe +0: 21: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 09:44:11,928] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 09:44:11,929] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB +0: [2023-03-17 09:44:11,929] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.73 GB, percent = 6.1% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-17 09:44:11,931] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 09:44:25,113] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 09:44:25,113] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 09:44:25,113] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 09:44:25,118] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 09:44:25,118] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 09:44:25,244] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 09:44:25,245] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 09:44:25,245] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.41 GB, percent = 6.2% +0: ninja: no work to do. 
+0: Time to load utils op: 0.16383123397827148 seconds +1: Time to load utils op: 0.20911502838134766 seconds +3: Time to load utils op: 0.20942282676696777 seconds +0: Time to load utils op: 0.0006108283996582031 seconds +0: Time to load utils op: 0.10233688354492188 seconds +0: Time to load utils op: 0.10212135314941406 secondsTime to load utils op: 0.10199904441833496 secondsTime to load utils op: 0.10211896896362305 seconds +0: +0: +0: Time to load utils op: 0.1017143726348877 seconds +0: Time to load utils op: 0.10172843933105469 seconds +0: Time to load utils op: 0.10174894332885742 seconds +1: Time to load utils op: 0.10193634033203125 seconds +1: Time to load utils op: 0.10246586799621582 secondsTime to load utils op: 0.10257911682128906 seconds +1: +1: Time to load utils op: 0.10241389274597168 seconds +1: Time to load utils op: 0.10248827934265137 seconds +1: Time to load utils op: 0.10250568389892578 seconds +1: Time to load utils op: 0.10265302658081055 seconds +3: Time to load utils op: 0.10230159759521484 seconds +3: Time to load utils op: 0.10241961479187012 seconds +3: Time to load utils op: 0.10228538513183594 seconds +3: Time to load utils op: 0.10218977928161621 seconds +3: Time to load utils op: 0.10213232040405273 seconds +3: Time to load utils op: 0.10193443298339844 seconds +3: Time to load utils op: 0.10233283042907715 seconds +5: Time to load utils op: 0.11013078689575195 seconds +5: Time to load utils op: 0.1107780933380127 seconds +5: Time to load utils op: 0.11094832420349121 seconds +5: Time to load utils op: 0.11048340797424316 secondsTime to load utils op: 0.11114215850830078 seconds +5: +5: Time to load utils op: 0.11051154136657715 seconds +5: Time to load utils op: 0.11088109016418457 secondsTime to load utils op: 0.1102442741394043 seconds +5: +2: Time to load utils op: 0.1116793155670166 seconds +2: Time to load utils op: 0.11172652244567871 seconds +2: Time to load utils op: 0.1117391586303711 seconds +2: Time to load utils op: 
0.11171817779541016 seconds +2: Time to load utils op: 0.1117258071899414 secondsTime to load utils op: 0.11175322532653809 seconds +2: +2: Time to load utils op: 0.11175537109375 seconds +2: Time to load utils op: 0.11176753044128418 seconds +0: Time to load utils op: 0.0003447532653808594 seconds +6: Time to load utils op: 0.10994243621826172 secondsTime to load utils op: 0.11002326011657715 secondsTime to load utils op: 0.11025214195251465 seconds +6: +6: +0: Time to load utils op: 0.00046253204345703125 seconds +6: Time to load utils op: 0.11111259460449219 seconds +0: Time to load utils op: 0.0004012584686279297 seconds +6: Time to load utils op: 0.11006951332092285 seconds +0: Time to load utils op: 0.00039577484130859375 seconds +6: Time to load utils op: 0.11004185676574707 secondsTime to load utils op: 0.11000394821166992 seconds +6: +6: Time to load utils op: 0.11003851890563965 seconds +0: Time to load utils op: 0.0003933906555175781 seconds +7: Time to load utils op: 0.11080765724182129 seconds +7: Time to load utils op: 0.11090469360351562 seconds +7: Time to load utils op: 0.11090993881225586 seconds +7: Time to load utils op: 0.11086463928222656 seconds +7: Time to load utils op: 0.11055397987365723 seconds +7: Time to load utils op: 0.11004829406738281 secondsTime to load utils op: 0.11092233657836914 seconds +7: +7: Time to load utils op: 0.11069607734680176 seconds +0: Time to load utils op: 0.0003960132598876953 seconds +4: Time to load utils op: 0.11256623268127441 seconds +4: Time to load utils op: 0.11260581016540527 seconds +4: Time to load utils op: 0.11262226104736328 seconds +4: Time to load utils op: 0.11263895034790039 secondsTime to load utils op: 0.11263012886047363 seconds +4: +4: Time to load utils op: 0.11264371871948242 seconds +4: Time to load utils op: 0.11263632774353027 seconds +4: Time to load utils op: 0.11266970634460449 seconds +1: Time to load utils op: 0.0005042552947998047 seconds +1: Time to load utils op: 
0.0004935264587402344 seconds +1: Time to load utils op: 0.0004229545593261719 seconds +1: Time to load utils op: 0.0004107952117919922 seconds +1: Time to load utils op: 0.0004980564117431641 seconds +1: Time to load utils op: 0.0005097389221191406 seconds +1: Time to load utils op: 0.0005397796630859375 seconds +1: Time to load utils op: 0.0005223751068115234 seconds +3: Time to load utils op: 0.0005059242248535156 seconds +3: Time to load utils op: 0.000370025634765625 seconds +3: Time to load utils op: 0.0003960132598876953 seconds +3: Time to load utils op: 0.000385284423828125 seconds +3: Time to load utils op: 0.0003864765167236328 seconds +3: Time to load utils op: 0.00041294097900390625 seconds +3: Time to load utils op: 0.0004260540008544922 seconds +3: Time to load utils op: 0.000408172607421875 seconds +2: Time to load utils op: 0.00092315673828125 seconds +2: Time to load utils op: 0.0011436939239501953 seconds +2: Time to load utils op: 0.0011706352233886719 seconds +2: Time to load utils op: 0.0013148784637451172 seconds +2: Time to load utils op: 0.0013039112091064453 seconds +2: Time to load utils op: 0.0011899471282958984 seconds +2: Time to load utils op: 0.001203298568725586 seconds +2: Time to load utils op: 0.0012676715850830078 seconds +7: Time to load utils op: 0.000675201416015625 seconds +7: Time to load utils op: 0.0009064674377441406 seconds +7: Time to load utils op: 0.0010449886322021484 seconds +4: Time to load utils op: 0.0009593963623046875 seconds +4: Time to load utils op: 0.0008947849273681641 seconds +7: Time to load utils op: 0.0011243820190429688 seconds +7: Time to load utils op: 0.0013077259063720703 seconds +7: Time to load utils op: 0.0011637210845947266 seconds +4: Time to load utils op: 0.0010802745819091797 seconds +7: Time to load utils op: 0.0012128353118896484 seconds +7: Time to load utils op: 0.001169443130493164 seconds +4: Time to load utils op: 0.0011708736419677734 seconds +4: Time to load utils op: 
0.001220703125 seconds +4: Time to load utils op: 0.00115966796875 seconds +4: Time to load utils op: 0.001149892807006836 seconds +4: Time to load utils op: 0.001270294189453125 seconds +6: Time to load utils op: 0.0007977485656738281 seconds +5: Time to load utils op: 0.0009560585021972656 seconds +5: Time to load utils op: 0.0008931159973144531 seconds +6: Time to load utils op: 0.0011458396911621094 seconds +6: Time to load utils op: 0.0010950565338134766 seconds +6: Time to load utils op: 0.0010821819305419922 seconds +6: Time to load utils op: 0.001104593276977539 seconds +6: Time to load utils op: 0.0011849403381347656 seconds +6: Time to load utils op: 0.0011620521545410156 seconds +6: Time to load utils op: 0.0011479854583740234 seconds +5: Time to load utils op: 0.0013141632080078125 seconds +5: Time to load utils op: 0.0012307167053222656 secondsTime to load utils op: 0.0011930465698242188 seconds +5: +5: Time to load utils op: 0.0012161731719970703 seconds +5: Time to load utils op: 0.001262664794921875 seconds +5: Time to load utils op: 0.0013167858123779297 seconds +0: [2023-03-17 09:44:25,475] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 09:44:25,476] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB +0: [2023-03-17 09:44:25,476] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.55 GB, percent = 6.3% +0: [2023-03-17 09:44:25,590] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 09:44:25,591] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 09:44:25,591] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.55 GB, percent = 6.3% +0: [2023-03-17 09:44:25,695] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 09:44:25,696] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB +0: [2023-03-17 
09:44:25,696] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:25,800] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 09:44:25,800] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:44:25,800] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:25,902] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 09:44:25,903] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:44:25,903] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:26,007] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 09:44:26,007] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:44:26,007] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:26,109] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 09:44:26,109] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:44:26,109] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:26,219] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 09:44:26,220] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: [2023-03-17 09:44:26,220] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:26,323] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 09:44:26,323] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB +0: 
[2023-03-17 09:44:26,323] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.56 GB, percent = 6.3% +0: [2023-03-17 09:44:26,323] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 09:44:26,324] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 09:44:26,324] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 09:44:26,324] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 09:44:26,324] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 09:44:26,324] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 09:44:26,324] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 09:44:26,324] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 09:44:26,324] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 09:44:26,325] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 09:44:26,326] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 09:44:26,326] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00043272972106933594 seconds +0: [2023-03-17 09:44:26,327] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 09:44:26,401] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +2: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +3: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +0: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... 
+1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt... +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +2: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +6: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +3: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/mp_rank_00_model_states.pt. 
+1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +2: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +4: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +7: [2023-03-17 09:44:26,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +5: [2023-03-17 09:44:26,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +1: [2023-03-17 09:44:26,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +0: [2023-03-17 09:44:26,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +3: [2023-03-17 09:44:26,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt... +6: [2023-03-17 09:44:26,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +2: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +7: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +6: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +4: [2023-03-17 09:44:26,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +0: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +5: [2023-03-17 09:44:26,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. 
+5: [2023-03-17 09:44:26,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +3: [2023-03-17 09:44:26,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_01-model_00-model_states.pt. +1: [2023-03-17 09:44:26,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +1: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +5: [2023-03-17 09:44:26,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +6: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +4: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +0: [2023-03-17 09:44:26,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +7: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +5: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +0: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +1: [2023-03-17 09:44:26,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +4: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +3: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +6: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt... +2: [2023-03-17 09:44:26,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 09:44:26,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_03-model_00-model_states.pt. +2: [2023-03-17 09:44:26,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+1: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 09:44:26,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +0: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +6: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +4: [2023-03-17 09:44:26,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt... +1: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +6: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +1: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +5: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +0: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +7: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +2: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +4: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. +3: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +2: [2023-03-17 09:44:26,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +6: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +3: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+6: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +0: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt... +1: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +1: [2023-03-17 09:44:26,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +6: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +3: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +2: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +7: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 09:44:26,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:26,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:26,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:27,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:27,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:27,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:27,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +4: [2023-03-17 09:44:27,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +0: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_05-model_00-model_states.pt. +5: [2023-03-17 09:44:27,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +6: [2023-03-17 09:44:27,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +2: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +3: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +6: [2023-03-17 09:44:27,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +4: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +7: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt... +0: [2023-03-17 09:44:27,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +2: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +3: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +7: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +1: [2023-03-17 09:44:27,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +5: [2023-03-17 09:44:27,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +4: [2023-03-17 09:44:27,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:44:27,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +3: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +7: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +4: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +6: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +2: [2023-03-17 09:44:27,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +5: [2023-03-17 09:44:27,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt... +6: [2023-03-17 09:44:27,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +7: [2023-03-17 09:44:27,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +2: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +3: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +4: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +1: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +5: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:44:27,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +7: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +5: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +6: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +2: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +2: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +6: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +7: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +3: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +4: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +3: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt... +1: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +1: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +5: [2023-03-17 09:44:27,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:44:27,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +7: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +2: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +6: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +7: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +3: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +3: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +5: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +2: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +6: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +4: [2023-03-17 09:44:27,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt... +1: [2023-03-17 09:44:27,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_09-model_00-model_states.pt. +1: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +7: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +4: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +7: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +5: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +6: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +3: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +6: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +4: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt... +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +5: [2023-03-17 09:44:27,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +1: [2023-03-17 09:44:27,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +2: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:44:27,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +6: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +5: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +1: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +4: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +2: [2023-03-17 09:44:27,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt... +7: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +6: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +1: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +7: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +5: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +4: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +2: [2023-03-17 09:44:27,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_11-model_00-model_states.pt. +3: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +0: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +3: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +6: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +4: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +5: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt... +1: [2023-03-17 09:44:27,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +2: [2023-03-17 09:44:27,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +1: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +6: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +4: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +7: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +5: [2023-03-17 09:44:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_12-model_00-model_states.pt. +3: [2023-03-17 09:44:27,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +7: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +5: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +1: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +3: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +6: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt... +2: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +7: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +2: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +6: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +3: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +4: [2023-03-17 09:44:27,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +1: [2023-03-17 09:44:27,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_13-model_00-model_states.pt. +5: [2023-03-17 09:44:27,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +6: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +7: [2023-03-17 09:44:27,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +2: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +3: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +4: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +4: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt... +5: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +6: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +7: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +2: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +3: [2023-03-17 09:44:27,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +5: [2023-03-17 09:44:27,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +1: [2023-03-17 09:44:27,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:44:27,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:44:27,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:27,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +6: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +4: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +1: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +7: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +6: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt... +3: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +7: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 09:44:27,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 09:44:27,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +1: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 09:44:27,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:27,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +2: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +5: [2023-03-17 09:44:27,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+4: [2023-03-17 09:44:27,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:27,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. +4: [2023-03-17 09:44:27,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:27,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:27,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 09:44:27,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 09:44:28,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:44:28,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +3: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +1: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +7: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +2: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +5: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt... +4: [2023-03-17 09:44:28,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +7: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +6: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +1: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +2: [2023-03-17 09:44:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +4: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +5: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 09:44:28,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_16-model_00-model_states.pt. +3: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 09:44:28,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 09:44:28,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +6: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +2: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:44:28,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +6: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +5: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +1: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt... +3: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +3: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +2: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+1: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +1: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +5: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +7: [2023-03-17 09:44:28,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 09:44:28,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 09:44:28,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+4: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: > overriding warmup iterations value to 0 +5: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+7: [2023-03-17 09:44:28,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +1: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +4: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +3: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +2: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +7: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +1: [2023-03-17 09:44:28,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 09:44:28,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:44:28,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 09:44:28,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,293] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +7: [2023-03-17 09:44:28,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,294] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +7: [2023-03-17 09:44:28,295] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +5: [2023-03-17 09:44:28,295] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +2: [2023-03-17 09:44:28,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:44:28,297] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +1: [2023-03-17 09:44:28,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,297] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +3: [2023-03-17 09:44:28,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:44:28,298] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +1: [2023-03-17 09:44:28,299] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +2: [2023-03-17 09:44:28,299] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +3: [2023-03-17 09:44:28,300] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +4: [2023-03-17 09:44:28,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,300] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +4: [2023-03-17 09:44:28,301] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +4: [2023-03-17 09:44:28,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,302] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +5: [2023-03-17 09:44:28,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,303] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +4: [2023-03-17 09:44:28,303] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +0: [2023-03-17 09:44:28,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 09:44:28,304] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +5: [2023-03-17 09:44:28,305] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +0: [2023-03-17 09:44:28,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,306] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +0: [2023-03-17 09:44:28,306] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-17 09:44:28,308] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +7: [2023-03-17 09:44:28,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,308] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +1: [2023-03-17 09:44:28,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,309] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +3: [2023-03-17 09:44:28,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:44:28,309] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +7: [2023-03-17 09:44:28,310] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +1: [2023-03-17 09:44:28,310] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +3: [2023-03-17 09:44:28,310] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +6: [2023-03-17 09:44:28,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:44:28,315] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-03-17 09:44:28,318] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +2: [2023-03-17 09:44:28,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:44:28,319] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-17 09:44:28,321] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +6: [2023-03-17 09:44:28,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 09:44:28,329] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +6: [2023-03-17 09:44:28,330] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +5: [2023-03-17 09:44:28,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,334] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +5: [2023-03-17 09:44:28,336] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +7: [2023-03-17 09:44:28,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,337] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +4: [2023-03-17 09:44:28,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +5: [2023-03-17 09:44:28,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:44:28,339] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +5: [2023-03-17 09:44:28,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +2: [2023-03-17 09:44:28,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:44:28,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:44:28,340] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +3: [2023-03-17 09:44:28,340] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +4: [2023-03-17 09:44:28,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +5: [2023-03-17 09:44:28,341] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +3: [2023-03-17 09:44:28,342] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +2: [2023-03-17 09:44:28,342] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +3: [2023-03-17 09:44:28,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:44:28,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-17 09:44:28,344] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +1: [2023-03-17 09:44:28,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +0: [2023-03-17 09:44:28,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +1: [2023-03-17 09:44:28,346] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +4: [2023-03-17 09:44:28,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,346] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +0: [2023-03-17 09:44:28,346] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +7: [2023-03-17 09:44:28,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:44:28,347] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +4: [2023-03-17 09:44:28,348] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +7: [2023-03-17 09:44:28,349] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +5: [2023-03-17 09:44:28,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +7: [2023-03-17 09:44:28,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,352] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +5: [2023-03-17 09:44:28,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +4: [2023-03-17 09:44:28,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,353] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +3: [2023-03-17 09:44:28,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:44:28,353] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +3: [2023-03-17 09:44:28,353] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +4: [2023-03-17 09:44:28,354] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +4: [2023-03-17 09:44:28,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,355] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +3: [2023-03-17 09:44:28,355] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +1: [2023-03-17 09:44:28,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,356] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +7: [2023-03-17 09:44:28,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,356] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +4: [2023-03-17 09:44:28,356] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +1: [2023-03-17 09:44:28,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:44:28,357] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +0: [2023-03-17 09:44:28,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,357] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +0: [2023-03-17 09:44:28,358] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +7: [2023-03-17 09:44:28,358] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +6: [2023-03-17 09:44:28,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,358] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +6: [2023-03-17 09:44:28,358] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +1: [2023-03-17 09:44:28,359] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +0: [2023-03-17 09:44:28,359] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +0: [2023-03-17 09:44:28,359] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +6: [2023-03-17 09:44:28,360] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +1: [2023-03-17 09:44:28,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,364] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +2: [2023-03-17 09:44:28,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:44:28,364] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +6: [2023-03-17 09:44:28,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:44:28,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +1: [2023-03-17 09:44:28,365] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +5: [2023-03-17 09:44:28,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,366] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +6: [2023-03-17 09:44:28,367] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +2: [2023-03-17 09:44:28,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 09:44:28,367] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +2: [2023-03-17 09:44:28,367] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +5: [2023-03-17 09:44:28,367] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +6: [2023-03-17 09:44:28,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:44:28,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +3: [2023-03-17 09:44:28,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:44:28,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +5: [2023-03-17 09:44:28,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +2: [2023-03-17 09:44:28,369] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +6: [2023-03-17 09:44:28,369] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +2: [2023-03-17 09:44:28,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 09:44:28,369] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +5: [2023-03-17 09:44:28,369] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +3: [2023-03-17 09:44:28,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +2: [2023-03-17 09:44:28,371] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +0: [2023-03-17 09:44:28,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-03-17 09:44:28,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +6: [2023-03-17 09:44:28,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:44:28,378] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-17 09:44:28,379] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +4: [2023-03-17 09:44:28,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:44:28,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +7: [2023-03-17 09:44:28,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 09:44:28,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +2: [2023-03-17 09:44:28,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +4: [2023-03-17 09:44:28,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +7: [2023-03-17 09:44:28,383] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +2: [2023-03-17 09:44:28,383] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +5: [2023-03-17 09:44:28,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:44:28,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +2: [2023-03-17 09:44:28,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:44:28,384] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +5: [2023-03-17 09:44:28,385] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +4: [2023-03-17 09:44:28,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 09:44:28,385] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +2: [2023-03-17 09:44:28,386] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +7: [2023-03-17 09:44:28,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:44:28,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +4: [2023-03-17 09:44:28,387] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +7: [2023-03-17 09:44:28,388] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +1: [2023-03-17 09:44:28,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,390] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-17 09:44:28,391] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +6: [2023-03-17 09:44:28,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:44:28,396] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +0: [2023-03-17 09:44:28,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 09:44:28,397] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +6: [2023-03-17 09:44:28,397] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +0: [2023-03-17 09:44:28,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:44:28,398] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +1: [2023-03-17 09:44:28,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:44:28,398] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +0: [2023-03-17 09:44:28,399] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +0: [2023-03-17 09:44:28,399] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +1: [2023-03-17 09:44:28,400] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +3: [2023-03-17 09:44:28,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:44:28,403] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-17 09:44:28,405] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +6: [2023-03-17 09:44:28,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 09:44:28,423] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-17 09:44:28,425] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +3: [2023-03-17 09:44:28,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_146m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:44:28,543] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-17 09:44:28,545] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +0: successfully loaded checkpoint from checkpoints_146m91b100m at iteration 0 +7: time (ms) | load-checkpoint: 2147.72 +0: estimated model parameters: 0.146525952 +0: estimated model parameters without embeddings: 0.106319616 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 09:44:28 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.026980 seconds +0: number of documents: 28730568 +0: > dataset split: +0: train: +0: document indices in [0, 28730568) total of 28730568 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.079 seconds +0: total number of samples: 6713794 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.037825 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.019 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 09:44:42 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 19358.05 | train/valid/test-data-iterators-setup: 12964.38 +0: [after training is done] datetime: 2023-03-17 09:44:42 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.862952E+00 | lm loss PPL: 4.760565E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3328571: Fri 17 Mar 2023 09:45:03 AM EET diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a26cd0caa5a71779f9b42867ef6dac836a37478d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54424016eaf41b221cb444fc8ba1aa733c3aad340b378db717373cf4b7cad3c2 +size 27478295 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80fd2301ff4678e3a40277447d9b5cf4dfbe3231 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55aa77859958e7ab5aceca87d666a019f95caa7d2a320e3bd5804b481f5cdab6 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c01788a069fcbd2405f2ee0f16f2dc9861d8f632 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2a3368c19ea344547c5981fc231fdc9cdeec32ee8a6f44965ed4b5b8975fd99 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8c7922948a261dc0146210dfc25515e8e20c27e3 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1bbbe3772e3014c455b4efef3d1b44677126892ea906e954462776c102222df8 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..326d7b2d8fd044b41eb038dede0d9c4d17a8dd6b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:264c43c9e8ff8952db262b0326910afee9f475aa8f97d6a0815f7776a1cf2973 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a24087d04a691984c0935151cc1e24621a00da1b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d21a81c9f9c9318400224bea1d59d74505ee97c17f30412d5fedc3618aaffa91 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9d55630b141f6e1a2c2ba8119c255c5fc1fad057 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9607142b693861bc7afd504756d7e527fc067a26a9744a3e7d95b0e0cf51c09d +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e61a784e07ac2a4df7503e287a5cd9fb6765926b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32a5bc93a0318405795c19d10b62ed96a4f2c83537c7e0c51b66f287c85e8963 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf301f5dffc2ef3bff2e84e5455db5dc5df27260 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ba5d2bf4073aa45f63e2c4b8bf21f94c8588b896686a3935f3c1f7b89700d8c +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7eddb2b1abec5a44ead0b79023a322cd0b54a811 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11a9ed1df2dab22c3dbeb8b8879ef7cf2559219a352d3e8ba77d5a8bb3abb1ba +size 27478242 diff --git 
a/146m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..87ce2e24f720f56b23753a2b18c84d27880452a9 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a2988badcdb05658f6da7e00771a7da3ef5b99e94f069b421105115e95276d4 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d125821b518d7c73dd513ad13a13ad5a9fe5ba2b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5071fefef8760abbb7b7d03b3defa4041b285545d46b9dcd08fb3ebd53cb3f2d +size 27478231 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfa4f703942a891c56e8306a8a73a63cec16918a --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8dd7f1c45fe9cdeca78b85061e2c42b7c9f8aafdbbafcfe0c2a7415bfb187a6 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb016a19d86adc5fc9b527d6a96fbc28caad1611 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:36b774b6739e2c49433dabb659c4dfa26082e4319ac013e924400c5775469660 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c8769a0e16f061cf828d3b0aae009b17dd30124d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72f27647fd5eb2b7ecaddf9496c8a3a3a8ac6363a6c7214a4d9dc3c9991b1529 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..803c4754c0b2c272ac69540e92ef27a454d4ba4f --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e25e850877add2aa2ae24143cc58645a81e30b24157061fe5ad175ecebf4c6c +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..478f5423f41525399e5a3a833929629ae0d5e41e --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c4afaaed6f797c52e8b2426266077a456af003168634724271dca607a921cf4 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6531f3a2996a0397ff8a119708ef27f93a880f56 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50b33db635ede5978f5e6b90cee4eea3506a318c80298aac0d1205cdbd575c1f +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4effc4fbbd51dae09753f385bf8ec3bbcca583f0 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa28fe9c56c4c83cb8dcbdb0029e6ac46d963cb9ba17a9b64a84c8ef3b5ee84a +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7748b500ea4820a167d0775dd42537a8ab923ec7 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7444aa90e8637a82d88e4c6a50bc77546c64854a5ae8db6be92b69586c94600 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24e69e5bb2588bc78fe437ae679403aa629173dd --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:521271ba96cfc8d2277c635872cd96b0f4d2c4a0a868aac760dbbe11bdc6d4dc +size 27478242 diff --git 
a/146m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd8d23fc3f6dd1adc9e9677329431254bd902414 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63e04f4666f7c31cd858debb8eda91ae4a6ca3b735e71db81e5f14acd8d60555 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8e6ae1ba6ace5e5147eeacf35145657d6307bf74 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:53cc26236e3d525bd87371f0637144ab011bf64459544beff9e4065ed4411cea +size 27478231 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..63f0c60ae6497a28b1cb8e10207827340b38b907 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:742d1fb40dc920a3e5e875f69acf31912566bba659e8e88d9b746c7e22ee676f +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9234395c33844e01f24be91228d455e494e2bcb --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:d0d08cc349db37cf0bf14dfbf737e7146721fd89777cc060b7db92ad17d254df +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b4bf53f0dfd8e437b21b5e89029282cf82f53eb --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e35ae0c1dce7a33f5bfca3a1238ab424af4490929dfcb0743dfe05f448c87c2b +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee81de7fae987e381a8dd3bb88433217992fcd2f --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6f217df1b5632e33e0a5d3b835713ecfa10234efa060a778a6e899932e95a0a +size 27478434 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d9c711538aacc5340c73ca3423be7fd0112771f --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d0811c72940284f3eff6a4ded8cf50fbd207c06a7b473df8184b0dcfc8053b0e +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..b4f29c39e9109422bfeb3f3145eff97d2cdc3741 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aff8f12aba9f02449fd3a4c8ee5cde6dc1876f36225d91d69fdcec3d5a263e2a +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1522650d3d8469d19e644fc4589cf96e8067be7 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:625cc7566d5829c54c9c66a0c69ff665484344164432b77e9fe8092505ec3ef5 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c30023e171113722eb3fe24d20d9979e2a4c925c --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:12fc25aae4003cfc79ffb85dc21b51e9f322ed77a0129ab8303b00271a81d078 +size 27478114 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4192e8c1fc674ea0e9daf4e6320613399733bccf --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcc83db3bcee5826daa05dd0a287ce35f85ca8497c20fbf50eb1ae7945de15ef +size 27478306 diff --git 
a/146m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1b7947f93736f63d5632e39616559ad9460e4fb9 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc0bed804ce264d86a0db5aef36c124ec8c44c1cc628f28930f26b74d7109cd9 +size 27478434 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a49ecffcccdb682faffc0bfec18415c663ea0ef --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a5e1a269b98e62bdd7fe249e1d971ff8af583c55403791bf5c8c758018b8cc5 +size 27478231 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6d5a66a796a41e076579b9d5283ab36da45b0a6 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43efe77eeeef96c1286bc53b3f04406dcfed53df2f659c4b300a2460ca2c6c99 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc4286ed98bae3f45fd152a5137e544f1b14d84e --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b8a372f68fb9a2d5b29258d100168ba5cb4e8d2f0ecdd123c3c791574081858a +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8b7f4f191ee5716022e8f2d6c09a77837136ac4d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67b1bcbd3edf20b5ec45673123bbe54984323724d28887f3a0a280639fcfc050 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5065bd6ac1fe03bbf8cb6c40e1d3f2d951cc111b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11b516f5423a7b6f10125820ed235f8907f169b07c91db00e519d2fd2bce4877 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49996319728b3cb62ed773cbb31ba7d1333403fa --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9efd683a893ce192536489ca50e36e9a11b1a00627ff97ae9f05cfdd100b8b06 +size 27478434 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6abbcdfbbe2fdcc2c2877cff9c203752fde31bda --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daaeed81c5c2db1918d7c799212a493777bc32f680957531e7a5e54d486957b0 +size 27478114 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9761ecd279aeeb1de36653d889aedde8dedefd5 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcca8a9cdc0ee33c65eccb73d59219417bb6b0987be5f1233a8ab114b33148df +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e947b7a00786ce89fd0c81a86ecf0eb5e940bb62 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd10ed2d15a6ca4bc1a96ae93888f1e1fbcaa4b7ec3c4967f05eaa54427707b2 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f6ab5db72b51a37896636c4f58fbca66db1fdb4 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d4c15044e3e400046f15390e251fcb6a1afa4bde1a2762df5328a5103568872 +size 27478306 diff --git 
a/146m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2361810b88569633a98ce570995012a3516c8adc --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fb10616704a9d23c5af83945dec86feb7ee91e45e307f1c2d40035d3d0a3c25 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ac873aed21b0553f3f09c9cd1bed340c52afa1a --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04d8466eeb064e5fbc23134fe4f36dd429a4b4c917ac7c2c0369e999c031e722 +size 27478231 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c24fce101d3472ef6e229486a6725c43ef2fc23 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3deb51284fe4f4409f1387a0e977b88b17fec319f7f5c04ae9938160603744d4 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..459949781f875ab7312bf421adc91e5a9528a74b --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ae537588f955b02826074580223ebbf381fadbc5ee9657455b9aa8c0f2bdda97 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7180f7c0485fbca0240a664f388016173ed6ff1 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a960e9a845734ea4ba73015769f2cdca0276c9b15759b84ff6b492140e3020f +size 27478434 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f821f8e0eb8555111ff083c256105f21dc636ec --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d237ccbeb44fc7683ce8c9d790173dcfc477e376ee23a2dbe51253e9df56ba74 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..114e261a73c061caa613d23b9ccdc51b1a58bd95 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79ea0c7434ba8e3cdbcaae4a7e091308dc0374c8a6e3135cfa1969d1326dbe61 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..5817b6900c4d564c5f62d84c5ea2730373940bfd --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6a6bd45980520a18b22b85319038052cffef1b190e6ba803f27bf70437afba4 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1acee9dd3ac33b33dac1e80e3d35075a880dad80 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a81d924655c9f26f5b9f2aff222096b2f8beb0e59fc9c94dbed71036567dd319 +size 27478306 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a388b91d8157347836cb247b666324a41e2e1a2 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e1e85b6dbb7c81ffffbcb4cc9156e8cbea82b8674b5a3b446ac770bb34d4bdc6 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..26d416237966ade123efb5dd480d16279388d62c --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1539173480f2da013921bca366b3e85b74041732c9abec0fae72a7d9b1ea26fc +size 27478242 diff --git 
a/146m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c32533a9eebf17c95703f133e7f81dcc455cb760 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db32eebf509b58c636dc94c4d3ef64dbdb1e30e406f8ab5c1f52fdd125f8a921 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..10da7ae6abfbfcafd33b3fb4730cce8205f2938d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3181a9d3ef2e731e01353025d92653bdcb9e113acd9f122cc5d429739385472c +size 27478167 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..484dc14b89e5374c0067416d4ad77a974798044f --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:895954ed589eca500f0cffe07d89acea0a99f0325e2d1a174c74ee4996d47b3f +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80efd2af1e761a7fab60825c88062ef7ca07c69c --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:37d74abbb20ed8a8453da340ba706478fa986c8f122dd89eecf5120a913723a3 +size 27478370 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4ae19ca214117f857fddb569d7bad0373d0883f7 --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21f02bbd0957e279744948adbd7f89f1138e0958b0d106e974e21f85a84977b0 +size 27478178 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bf7e2b3046f7a9410d01ded4a30268efd303d9a --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c4ceaf5bab4cf46ba0afa4d771011d59128eef749232ea4df33b4c5a991c838 +size 27478242 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a67c72272c74a3ed7d87e8a407ea9b4036bae4fd --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37651c9fbd4f9f98351d8945ae8c3730acabbfc5d06ac37099f6ed38dffc6a2a +size 27478359 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c0d0e32fe9a56b39a46f6a19148a0db1519f8a2 --- 
/dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4131eaf6c4dec77490e70c7cc99291fd868dd64ea26836331409b50b600583a +size 27478103 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96e7ef76de34ecb7d8950cb6ec98aa74cf91450d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:953a2def3d61fe5caf14b8f15a918cd98627c535a6d15aa62f1d63ab5d90e26e +size 27478359 diff --git a/146m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/146m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1ac6c52bade200d5d9b1cf486617f235489f36d --- /dev/null +++ b/146m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd9920f7a380711212ee85ddcb81628f816139da6889783b7da8ac36f495af42 +size 27478167 diff --git a/146m91b100m/global_step173500/layer_01-model_00-model_states.pt b/146m91b100m/global_step173500/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..75301cea64468873df8006a4df5221de1df36b03 --- /dev/null +++ b/146m91b100m/global_step173500/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c2379725890650e2c2bc5c428023380e3e01e8474a8a32decc2af2465bb7c5e +size 80413955 diff --git a/146m91b100m/global_step173500/layer_03-model_00-model_states.pt b/146m91b100m/global_step173500/layer_03-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..889d71edd486021782025b727533b63773da343a --- /dev/null +++ b/146m91b100m/global_step173500/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d75303bb1d4ed2a091ba77104617e1f771ff5340606f4eb94ad117aae7da6b3b +size 14180099 diff --git a/146m91b100m/global_step173500/layer_04-model_00-model_states.pt b/146m91b100m/global_step173500/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db5a0e27b33d9958e26f03c95b0cc551770f08ce --- /dev/null +++ b/146m91b100m/global_step173500/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:994b10414d2e5bea8fa13b91a6131e15513448c903901139200c384ff0ed0f39 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_05-model_00-model_states.pt b/146m91b100m/global_step173500/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3f6ec5b7bb68e48fec13779c2fd16d824171585f --- /dev/null +++ b/146m91b100m/global_step173500/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:648fdd84940fb6e5356bb445c4e1b69ce9ea429488e04995604851819bb4c19a +size 14180099 diff --git a/146m91b100m/global_step173500/layer_06-model_00-model_states.pt b/146m91b100m/global_step173500/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1b3d032de2fc9ff1ea7487542a0d4c78615db946 --- /dev/null +++ b/146m91b100m/global_step173500/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5245232691460786fdcc97bca9bc42161cf89ec7770019df19417aa92b5da89 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_07-model_00-model_states.pt b/146m91b100m/global_step173500/layer_07-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..716512fac0e5f5cf1a8cfe1901ea8c5dbb61b255 --- /dev/null +++ b/146m91b100m/global_step173500/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af88a97a96f56771756c8fdba607c8e58032c9c6b2a8737fcc02f2eae22c9655 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_08-model_00-model_states.pt b/146m91b100m/global_step173500/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9e5491f7f4d06c5f5c3d1a8f8b613b46766382f --- /dev/null +++ b/146m91b100m/global_step173500/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc8ac2aa9aaabb7631efc5cff6665298cceef4a8761fca573cdff1cb87d85c69 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_09-model_00-model_states.pt b/146m91b100m/global_step173500/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..768a74317f3d603e3dddcc40a3362e64ef83a199 --- /dev/null +++ b/146m91b100m/global_step173500/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2b4ad4b19fafe02abcc96613fac620f2ff375c9252840bd65ab8d61b9ac127d +size 14180099 diff --git a/146m91b100m/global_step173500/layer_10-model_00-model_states.pt b/146m91b100m/global_step173500/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a31d2fd6429fb524abc3fe9b16932324513e20e7 --- /dev/null +++ b/146m91b100m/global_step173500/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04d9b99f790c1f899cbc84fbd0e3ad50dae45e3925ddfb4013e7d4a7411386fc +size 14180099 diff --git a/146m91b100m/global_step173500/layer_11-model_00-model_states.pt b/146m91b100m/global_step173500/layer_11-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..7f57cd2663dc46fe7482dc5dfbd61f8cae7c031a --- /dev/null +++ b/146m91b100m/global_step173500/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1aad11a5310c10c89ef6536945a48b87e8d17a777a546b2eee09b5e3d01b337f +size 14180099 diff --git a/146m91b100m/global_step173500/layer_12-model_00-model_states.pt b/146m91b100m/global_step173500/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7267200887098781c033b2e4eb455f3c8689a1e --- /dev/null +++ b/146m91b100m/global_step173500/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:274afb1fe4933ec5f6dc48c791c9f0cf73e24f54a79da0267baaf7c5c8864b83 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_13-model_00-model_states.pt b/146m91b100m/global_step173500/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52785030d31d147bb6d744e59b14a02e96ba1006 --- /dev/null +++ b/146m91b100m/global_step173500/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92d88642e8959f61fe7fc8b806ba659d95c21bfa3099be1ba48477ca267b0b34 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_14-model_00-model_states.pt b/146m91b100m/global_step173500/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6e1468e636b1bd47b80a6d91ee932cf7184b01e --- /dev/null +++ b/146m91b100m/global_step173500/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:396d719f022705bcc686c95f35d1552fc5fcaf81f09901585fe4e3fd392f065d +size 14180099 diff --git a/146m91b100m/global_step173500/layer_15-model_00-model_states.pt b/146m91b100m/global_step173500/layer_15-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..74f8dda2e52065388fe6f87c429b310d169d0c85 --- /dev/null +++ b/146m91b100m/global_step173500/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e304d851f488a7359d67edea8ca5e734c986ee6153d34cb86492a0b42c912580 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_16-model_00-model_states.pt b/146m91b100m/global_step173500/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d78e40c103eaef145d569760257bd7f6dc0ad25 --- /dev/null +++ b/146m91b100m/global_step173500/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30816d792b103c84959cb432a47c11872ffb3d359b73ae5e6fd51cc56068b3b3 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_17-model_00-model_states.pt b/146m91b100m/global_step173500/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a939257a27232162bd41ed3994280013ff4bbd41 --- /dev/null +++ b/146m91b100m/global_step173500/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4315a5037710be2afa71903a3aea59df60b947fa12cb69729cbc3b916759665 +size 14180099 diff --git a/146m91b100m/global_step173500/layer_19-model_00-model_states.pt b/146m91b100m/global_step173500/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..56073dd99ae6f7732660595cc6ec8ff838749dd6 --- /dev/null +++ b/146m91b100m/global_step173500/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8446a24ba026c99f8f462a58501fe39fc8869145de412742f22fe7ef85365aa +size 4291 diff --git a/146m91b100m/global_step173500/mp_rank_00_model_states.pt b/146m91b100m/global_step173500/mp_rank_00_model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6240d59e17d65f6403e25bb12a6cb37d9eec3e2c --- /dev/null +++ b/146m91b100m/global_step173500/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ffa48cbf33c99858b49b19abc23e20f9d634c5abe0c1bdce7b37e0bfed36737 +size 35443 diff --git a/146m91b100m/sbatch_146m91b100m.sh b/146m91b100m/sbatch_146m91b100m.sh new file mode 100644 index 0000000000000000000000000000000000000000..4f495734966bed764ff11126e101681988c964cb --- /dev/null +++ b/146m91b100m/sbatch_146m91b100m.sh @@ -0,0 +1,177 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m91b100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 
/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 19873180000 +# -> Samples: 9703701 +# TRAIN_SAMPLES=9_703_701 +# Tokens: 31633480000 +# -> Samples: 15446035 +# TRAIN_SAMPLES=15_446_035 +# Tokens: 60400000000 +# -> Samples: 29492188 +# TRAIN_SAMPLES=29_492_188 +# Tokens: 90964260000 +# -> Samples: 44416143 +TRAIN_SAMPLES=44_416_143 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 444_161 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 100 \ + 
--save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m91b100m/sbatch_146m91b100mval.sh b/146m91b100m/sbatch_146m91b100mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..a8bfcb18e682f49225406d22735c9585b58d291a --- /dev/null +++ b/146m91b100m/sbatch_146m91b100mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=146m91b100mval +VARIANT_CKPT=146m91b100m + +# if run without sbatch, invoke here +if [ -z 
$SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train14b.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_14B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_140M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 11300000000 +# -> Samples: 5517578 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + 
--max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678953740.nid005483.49344.0 b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678953740.nid005483.49344.0 new file mode 100644 index 0000000000000000000000000000000000000000..8606595dd227c5fd703bfd60ba8c6fd26e781e71 --- /dev/null +++ 
b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678953740.nid005483.49344.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ea6925e6ab033308fa5aa3063c1690fd5a443a22eed717777739b94dc2511dc +size 4257482 diff --git a/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678954806.nid005483.56682.0 b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678954806.nid005483.56682.0 new file mode 100644 index 0000000000000000000000000000000000000000..91ae9bee181b72234eeac78f833d1efb2556cce3 --- /dev/null +++ b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1678954806.nid005483.56682.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26cc37d25d31a6beedafd1aff1c83ac70f6f987aa2149cba0571f3be84aa74cb +size 299020874 diff --git a/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020127.nid005299.86974.0 b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020127.nid005299.86974.0 new file mode 100644 index 0000000000000000000000000000000000000000..15d350af3893ac2c736d016b557327dce64afeb2 --- /dev/null +++ b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020127.nid005299.86974.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9fb75cce89844ef48bb28e46b9f17d77b3efc68a3801e238bca4ec6759ba6586 +size 21475 diff --git a/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020286.nid005365.111048.0 b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020286.nid005365.111048.0 new file mode 100644 index 0000000000000000000000000000000000000000..dd08f7b1e017648df14912e709213bb568d86caf --- /dev/null +++ b/146m91b100m/tensorboard_146m91b100m/events.out.tfevents.1679020286.nid005365.111048.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:895ae73e6c14162f306374ed341a468c4353411a8cfab7c8d472461f1f40fbff +size 40 diff --git a/146m91b100m/tensorboard_146m91b100mval/events.out.tfevents.1679039021.nid006552.49270.0 
b/146m91b100m/tensorboard_146m91b100mval/events.out.tfevents.1679039021.nid006552.49270.0 new file mode 100644 index 0000000000000000000000000000000000000000..aacd0f434ea67414a4bc9bb021825d98d0918a44 --- /dev/null +++ b/146m91b100m/tensorboard_146m91b100mval/events.out.tfevents.1679039021.nid006552.49270.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8e551abf514e32583ec66311112f64af08a2ad9eea350e2571003ea7681fe33 +size 980 diff --git a/14m14b100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt rename to 
14m14b100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt diff --git 
a/14m14b100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 
b/14m14b100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt similarity index 
100% rename from 14m14b100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt rename to 
14m14b100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt diff --git 
a/14m14b100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 
b/14m14b100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt similarity index 
100% rename from 14m14b100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt rename to 
14m14b100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt diff --git 
a/14m14b100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt diff --git a/14m14b100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/14m14b100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt similarity index 100% rename from 14m14b100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt rename to 14m14b100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt diff --git a/14m14b100m/layer_01-model_00-model_states.pt b/14m14b100m/global_step/layer_01-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_01-model_00-model_states.pt rename to 14m14b100m/global_step/layer_01-model_00-model_states.pt diff --git a/14m14b100m/layer_03-model_00-model_states.pt b/14m14b100m/global_step/layer_03-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_03-model_00-model_states.pt rename to 14m14b100m/global_step/layer_03-model_00-model_states.pt diff --git a/14m14b100m/layer_04-model_00-model_states.pt b/14m14b100m/global_step/layer_04-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_04-model_00-model_states.pt rename to 14m14b100m/global_step/layer_04-model_00-model_states.pt diff --git a/14m14b100m/layer_05-model_00-model_states.pt b/14m14b100m/global_step/layer_05-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_05-model_00-model_states.pt rename to 14m14b100m/global_step/layer_05-model_00-model_states.pt diff --git 
a/14m14b100m/layer_06-model_00-model_states.pt b/14m14b100m/global_step/layer_06-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_06-model_00-model_states.pt rename to 14m14b100m/global_step/layer_06-model_00-model_states.pt diff --git a/14m14b100m/layer_08-model_00-model_states.pt b/14m14b100m/global_step/layer_08-model_00-model_states.pt similarity index 100% rename from 14m14b100m/layer_08-model_00-model_states.pt rename to 14m14b100m/global_step/layer_08-model_00-model_states.pt diff --git a/14m14b100m/mp_rank_00_model_states.pt b/14m14b100m/global_step/mp_rank_00_model_states.pt similarity index 100% rename from 14m14b100m/mp_rank_00_model_states.pt rename to 14m14b100m/global_step/mp_rank_00_model_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt rename to 
14m1b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt diff --git 
a/14m1b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 
b/14m1b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt similarity 
index 100% rename from 14m1b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 
rename to 14m1b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt diff 
--git a/14m1b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 
b/14m1b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt similarity index 
100% rename from 14m1b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt rename to 
14m1b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt diff --git 
a/14m1b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/14m1b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt similarity index 100% rename from 14m1b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt rename to 14m1b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt diff --git a/14m1b5100m/layer_01-model_00-model_states.pt b/14m1b5100m/global_step/layer_01-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_01-model_00-model_states.pt rename to 14m1b5100m/global_step/layer_01-model_00-model_states.pt diff --git a/14m1b5100m/layer_03-model_00-model_states.pt b/14m1b5100m/global_step/layer_03-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_03-model_00-model_states.pt rename to 14m1b5100m/global_step/layer_03-model_00-model_states.pt diff --git a/14m1b5100m/layer_04-model_00-model_states.pt b/14m1b5100m/global_step/layer_04-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_04-model_00-model_states.pt rename to 
14m1b5100m/global_step/layer_04-model_00-model_states.pt diff --git a/14m1b5100m/layer_05-model_00-model_states.pt b/14m1b5100m/global_step/layer_05-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_05-model_00-model_states.pt rename to 14m1b5100m/global_step/layer_05-model_00-model_states.pt diff --git a/14m1b5100m/layer_06-model_00-model_states.pt b/14m1b5100m/global_step/layer_06-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_06-model_00-model_states.pt rename to 14m1b5100m/global_step/layer_06-model_00-model_states.pt diff --git a/14m1b5100m/layer_08-model_00-model_states.pt b/14m1b5100m/global_step/layer_08-model_00-model_states.pt similarity index 100% rename from 14m1b5100m/layer_08-model_00-model_states.pt rename to 14m1b5100m/global_step/layer_08-model_00-model_states.pt diff --git a/14m1b5100m/mp_rank_00_model_states.pt b/14m1b5100m/global_step/mp_rank_00_model_states.pt similarity index 100% rename from 14m1b5100m/mp_rank_00_model_states.pt rename to 14m1b5100m/global_step/mp_rank_00_model_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt rename to 
14m2b7100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt diff --git 
a/14m2b7100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 
b/14m2b7100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt similarity 
index 100% rename from 14m2b7100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt rename 
to 14m2b7100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt diff --git 
a/14m2b7100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 
b/14m2b7100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt similarity index 
100% rename from 14m2b7100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt rename to 
14m2b7100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt diff --git 
a/14m2b7100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/14m2b7100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt similarity index 100% rename from 14m2b7100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt rename to 14m2b7100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt diff --git a/14m2b7100m/layer_01-model_00-model_states.pt b/14m2b7100m/global_step/layer_01-model_00-model_states.pt similarity index 100% rename from 14m2b7100m/layer_01-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_01-model_00-model_states.pt diff --git a/14m2b7100m/layer_03-model_00-model_states.pt b/14m2b7100m/global_step/layer_03-model_00-model_states.pt similarity index 100% rename from 
14m2b7100m/layer_03-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_03-model_00-model_states.pt diff --git a/14m2b7100m/layer_04-model_00-model_states.pt b/14m2b7100m/global_step/layer_04-model_00-model_states.pt similarity index 100% rename from 14m2b7100m/layer_04-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_04-model_00-model_states.pt diff --git a/14m2b7100m/layer_05-model_00-model_states.pt b/14m2b7100m/global_step/layer_05-model_00-model_states.pt similarity index 100% rename from 14m2b7100m/layer_05-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_05-model_00-model_states.pt diff --git a/14m2b7100m/layer_06-model_00-model_states.pt b/14m2b7100m/global_step/layer_06-model_00-model_states.pt similarity index 100% rename from 14m2b7100m/layer_06-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_06-model_00-model_states.pt diff --git a/14m2b7100m/layer_08-model_00-model_states.pt b/14m2b7100m/global_step/layer_08-model_00-model_states.pt similarity index 100% rename from 14m2b7100m/layer_08-model_00-model_states.pt rename to 14m2b7100m/global_step/layer_08-model_00-model_states.pt diff --git a/14m2b7100m/mp_rank_00_model_states.pt b/14m2b7100m/global_step/mp_rank_00_model_states.pt similarity index 100% rename from 14m2b7100m/mp_rank_00_model_states.pt rename to 14m2b7100m/global_step/mp_rank_00_model_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt rename to 
14m7b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt diff --git 
a/14m7b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 
b/14m7b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt similarity 
index 100% rename from 14m7b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt rename 
to 14m7b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt diff --git 
a/14m7b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 
b/14m7b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt similarity index 
100% rename from 14m7b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt rename to 
14m7b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt diff --git 
a/14m7b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/14m7b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt similarity index 100% rename from 14m7b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt rename to 14m7b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt diff --git a/14m7b5100m/layer_01-model_00-model_states.pt 
b/14m7b5100m/global_step/layer_01-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_01-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_01-model_00-model_states.pt diff --git a/14m7b5100m/layer_03-model_00-model_states.pt b/14m7b5100m/global_step/layer_03-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_03-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_03-model_00-model_states.pt diff --git a/14m7b5100m/layer_04-model_00-model_states.pt b/14m7b5100m/global_step/layer_04-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_04-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_04-model_00-model_states.pt diff --git a/14m7b5100m/layer_05-model_00-model_states.pt b/14m7b5100m/global_step/layer_05-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_05-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_05-model_00-model_states.pt diff --git a/14m7b5100m/layer_06-model_00-model_states.pt b/14m7b5100m/global_step/layer_06-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_06-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_06-model_00-model_states.pt diff --git a/14m7b5100m/layer_08-model_00-model_states.pt b/14m7b5100m/global_step/layer_08-model_00-model_states.pt similarity index 100% rename from 14m7b5100m/layer_08-model_00-model_states.pt rename to 14m7b5100m/global_step/layer_08-model_00-model_states.pt diff --git a/14m7b5100m/mp_rank_00_model_states.pt b/14m7b5100m/global_step/mp_rank_00_model_states.pt similarity index 100% rename from 14m7b5100m/mp_rank_00_model_states.pt rename to 14m7b5100m/global_step/mp_rank_00_model_states.pt diff --git a/196m1b51b5/eval.txt b/196m1b51b5/eval.txt new file mode 100644 index 0000000000000000000000000000000000000000..691635437643b818c280e72a923c16ce53e2c43f --- /dev/null +++ b/196m1b51b5/eval.txt @@ -0,0 +1 @@ +3.929866E+00 diff --git 
a/196m1b51b5/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..058dd7405a8cb5c413c3cda0bc34028ef1845494 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:992ca688b2fc61879d331f2fa837073ab6b9d93c97e43a961723cbf4e6aa2256 +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8bb26beed4b7af21d71abf95b5a971ae4b6eefc0 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:08e2c988ad51029328a8ea3b519ff38bef24ac1d32c3ef12e5dac88214058001 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f19ff65dcf18b28f7d1390478534e0a0e91b1408 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:558a68e0ec2586c20bc08cb774047e33dbe7a8f99b4103a596b2961258ae720c +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..64edf2952384e71d25b9616b5a84e9af8021f3c9 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:1fafbf6b826152eaf913c6326c048dea31bec8cdfacbad74da75a30204a3a6be +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6d34cc68f7d15892382bab7dbe4fd774fd658cf5 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d71825ced2dd6d5d7bc6d1139393e783152d011b8e707516e98331695ddc8ed2 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83cd2a6415493be730626807714448ea5b00746e --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c547fe7f3e1b2634ab65c260dc2f6c3015d7d715d06e2e90843699c72154b3d9 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3602e071fd502c0ab30332a7ecf6c3642ac0185d --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f6f558ca0971ae0d989ebea5f7cc99db2829fc56e21b08b1e3abd645b279c6a +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c3bcac9be3ab5caf74a63f533bd115285a770c5 --- /dev/null +++ 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8de6825cb1610c5be16974cd36aead2da0caaa3e8d8432962c42f2b3b6da1a6c +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6c64a77b67ad25f59e2c18d97e8ecd69901c5c6 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a60501d49a97b21901de27a7a5793c65654bb1d3535b1df168f023e5f5fe883d +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e0f071f2b26194164d7f95411a95c4257d81f6e --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67ac1cc21266af94d7e7ccd463d0543a316b90d57568607b0182c2d7fb30644c +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16677a53aadc5b9a9b1fc924cdaf1e3a06cd7170 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:431c43af878c99910601e24a031c2a6a0ba27489590aef7d14a55ac713825311 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..8fa92318543e188db336c9d2c73a7f5e3e9c9a96 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8588e4c24d4d38dc1cadc5b7ed8d7d9aff54cc8c929ca85a8eb2aeea3ecd09bc +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bfae14bc34c5415b79426b69c30a81a7d57d0cd --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95b766fc61dba220ff37bceff1e87a8ca53d4480fa95487e27650b50e842c9bc +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48674431c3ad4168cab45600823f5a845e8af7b4 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b98aeb2087471a35c5a0524eccfc4a4a1ea59ddcc4dc7450e1126a2bb921e63 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf9c86f638ad44fd15e575ae9ed48dfa1bf2440f --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cadff60a734ea7e2d4a1ca6e8beeb19611cf9b4c09510d01df057c4d9b03f82e +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..190283bbcb276fe27839e1de2cf652595d803b51 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fdd28b329aa5e81d74c1315f14949bfcffb51cd1615a8a9c5df1a2b49dd7f09e +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..12c8b85378f0142bed46a52ae76645d627e7430c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d9b57b4a1d5c14ab4171328c74ed1d16108b115c6dd06f1cc92f79ef5231f7e +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f900c23d8c98e1069a54d18b31b313862f738be9 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:168027a457c65f83b3f22ada014bd3163eef9cf42196f28ef231380a0eef75be +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..952499b029904b0ff41157d9b135de34fc5e2f44 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:411c94ac9451d0f1552a7df17d491740e33258ba94d2214d30b8f20d90b2405c +size 37736482 
diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f9fa178316b6d1d5415a3227f764e7c227b5f8d6 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96ee820e3ac3fdbed513ab9cd36a7e74825eb996396b79e9bb2a1fdf130949da +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb3a1380cea7574c611a9809eb1b9203a581f547 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d8717eb8080bd29dd0f03237bd5027798856dda8139985e35081f091a09b188 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d86fe8d892ee1cb62a39a76bf2873b83411ab3c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c32dcbccdde5e354931b4ce442ad7dad0eb53bdc27a0ea3fa8827ffe832d63e7 +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d759103d6335b492ef63a35c6e1efd0f92e4c8c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:c63fa3b0c769dbf1cfee16ff4abd735df84aff184255ca311dea4b40f036cbb2 +size 37736471 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f366ed81aee87d330ee74a8857463dcfaa7dc3ab --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73b01ddb0bec4e994a2c7559d1394b17b6353cf5f7762ca31ee37b6b65a21cff +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..703fe3583b67f89de2d455124426b82a3a6ae860 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2148fbab89efed1851d2d3df7cced26435a3003e414ec27994faa6ed8738983b +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c12d1ee84bc4fafe1875ec4792cc26884ac434c9 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c19659f30a53752c0cb935e4ff38d77f860060637da025c035b808bde0505258 +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..01e0617312c380c579ba0d634c1a8691c38db6ba --- /dev/null +++ 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc9df48b0640dbfee7d026d5bc04a362358217d111d9c08ddaed72ffa9aa6a8e +size 37736610 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9026b64fc23d759727aa3144d883d4d11f5eff7c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fdd814270c4d6ac037a3bdc855e4f2401bf06324cba583d534dbee31acbe8c9b +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f51b5403e231a28c9b6babb184a5b7242d6b8903 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:76a28325f18e47ae544280692cd269952e8553ca32cf14d66701ddbfafaf93dc +size 37736610 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6c30851c8abfc01e77bfaa8286234adbffc49381 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8525778dc1684b75115426f03cf50972f2cdf19ec0ecca255b803886239ee8e0 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..8dad0738842ab372705eb10d282245076d0e016c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da17871a329877b6524321dad65fc19694ac762fc9abf636911ce04d8b89a712 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab2f64fc48cffce4e629cf542a71d85db44d83ee --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63544d93438c8f6877978faa4b5dc8cab86e1a9cf123104b5512e6884e3ad271 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8dba2f1eed4a22e05670837275587b102f5c1a9d --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d538eebb48db9dc4a44e80d43dfde9504db9fb1f3b372a72590e951a742f6b8 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0bfeab2127e8a89ab110e05b6489c2bcfac859e --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8894e70a096e3222b747cab960541203c80f93c7a24da788e1a0e726599a9aa +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e7424c268da232a5ac59dc7b45d2bcf11ad4209 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9590a50dcc7b20026a6a816935e4ba9c82f6b0bba7dd1192c940f067ec09a322 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b910c96f7b01658df36ea94d1e4ecc4cf43b832 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1df3387e55afe564e5b0dec88037c92613e416ca22aa817ecb12274a9f6a4830 +size 37736610 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4960f95f6916709e2e3a109396bdd79135a0807d --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c7dad081994221f7f7d1fba6f254a4e9d438341137a033c566140692bd85256 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9d0d312af2cae1defb138c8948e851f7e9282eb --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66c821e8406b4f114d182072a968f39b405ee74fd4226f01f748bfff249871d8 +size 37736546 
diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9599e76566dfeed893bfa44696db368009930077 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b64f7b7f8b2da0087af3695ea8996511835673b445b370e8ac3aef9345e5aea +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0b52bcf543cedf0abd276e80a41994ff62166c4 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c55419ac31eef02cbbe1ca7fabfbc634c86cd360a60713e9caadbee9795a7413 +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49ce309f5b9cb84a5b0583fab723d6c161e6063c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d7c27f9a7073a36f0f0842e104a8b913e5f8af20239f89c5807bf9274924820 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7fc636a75a9b437cff6711e5c4640349872bd56 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:aeb68ef04a4fce0861f37c1887deb579ee9f2540ac3281c50bedafd513e2a071 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8abb353db101778014582ccbc361d857c61ffc27 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c59108db2f7eb5cd6c2dca9752456235c3d65baf12a9883e9aed156960bd8e1c +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4878af0ed1df8c0cddf212d672e08d7021d92444 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3a15d3bc50105c143f9e2380e2b7566a9bad6d70e845c21b592ad6ea6635311 +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37b59af3efdaf6b9388b88ef6cd7e8666f040318 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d849672416adf06271f9b0abd35713a14f35b3e007286e0adb64344ec620d1c8 +size 37736471 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0a037f2735f1996dfde9e54c4f2a086e4b8ece0c --- /dev/null +++ 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f3396e00437e1536fba46f4d54143abd266428353a04ba310d5c74c23825849 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5c9358b6ad2c6b7e5090a700c460de66abc576d --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3747771c2d29697ee9e1c1101990f282b0f27ebc485fc34c6bf5c9d80516042d +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7949509b45169ae1618bebde92ef713fc7649d83 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:987878f5e5e651247099f30eaf6d82bf5cd3bf954554b55fccc41e4411897f32 +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e7eb50cc764741631ba7f514de5b09468202f505 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9df84bc095dbd7644534e66807cac7d8258a345ae935dd27a62dbaac8fc938c1 +size 37736610 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1288ffd62ea9243a5a56a9a6876c85cdc7ee56f0 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c649025b028e40d58dc3929c2ad5457256f149ec1587bb165c882b71a0cbec39 +size 37736226 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0792ed72a31a81a90a7d7985f118b7ee37e198c6 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8567c80da42b4b5aa9b044325f0adc92f164d6993fde14dc77057df7d3349332 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4eedcd0a93fd84a3d65b0f4ee749e562a14cdac9 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95a8b838c64a55bce72f510f1b1064c6d522ea6a9c4fff92f4b9966269d6b9d6 +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f579a245d44715a6e0dcdeb894b5890ce237ea6 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be19007caeea4fb0f236dee77807a2df5b2e0ca310bd5e2975745565939c31b5 +size 37736546 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 
b/196m1b51b5/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4e1200946aad67fa95d3b109045672556d49c8ff --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0cb5d43f17b6d0695b00ac8776396f51882c5709bae1b5e20590d514e6977a49 +size 37736482 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fdf82c36e16f2a632f81de310877f7bcb5e178c --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21f8fbb2cc05d49036b3495711547c20f543ff1763db3c15db03500713e630ef +size 37736418 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7bb794f80d08db08f1463e4f868e64654690a902 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b828b8dc8931f97e4dbde985dd81375c5fd1a77a1c6ffb6b4bfaf917f78aeed9 +size 37736471 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e0a865cfe79eadd9351e1167c1978ddf64fbb98a --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09c20985a8e669c96b1fb9a16348b84c315f50d675345e8ed62998dfacd0d5f5 +size 37736482 
diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76e9e7bcf2b7d3f4459f8d4b5bd25c9e800d8ec9 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e8353c39828bcde9f09fa19a1a95c9c435665a82d552344b4f538067cbe2c65 +size 37736610 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88828fe0033ca6eca7900a750d4e5e0388c189a3 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:44d5b8b3c8a0fbb99491e4e09bbc564b2206c1041d700a27e4dec7046cac6a0d +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d4db2e7422724642bc67d8b425e78989fd26c442 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed1fde309f0e65390f8170bddf0d66dfdef7fe53d232f286dc42f6c365ff2c0a +size 37736354 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dfffeee5e9a14b2996434ce8c9b21d5ff0535d18 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:d028fd8dd83345a692f33c7e13947650502f9f59d36e7282511f564ce6788f70 +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..de5b5a6179fe7aa7e31f1c2653f56151aafcc6fc --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:068b90fdfa4c8c7a7db277aa818f81e98fd68398f347722cfc650f1696e79c9a +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe9ae162f5ec5d6448dbd53a36631dccc8debbf4 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e455c2568d5e6a9ce6cc51356a2b58128621298765e66a414b4fde039258d255 +size 37736407 diff --git a/196m1b51b5/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/196m1b51b5/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0cf51da660978211320b3246c47a09e7ac4c8058 --- /dev/null +++ b/196m1b51b5/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b31fbc5b48526df6f3c3ac93e33d468df79c3ac298c343e790808e848b00e22 +size 37736407 diff --git a/196m1b51b5/global_step2891/layer_01-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52f6aaaecdbd58f4a4b86f0e44f3951ef462752b --- /dev/null +++ b/196m1b51b5/global_step2891/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a558dcdd443bab52fd2a1a24701d49f03ed6321c10877e442f6341061c471a69 +size 93816067 diff --git a/196m1b51b5/global_step2891/layer_03-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27c294646744e795397394485bf4fc27377e7f6c --- /dev/null +++ b/196m1b51b5/global_step2891/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:689a3a7888f4dd870b36af954078a601d4c3c25f111a0f076012a3e5fbecfba9 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_04-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf5ba96ded565afad5bf96c34ccf5a459da5edd4 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e669ac5109e0a420f924f69327467c86d02f61a183d51476dd384a69d99bd30c +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_05-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0cedbc14ea6d3fdbe270d5ba8cf50ea54e3d4ae3 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f52021e0a452448ed7b68955efe4412e7aafe613f97bb7914a173a8886b1d2 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_06-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bc6d8006bf271d54a24da59e61bf7a0149bc356 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:5176421f6be75f0e6da642bf0f2cc410aa52aa7a8087de7476d63c5d58130d80 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_07-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb69b0202cb531069cf63bfd522904087a092af8 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f8da86f2f359bfcab81c42d5a49d0683a938542fd0d991ea6eb4554aeaad741 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_08-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..66ece6d2842443d8f943a8eef8fd95e6ceaf7fb9 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0533064d00d5704145859621b6aa5a2a3b9a254510ae99d7324f86e56faf0405 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_09-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52c58cbe45fa76b200e5c229024e3c49e1d41a54 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2fc4693cf465bc9361dc5f8ad65e560d751b42c1bae954fdee1908b2e1871687 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_10-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a7cdef0905684d80a431584c3678879d46db06e --- /dev/null +++ b/196m1b51b5/global_step2891/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7db4b39cae9674d491e6b58db4d27a8e4b89bd2d3f2a12d1a861ed75c849330c +size 
19295235 diff --git a/196m1b51b5/global_step2891/layer_11-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b0cda3c540f2dae4c0bd21436ad75da303693c8 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98c23a061576040fbd73ce9345ddcb7f92882754960a419c9333d4fc94a50073 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_12-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5b75353a3146cc91e4aaa0b65418dc58c0c592b --- /dev/null +++ b/196m1b51b5/global_step2891/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3517a7d537e4228fe61f5f1ddd390b578d18a7d2ca7c97bb1fef214edb9e5830 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_13-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5dd73f06daa446aa29bdeeedec500b33ed95806 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fa76db361b714d120820ad256c97fb119051d98ad3459970a808a298ed0d80b +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_14-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9aae6364b4e3daffc14ba9e27ff5f5e3c29aaff9 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8e0c3015f85740f9e24bc13ac5c8105c288cdb25f28b012ade0154e28b931e1 +size 19295235 diff --git 
a/196m1b51b5/global_step2891/layer_15-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b4fff0a9cf7530b243728149bca2cb70c84349a --- /dev/null +++ b/196m1b51b5/global_step2891/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f6bf3b65a43a8d63c12c32722486ffe57f334513622c06a32075f8025354f49 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_16-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..597e4d48ce825226a273373834669a171208f9a4 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eca7ab08abd1ba3fcb78d575c9346de2117573dead731304093e9bf827eb05fc +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_17-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..65e831b06a286f8750875f74627ee88cc4d1508f --- /dev/null +++ b/196m1b51b5/global_step2891/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89e413bf3c27fb642636d30e64fbd3b886943a6211e82addbc85d91f40c521b7 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_18-model_00-model_states.pt b/196m1b51b5/global_step2891/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6bf91ce97c0b0be383cb874b1403b09916090f64 --- /dev/null +++ b/196m1b51b5/global_step2891/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:994d035a7bf9260b996ad497d35c5acea0ea2ed751b7217d17a6d3694d647015 +size 19295235 diff --git a/196m1b51b5/global_step2891/layer_20-model_00-model_states.pt 
b/196m1b51b5/global_step2891/layer_20-model_00-model_states.pt
new file mode 100644
index 0000000000000000000000000000000000000000..a2a58da5e97d09e73dc9cc47a46dd23c32586f76
--- /dev/null
+++ b/196m1b51b5/global_step2891/layer_20-model_00-model_states.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7eb02fd8109c05e76afe91f326494395dbd0dc3eeb5f60386d8122ab950774c5
+size 4803
diff --git a/196m1b51b5/global_step2891/mp_rank_00_model_states.pt b/196m1b51b5/global_step2891/mp_rank_00_model_states.pt
new file mode 100644
index 0000000000000000000000000000000000000000..359d233a2e8a4e1cd2f40ecc734af35353e66498
--- /dev/null
+++ b/196m1b51b5/global_step2891/mp_rank_00_model_states.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:06d0248388cfe4e1931b70bf22bd55264bffd770c49e8686600c3c452afca61d
+size 36211
diff --git a/196m1b51b5/sbatch_196m1b51b5.sh b/196m1b51b5/sbatch_196m1b51b5.sh
new file mode 100644
index 0000000000000000000000000000000000000000..b10ced5a7845662860408f3ce7ca80d56cb6f4b0
--- /dev/null
+++ b/196m1b51b5/sbatch_196m1b51b5.sh
@@ -0,0 +1,166 @@
+#!/bin/bash
+#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901
+#SBATCH --nodes=8
+#SBATCH --ntasks-per-node=1
+#SBATCH --cpus-per-task=32
+#SBATCH --mem=256G
+#SBATCH -p standard-g
+#SBATCH -t 48:00:00
+#SBATCH --gpus-per-node=mi250:8
+#SBATCH --exclusive=user
+#SBATCH --hint=nomultithread
+#SBATCH --account=project_462000119
+#SBATCH -o logs/%j.out
+#SBATCH -e logs/%j.err
+
+VARIANT=196m1b51b5
+
+# if run without sbatch, invoke here
+if [ -z $SLURM_JOB_ID ]; then
+    mkdir -p logs
+    sbatch "$0"
+    exit
+fi
+
+set -euo pipefail
+
+# symlink logs/latest.out and logs/latest.err
+ln -f -s $SLURM_JOB_ID.out logs/latest.out
+ln -f -s $SLURM_JOB_ID.err logs/latest.err
+
+KILL_SWITCH_PATH=kill-switch-$VARIANT
+CHECKPOINT_PATH=checkpoints_$VARIANT
+TENSORBOARD_PATH=tensorboard_$VARIANT
+
+# Data
+VOCAB_FILE="gpt2/vocab.json"
+MERGE_FILE="gpt2/merges.txt"
+# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document"
+TRAIN_DATA_PATH=train1b5.txt
+# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document"
+VALID_DATA_PATH=val.txt
+# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document"
+
+
+
+PP_SIZE=1
+TP_SIZE=1
+
+MICRO_BATCH_SIZE=4
+GRADIENT_ACCUMULATION_STEPS=1
+WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES))
+GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS))
+
+# Model parameters
+source model_params.sh
+MODEL_PARAM=("${PARAM_196M[@]}")
+NHIDDEN=${MODEL_PARAM[0]}
+FFN_HIDDEN_SIZE=${MODEL_PARAM[1]}
+KV_SIZE=${MODEL_PARAM[2]}
+NHEADS=${MODEL_PARAM[3]}
+NLAYERS=${MODEL_PARAM[4]}
+SEQ_LEN=2048
+
+echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS"
+
+SAVE_INTERVAL=1000
+
+# Tokens: 1516071000
+# -> Samples: 740269
+TRAIN_SAMPLES=740_269
+
+OPTIMIZER_ARGS=" \
+    --optimizer adam \
+    --adam-beta1 0.9 \
+    --adam-beta2 0.999 \
+    --adam-eps 1e-8 \
+    --lr 2e-4 \
+    --min-lr 2e-5 \
+    --lr-decay-style cosine \
+    --lr-decay-samples $TRAIN_SAMPLES \
+    --lr-warmup-samples 7403 \
+    --clip-grad 1.0 \
+    --weight-decay 1e-1 \
+    "
+
+GPT_ARGS=" \
+    --num-layers $NLAYERS \
+    --hidden-size $NHIDDEN \
+    --num-attention-heads $NHEADS \
+    --kv-channels $KV_SIZE \
+    --ffn-hidden-size $FFN_HIDDEN_SIZE \
+    --seq-length $SEQ_LEN \
+    --max-position-embeddings $SEQ_LEN \
+    --micro-batch-size $MICRO_BATCH_SIZE \
+    --global-batch-size $GLOBAL_BATCH_SIZE \
+    --train-samples $TRAIN_SAMPLES \
+    --vocab-file $VOCAB_FILE \
+    --merge-file
$MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/196m1b51b5/sbatch_196m1b51b5val.sh b/196m1b51b5/sbatch_196m1b51b5val.sh new file mode 100644 index 0000000000000000000000000000000000000000..fc7fa3131e83c887e2cdf41bd286f34e5ce7921d --- /dev/null +++ b/196m1b51b5/sbatch_196m1b51b5val.sh @@ -0,0 +1,173 @@ +#!/bin/bash +#SBATCH 
--exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=2 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p small-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=196m1b51b5val +VARIANT_CKPT=196m1b51b5 + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train1b5.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_20B_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_196M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} 
+NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 19873180000 +# -> Samples: 9703701 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage 
$ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/196m1b51b5/tensorboard_196m1b51b5/events.out.tfevents.1677500035.nid005918.25790.0 b/196m1b51b5/tensorboard_196m1b51b5/events.out.tfevents.1677500035.nid005918.25790.0 new file mode 100644 index 0000000000000000000000000000000000000000..d6bbd418177415583c3bcb6992ee3ee8f2615b78 --- /dev/null +++ b/196m1b51b5/tensorboard_196m1b51b5/events.out.tfevents.1677500035.nid005918.25790.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8ad0336e12a1c07c5e14b5e55668f217f07926509e3c9294cf48b633eb23342 +size 5153185 diff --git a/196m1b51b5/tensorboard_196m1b51b5val/events.out.tfevents.1677509741.nid007243.113259.0 b/196m1b51b5/tensorboard_196m1b51b5val/events.out.tfevents.1677509741.nid007243.113259.0 new file mode 100644 index 0000000000000000000000000000000000000000..d985e9467152165ed3aec3a8cfd4d23497df8f73 --- /dev/null +++ b/196m1b51b5/tensorboard_196m1b51b5val/events.out.tfevents.1677509741.nid007243.113259.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf2891aeb0c5a63d50426ae25fe140fc1c2209ab0ffc270f122b042e49dcafd0 +size 980 diff --git a/1b1100m100m/3324218.err b/1b1100m100m/3324218.err new file mode 100644 index 0000000000000000000000000000000000000000..f3e90c2125de978c614b3d8b3d15a2482d51b5d6 --- /dev/null +++ b/1b1100m100m/3324218.err @@ -0,0 +1,294 @@ +1: 2023-03-16 18:50:42.323359: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized 
with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323360: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323403: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323419: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323425: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323405: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 18:50:42.323448: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:42.323411: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324571: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324575: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324592: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:50:42.324605: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324618: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324625: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324630: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:42.324644: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:50:55.435681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.435715: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.436051: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 18:50:55.435751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.436084: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 18:50:55.435791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.435800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.436099: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 18:50:55.435807: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.435774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.435842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.436140: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.436147: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.436158: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.436165: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.436179: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 18:50:55.435849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435817: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435904: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435894: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435951: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.436514: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436546: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436582: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436596: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436627: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 18:50:55.436639: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436669: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436694: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:51:23.214107: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214164: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214223: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.214278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222710: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222729: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222738: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225622: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225632: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225631: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225632: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225638: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 18:51:23.225636: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225637: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225652: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225651: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225651: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225657: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225658: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217218: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217232: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.217246: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217246: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217251: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217251: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217252: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.217255: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +1: Building extension module utils... +1: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... 
+1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/1b1100m100m/3324218.out b/1b1100m100m/3324218.out new file mode 100644 index 0000000000000000000000000000000000000000..d88483476e4080798540ab1e202fe04c7bc16cd5 --- /dev/null +++ b/1b1100m100m/3324218.out @@ -0,0 +1,2526 @@ +Model parameters: d_model 1792 ffw_size 7168 kv_size 128 n_heads 14 n_layers 26 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 26 --hidden-size 1792 --num-attention-heads 14 --kv-channels 128 --ffn-hidden-size 7168 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 64 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-1b1100m100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --override-lr-scheduler --no-load-optim --reset-progress --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --tensorboard-dir tensorboard_1b1100m100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard 
--log-validation-ppl-to-tensorboard --save checkpoints_1b1100m100m --load checkpoints_1b1100m100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3324218.json --zero-stage 0 +START 3324218: Thu 16 Mar 2023 06:49:20 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 38.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 38.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 44.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +1: Launching on nid007372 (1/2), 
master nid007371 port 9999, GPUs 8, CUDA: True +0: Launching on nid007371 (0/2), master nid007371 port 9999, GPUs 8, CUDA: True +0: using world size: 16, data-parallel-size: 16, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 
0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 16 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3324218.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 7168 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... 
False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 64 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 1792 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-1b1100m100mval +0: kv_channels ..................................... 128 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_1b1100m100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... 
True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 26 +0: num_layers_per_virtual_pipeline_stage ........... 
None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_1b1100m100m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 
0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_1b1100m100mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. 
False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 16 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +1: > setting tensorboard ... 
+0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 18:52:40,894] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.115 seconds +0: > compiling and loading fused kernels ... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 102 +0: [1/1] c++ scaled_masked_softmax_hip.o scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 26.326 seconds +0: time to initialize megatron (seconds): -12.017 +0: [after megatron is initialized] datetime: 2023-03-16 18:53:07 +0: building GPT model ... 
+0: [2023-03-16 18:53:08,115] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 18:53:08,115] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 18:53:08,115] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.34 GB, percent = 6.0% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15} +0: [2023-03-16 18:53:08,595] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=33 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: ParallelTransformerLayerPipe +0: 22: 
ParallelTransformerLayerPipe +0: 23: ParallelTransformerLayerPipe +0: 24: ParallelTransformerLayerPipe +0: 25: ParallelTransformerLayerPipe +0: 26: ParallelTransformerLayerPipe +0: 27: ParallelTransformerLayerPipe +0: 28: ParallelTransformerLayerPipe +0: 29: undo +0: 30: MixedFusedLayerNorm +0: 31: EmbeddingPipe +0: 32: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 18:53:08,956] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 18:53:08,956] [INFO] [utils.py:828:see_memory_usage] MA 2.05 GB Max_MA 2.05 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-16 18:53:08,957] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.39 GB, percent = 6.0% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-16 18:53:08,959] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 18:53:19,434] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 18:53:19,435] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 18:53:19,435] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 18:53:19,446] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 18:53:19,446] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 18:53:19,563] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 18:53:19,564] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.06 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-16 18:53:19,564] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.06 GB, percent = 6.2% +1: ninja: no work to do. 
+1: Time to load utils op: 0.1680746078491211 seconds +1: Time to load utils op: 0.20203018188476562 seconds +1: Time to load utils op: 0.20212292671203613 seconds +1: Time to load utils op: 0.20218801498413086 seconds +1: Time to load utils op: 0.20232391357421875 seconds +1: Time to load utils op: 0.20234918594360352 seconds +1: Time to load utils op: 0.20280718803405762 seconds +1: Time to load utils op: 0.20283985137939453 seconds +0: Time to load utils op: 0.21144890785217285 secondsTime to load utils op: 0.21098637580871582 seconds +0: +0: Time to load utils op: 0.2113938331604004 seconds +0: Time to load utils op: 0.2115471363067627 seconds +0: Time to load utils op: 0.21170806884765625 seconds +0: Time to load utils op: 0.21166276931762695 seconds +0: Time to load utils op: 0.2115781307220459 seconds +1: Time to load utils op: 0.0007572174072265625 seconds +0: Time to load utils op: 0.10219860076904297 seconds +1: Time to load utils op: 0.00032591819763183594 seconds +1: Time to load utils op: 0.0004010200500488281 seconds +1: Time to load utils op: 0.00037980079650878906 seconds +1: Time to load utils op: 0.00046753883361816406 seconds +1: Time to load utils op: 0.00042057037353515625 seconds +1: Time to load utils op: 0.00039696693420410156 seconds +1: Time to load utils op: 0.0003752708435058594 seconds +0: Time to load utils op: 0.0006859302520751953 seconds +0: Time to load utils op: 0.0008692741394042969 seconds +0: Time to load utils op: 0.0006535053253173828 seconds +0: Time to load utils op: 0.0006723403930664062 seconds +0: Time to load utils op: 0.0004680156707763672 seconds +0: Time to load utils op: 0.00040030479431152344 seconds +0: Time to load utils op: 0.00040459632873535156 seconds +0: [2023-03-16 18:53:19,797] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 18:53:19,798] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.04 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-16 18:53:19,798] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.2 GB, percent = 6.2% +0: [2023-03-16 18:53:19,915] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 18:53:19,916] [INFO] [utils.py:828:see_memory_usage] MA 4.35 GB Max_MA 4.35 GB CA 5.58 GB Max_CA 6 GB +0: [2023-03-16 18:53:19,916] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,020] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 18:53:20,021] [INFO] [utils.py:828:see_memory_usage] MA 4.35 GB Max_MA 4.35 GB CA 5.58 GB Max_CA 6 GB +0: [2023-03-16 18:53:20,021] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,125] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 18:53:20,126] [INFO] [utils.py:828:see_memory_usage] MA 6.38 GB Max_MA 6.38 GB CA 8.57 GB Max_CA 9 GB +0: [2023-03-16 18:53:20,126] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,228] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 18:53:20,228] [INFO] [utils.py:828:see_memory_usage] MA 6.38 GB Max_MA 6.38 GB CA 8.57 GB Max_CA 9 GB +0: [2023-03-16 18:53:20,229] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,334] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 18:53:20,335] [INFO] [utils.py:828:see_memory_usage] MA 6.38 GB Max_MA 6.38 GB CA 8.57 GB Max_CA 9 GB +0: [2023-03-16 18:53:20,335] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,437] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 18:53:20,438] [INFO] [utils.py:828:see_memory_usage] MA 6.38 GB Max_MA 6.38 GB CA 8.57 GB Max_CA 9 GB +0: [2023-03-16 
18:53:20,438] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,546] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 18:53:20,546] [INFO] [utils.py:828:see_memory_usage] MA 6.89 GB Max_MA 6.89 GB CA 8.95 GB Max_CA 9 GB +0: [2023-03-16 18:53:20,546] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,648] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 18:53:20,649] [INFO] [utils.py:828:see_memory_usage] MA 6.89 GB Max_MA 6.89 GB CA 8.95 GB Max_CA 9 GB +0: [2023-03-16 18:53:20,649] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.22 GB, percent = 6.2% +0: [2023-03-16 18:53:20,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 18:53:20,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 18:53:20,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 18:53:20,649] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 18:53:20,650] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 18:53:20,650] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 18:53:20,650] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 18:53:20,650] [INFO] [config.py:1011:print] amp_enabled .................. 
False +0: [2023-03-16 18:53:20,650] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] autotuning_config ............ { +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 18:53:20,651] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] train_batch_size ............. 64 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] world_size ................... 16 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 18:53:20,652] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 18:53:20,652] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 64, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004203319549560547 seconds +0: [2023-03-16 18:53:20,653] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 18:53:20,706] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=33 [0, 33) STAGE_PARAMS=1096338432 (1096.338M) TOTAL_PARAMS=1096338432 (1096.338M) UNIQUE_PARAMS=1096338432 (1096.338M) +0: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... 
+0: [2023-03-16 18:53:20,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. 
+1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... 
+1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. 
+1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... 
+0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 18:53:21,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:22,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:22,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:22,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:22,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:22,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:22,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:22,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:22,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:22,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:22,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:22,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:22,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:22,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:23,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:23,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:23,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:23,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:23,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:23,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:23,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:24,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:24,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:24,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:24,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:24,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:24,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:24,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:24,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:24,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:24,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:24,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:24,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:24,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:24,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:24,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:25,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:25,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:25,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:25,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:25,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:25,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:25,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:25,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:25,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:25,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:25,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:25,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:25,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:25,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:25,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:25,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:25,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:26,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:26,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:26,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:26,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:26,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:26,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:26,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:26,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:26,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:26,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:26,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:26,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:26,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:26,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:26,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:27,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:27,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:27,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:27,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:27,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:27,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... 
+0: [2023-03-16 18:53:27,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:27,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:27,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:27,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:27,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:27,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:27,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:27,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:27,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:27,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:27,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:27,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:27,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:27,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... 
+0: [2023-03-16 18:53:27,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... 
+0: [2023-03-16 18:53:27,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:28,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:28,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:28,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:28,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:28,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:28,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:28,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 18:53:28,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:28,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:28,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:28,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:28,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:28,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:28,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:28,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:28,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:28,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/layer_30-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-16 18:53:28,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:28,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 18:53:30,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:30,821] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 9 +1: [2023-03-16 18:53:30,841] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 9 +0: [2023-03-16 18:53:30,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:30,872] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 2 +0: [2023-03-16 18:53:30,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:30,881] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 1 +0: [2023-03-16 18:53:30,897] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 2 +0: [2023-03-16 18:53:30,904] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 1 +0: [2023-03-16 18:53:30,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 18:53:30,933] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 3 +0: [2023-03-16 18:53:30,955] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 3 +1: [2023-03-16 18:53:30,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:30,968] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 11 +1: [2023-03-16 18:53:30,987] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 11 +0: [2023-03-16 18:53:31,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:31,003] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 4 +0: [2023-03-16 18:53:31,025] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 4 +1: [2023-03-16 18:53:31,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:31,057] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 12 +1: [2023-03-16 18:53:31,080] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 12 +0: [2023-03-16 18:53:31,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 18:53:31,104] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 5 +0: [2023-03-16 18:53:31,124] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 5 +0: [2023-03-16 18:53:31,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:31,158] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 7 +0: [2023-03-16 18:53:31,183] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 7 +1: [2023-03-16 18:53:31,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:31,199] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 13 +0: [2023-03-16 18:53:31,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:31,201] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 6 +1: [2023-03-16 18:53:31,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 18:53:31,210] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 10 +1: [2023-03-16 18:53:31,222] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 13 +1: [2023-03-16 18:53:31,233] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 10 +0: [2023-03-16 18:53:31,237] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 6 +0: [2023-03-16 18:53:31,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:31,248] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 0 +0: [2023-03-16 18:53:31,274] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 0 +1: [2023-03-16 18:53:31,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:31,285] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 8 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +1: [2023-03-16 18:53:31,309] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 8 +1: [2023-03-16 18:53:31,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 18:53:31,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 14 +1: [2023-03-16 18:53:31,373] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 14 +1: [2023-03-16 18:53:31,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:31,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 15 +1: [2023-03-16 18:53:31,420] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 15 +0: successfully loaded checkpoint from checkpoints_1b1100m100m at iteration 0 +1: time (ms) | load-checkpoint: 10736.48 +0: estimated model parameters: 1.096338432 +0: estimated model parameters without embeddings: 1.002523648 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 18:53:32 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 6400 +0: test: 6400 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.018110 seconds +0: number of documents: 208931 +0: > dataset split: +0: train: +0: document indices in [0, 208931) total of 208931 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.080 seconds +0: total number of samples: 48805 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.030587 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.071 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 18:53:42 +0: done with setup ... +0: training ... 
+1: time (ms) | model-and-optimizer-setup: 24149.63 | train/valid/test-data-iterators-setup: 9940.29 +0: [after training is done] datetime: 2023-03-16 18:53:42 +1: ----------------------------------------------------------------------------------------------------------------- +1: validation loss at the end of training for val data | lm loss value: 6.611002E+00 | lm loss PPL: 7.432277E+02 | +1: ----------------------------------------------------------------------------------------------------------------- +END 3324218: Thu 16 Mar 2023 06:54:35 PM EET diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84dc09f1404e62138b2c53fe40f65bc6adc4c3f5 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7ee4d93bff564e66205a145454dbac7a2cf42c4423906d34d03e814fa83bc2b +size 822259223 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bf003f3b054f0a31eaaedaefb10423065223e77 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:287cdcfd7a451835801c49f9160adf8c524b76569a28233f0f3a502356c2fc6d +size 822259426 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbcc202aea485245ddfe7e4eb42b58db67bb06a3 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:790409a6ee570e175f144cfabeab8a61132cf9df4a5e124d9eb8c2f5cc3d18b0 +size 822259490 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..01d085b41b9afda63f2aef23ad6f5174e66f4754 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5af46c5447311313047cfe3bc02a2b4bfce7c0841c60850446b8ef6f11f2910 +size 822259362 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ff1ff02bf0896cd7cd850329b662668f2766017 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b0d22e26ea68c70bfe4921c2c130988877bc4cb599c34d38c39d4f2541215d3 +size 822259426 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..811789e1531615df9d74481451344307798a5663 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee71b1787d9a856cac362d4d1ceee0ee394bfac3ac73df51361d57bb0ea3dfc4 +size 822259426 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c0ca19a1fb15b7acfe6de4d0bccf84b248cda7ad --- /dev/null +++ 
b/1b1100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9c479bd1b5d1375f503a073f6dae1265721845328c1917493761aa41b138260c +size 822259362 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..00c6dfa704bf8d3ee0f615b9e52ad5687745f9ed --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a9a199534bd0c3a8bf389f3d075aea7fccee0a87bf462be199b846d1b8e5f2b +size 822259095 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c93ed0833dbe69946492b71f64076f89eb1edae --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:487087edd6fea22bf3188242ef125fab8f9dfd41abe180b1c1d1f49ed617fd87 +size 822259351 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f49d9c534571ce7a07f24e1320477e7282063802 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:edd55b059a4575e7adbb278e8dd9e9c6f62886b7f6714cec098d9a67b765369b +size 822259479 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9e3bf8cdd059637d5eeac4fea4bee0df4bf2e4e6 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06992459c912c50a9bcfb6abde4db0c5eb1177e1f3f631e217fafbdd4bf756b9 +size 822259351 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d054fd9ac76216bfa1b75676bde1c14fd07155a9 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34ffc840b242726bf505180bfad3ca39b03965fb515637330243c78eaad272e9 +size 822259351 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd496dce06e03f2b3347aaac453b94d508b1c8b7 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:415d751d26f17a8c5592b51ae7ff74d8a3a52aba06f19463de13458891a1b6f7 +size 822259415 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..22671e465f33953ff50d6d92a4f960c533d07100 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b2c9d8eac7e78efeabf21b72b9d72a01619dc5bbc4962b803651b30d57f8ec31 +size 822259415 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 
b/1b1100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f1f265c7a2785142c8eb27b73af264f7cac8531 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c009cd241db7fc546dc68a8e21e47646f5a3ad20af1f0c6d0a98a10e20f83f4 +size 822259415 diff --git a/1b1100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/1b1100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..36e56413f8323a894899dde7df4c5f6b6c1bb290 --- /dev/null +++ b/1b1100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9db5f0ae59cd6b08bd3171d3799cbebf96bfc09d5f9528203c4e2d53798f0840 +size 822259351 diff --git a/1b1100m100m/global_step190/layer_01-model_00-model_states.pt b/1b1100m100m/global_step190/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f836e206114896c682c78d5aeb0eb26c2f4d2455 --- /dev/null +++ b/1b1100m100m/global_step190/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7682eae96fe19ddfce5ab16e1d28f70055ec700014c85bdf5869849a29d80a4 +size 187630851 diff --git a/1b1100m100m/global_step190/layer_03-model_00-model_states.pt b/1b1100m100m/global_step190/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7690653dc9af3bc7a15e136c10ae57554ba7bd66 --- /dev/null +++ b/1b1100m100m/global_step190/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e6ab7e047a9b9aa0d1038af96fc7bcd591072ea80d3b899f1d4a4a9d8bfc42e +size 77121283 diff --git a/1b1100m100m/global_step190/layer_04-model_00-model_states.pt 
b/1b1100m100m/global_step190/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca16c9d10b26a5e15630c6839ae461903b124208 --- /dev/null +++ b/1b1100m100m/global_step190/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84424c848c7ddeefec96544d4a11b0cbe2caa0ecd7077b47adedafa79d21bbfa +size 77121283 diff --git a/1b1100m100m/global_step190/layer_05-model_00-model_states.pt b/1b1100m100m/global_step190/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..04919f1f1a80e325326a4a8dc2ffac3a120b34e8 --- /dev/null +++ b/1b1100m100m/global_step190/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b22bfc9beceed98f8750cad269fabe3ec3487baa241fbaac425f2c0c95c09398 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_06-model_00-model_states.pt b/1b1100m100m/global_step190/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..713571099e5e7b0921d1c113b188853c15b9005f --- /dev/null +++ b/1b1100m100m/global_step190/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c977cf76c4e3d17efebf7e9eebab8003e0952323f00a2f298710890b6c7334f7 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_07-model_00-model_states.pt b/1b1100m100m/global_step190/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd6573a79f817f7704ffc7a408d98db020f7c068 --- /dev/null +++ b/1b1100m100m/global_step190/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e0984d2cb2e1af93be06fd43e84e51582b39faa841e28fa37b9b008c4259c709 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_08-model_00-model_states.pt b/1b1100m100m/global_step190/layer_08-model_00-model_states.pt new file mode 
100644 index 0000000000000000000000000000000000000000..e2e7262e63980c4b585013ad3e5b551ff2283208 --- /dev/null +++ b/1b1100m100m/global_step190/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51ae1c92994f6f845559dc2c90bc19da88901aa23425ea10fbd8831d0dc0ca1f +size 77121283 diff --git a/1b1100m100m/global_step190/layer_09-model_00-model_states.pt b/1b1100m100m/global_step190/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..22b0fb535ffdaf39f75b2b7eb6a6771cec8e54d9 --- /dev/null +++ b/1b1100m100m/global_step190/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca9a45a35d9b1ec7bd2df50f2bc13fdd912b26bcc0d3163b389df74b51d01dc0 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_10-model_00-model_states.pt b/1b1100m100m/global_step190/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b7557ddadd81a272f848654daa6d8ca60cd3fa85 --- /dev/null +++ b/1b1100m100m/global_step190/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6aa1dd19f34240ea1f8b044963448ac55ace8439c375a4517ffa5f389fbc5ccd +size 77121283 diff --git a/1b1100m100m/global_step190/layer_11-model_00-model_states.pt b/1b1100m100m/global_step190/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fcf2b7eae7b3edf5744e3338326d266261f932aa --- /dev/null +++ b/1b1100m100m/global_step190/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0fd774eaa670c3c29bcc23b919181e925171b53ce0be898ad26879771e38de97 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_12-model_00-model_states.pt b/1b1100m100m/global_step190/layer_12-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..52a3f43b3ffd1f91c829eb281f2293267d8738ca --- /dev/null +++ b/1b1100m100m/global_step190/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4e155e6b4ff38d34a2f17d21e0169a89d0216890f77d6c48f0aa66695fcafd5 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_13-model_00-model_states.pt b/1b1100m100m/global_step190/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3fe992c7fcd524b919a53eb7e07c1de784f7e786 --- /dev/null +++ b/1b1100m100m/global_step190/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8bc26fc12995dd6c60e62e7fac2d123477e3fc8073a17b938a61748da975379 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_14-model_00-model_states.pt b/1b1100m100m/global_step190/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e03f2d50fb571f1859690e8d5a6a356582d15265 --- /dev/null +++ b/1b1100m100m/global_step190/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eeda6d79e15e11ece4164976eb04f71f036835510405e309737a59df8c759584 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_15-model_00-model_states.pt b/1b1100m100m/global_step190/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..578a695addf635ecf651af8bd30c60c6d56107c7 --- /dev/null +++ b/1b1100m100m/global_step190/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f4df40cc7e019d7e765ac801edf771bfdef305b877c7ffe04d807a42618f31b +size 77121283 diff --git a/1b1100m100m/global_step190/layer_16-model_00-model_states.pt b/1b1100m100m/global_step190/layer_16-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..d428ece815ad3c1c153293fe1d681b5ae22f32ad --- /dev/null +++ b/1b1100m100m/global_step190/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41769706ab4f8aa972405d9d766bb28a6c5baa1435d0b0e83eb0eb7e5e0dc0bb +size 77121283 diff --git a/1b1100m100m/global_step190/layer_17-model_00-model_states.pt b/1b1100m100m/global_step190/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0485d70db455e2c6219d45ff303687842e5ff1d2 --- /dev/null +++ b/1b1100m100m/global_step190/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4d883b8f474e1557597ae0c2f2bcd1555bb2290f7ca4df62ef8f2fc18d069569 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_18-model_00-model_states.pt b/1b1100m100m/global_step190/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8840f1117da770c6e57e95c15d23f29dee1c1d05 --- /dev/null +++ b/1b1100m100m/global_step190/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8a97ddcdffe77900158fd852130ba774f18655eaee6871b1595297be26e669a +size 77121283 diff --git a/1b1100m100m/global_step190/layer_19-model_00-model_states.pt b/1b1100m100m/global_step190/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb4ab6438ad03da7f10a5b9c09dbb5706c676827 --- /dev/null +++ b/1b1100m100m/global_step190/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be24b597cde81924dbd71bf5fdef5953823f2454a004b3830f79fb72faefc50a +size 77121283 diff --git a/1b1100m100m/global_step190/layer_20-model_00-model_states.pt b/1b1100m100m/global_step190/layer_20-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..3791ded08680305dcf52c742a445d9308e204090 --- /dev/null +++ b/1b1100m100m/global_step190/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:126f0e21841c84fe2dc8dd734032dd1266ccc665205656bbff0fe8860049ea15 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_21-model_00-model_states.pt b/1b1100m100m/global_step190/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..af9e4e6ae459841008c690842c3284505586f6d7 --- /dev/null +++ b/1b1100m100m/global_step190/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91e3020638d0861aa2f44b583e812123e334b35fbf154c49b6d82b4969d02076 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_22-model_00-model_states.pt b/1b1100m100m/global_step190/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5987e5060e9612bf40d469737eb7605d4f6f089e --- /dev/null +++ b/1b1100m100m/global_step190/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1cb8763fd320f7280c999cf17e67e473710f0f3d3744aef5a8bb20ca80d2bc8 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_23-model_00-model_states.pt b/1b1100m100m/global_step190/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0798983e0db69f6570817ed707708e6e5d1176d5 --- /dev/null +++ b/1b1100m100m/global_step190/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f08e09df0bb835213ff06110a39be33fb52e5164f82b6f632df28451903f5820 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_24-model_00-model_states.pt b/1b1100m100m/global_step190/layer_24-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..56060209f0e73fa22bd08e16d504d0061056c5e2 --- /dev/null +++ b/1b1100m100m/global_step190/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f9239fab666a880665db9701425c24b3bb4bba738bf1f2e3f80b8eb473e02f9 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_25-model_00-model_states.pt b/1b1100m100m/global_step190/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..36322959900c61f6d054d6a9eba689a6ee2703ec --- /dev/null +++ b/1b1100m100m/global_step190/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1cb9539760ceb8d3a5feaebbe3602c001e514550a384f0c99588b6d9354075e +size 77121283 diff --git a/1b1100m100m/global_step190/layer_26-model_00-model_states.pt b/1b1100m100m/global_step190/layer_26-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..68c4cb4b776620b082324fd4a24d3bb409a1502b --- /dev/null +++ b/1b1100m100m/global_step190/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8521cd9b173032994dbdf4be5a48d101663fce9593ef4b120b294e918a38107 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_27-model_00-model_states.pt b/1b1100m100m/global_step190/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc0728723d0dc808698bd627967027e319eda7bc --- /dev/null +++ b/1b1100m100m/global_step190/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f2c467310b68887875a61de801d9a923832a7905e3cab2d5d6573307cf534f3 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_28-model_00-model_states.pt b/1b1100m100m/global_step190/layer_28-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1f749fc6c69941a8d9cf741727a455ee20449e69 --- /dev/null +++ b/1b1100m100m/global_step190/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ee344346297d3d49504e053a23a57a408771b21281f0be90be10e16a678f039 +size 77121283 diff --git a/1b1100m100m/global_step190/layer_30-model_00-model_states.pt b/1b1100m100m/global_step190/layer_30-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..734566e93037334f92efb77e15c2a09003928117 --- /dev/null +++ b/1b1100m100m/global_step190/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd8f463862f526ee477bbf7f0b5b26d81b9f99388ae4c29980a81df02e511d8d +size 8387 diff --git a/1b1100m100m/global_step190/mp_rank_00_model_states.pt b/1b1100m100m/global_step190/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f58097132ca370192ca85231472a5bbde1a56a5 --- /dev/null +++ b/1b1100m100m/global_step190/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f37b91fb360b93028ebe6045ae70bc3ad77b99405e1c31505e505d1dd464439 +size 43827 diff --git a/1b1100m100m/sbatch_1b1100m100m.sh b/1b1100m100m/sbatch_1b1100m100m.sh new file mode 100644 index 0000000000000000000000000000000000000000..926145a301659d73b0fcd1d388ccb5cb61f4f008 --- /dev/null +++ b/1b1100m100m/sbatch_1b1100m100m.sh @@ -0,0 +1,168 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=2 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH 
-p standard-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b1100m100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=16 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 100 000 000 +# -> Samples: 48828.125 +TRAIN_SAMPLES=48_828 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + 
--lr-warmup-samples 488 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git
a/1b1100m100m/sbatch_1b1100m100mval.sh b/1b1100m100m/sbatch_1b1100m100mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..33959dbe55945b5e04b18f01ac361fbe1feec7db --- /dev/null +++ b/1b1100m100m/sbatch_1b1100m100mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=2 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p small-g +#SBATCH -t 12:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b1100m100mval +VARIANT_CKPT=1b1100m100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) 
+GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 31633480000 +# -> Samples: 15446035 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --override-lr-scheduler \ + --no-load-optim \ + --reset-progress \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { +
"enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b1100m100m/tensorboard_1b1100m100m/events.out.tfevents.1678973124.nid005096.97525.0 b/1b1100m100m/tensorboard_1b1100m100m/events.out.tfevents.1678973124.nid005096.97525.0 new file mode 100644 index 0000000000000000000000000000000000000000..930ddba2af6bea0de0ca400de6bd18eb150f8b9c --- /dev/null +++ b/1b1100m100m/tensorboard_1b1100m100m/events.out.tfevents.1678973124.nid005096.97525.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f9584598ec48eb92ce6b03347265335839c1682c9167eae4df3433e2dbcc936 +size 355630 diff --git a/1b1100m100m/tensorboard_1b1100m100mval/events.out.tfevents.1678985560.nid007372.76678.0 b/1b1100m100m/tensorboard_1b1100m100mval/events.out.tfevents.1678985560.nid007372.76678.0 new file mode 100644 index 0000000000000000000000000000000000000000..b0ac94f8b2843987223a118b0061f28b1d4b415c --- /dev/null +++ b/1b1100m100m/tensorboard_1b1100m100mval/events.out.tfevents.1678985560.nid007372.76678.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54ff5fc0cab006d30cc1f3ac559aaa852cabf324c542ed8219939d13f3485160 +size 980 diff --git a/1b112b400m/3319358.err b/1b112b400m/3319358.err new file mode 100644 index 0000000000000000000000000000000000000000..f4edbc1017e568f83a93e0fb1bee6de3d12a78b2 
--- /dev/null +++ b/1b112b400m/3319358.err @@ -0,0 +1,2207 @@ +13: 2023-03-16 09:04:02.585530: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585537: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585545: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585545: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585542: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 09:04:02.585541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585563: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:02.585557: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620206: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 09:04:02.620202: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620204: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:02.620211: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 09:04:02.620204: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685633: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685646: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685652: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685639: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 09:04:02.685656: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685658: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685660: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:02.685651: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705302: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 09:04:02.705310: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705315: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705301: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705306: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705315: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 09:04:02.705324: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:02.705327: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710570: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710577: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 09:04:02.710573: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710573: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710575: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710582: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:02.710584: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 09:04:02.711227: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711239: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711232: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711250: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 09:04:02.711251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711256: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:02.711244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721509: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721505: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 09:04:02.721500: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721499: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721511: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721521: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:02.721511: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 09:04:02.721520: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722480: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722487: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722490: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722483: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 09:04:02.722479: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:02.722476: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:02.759904: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 09:04:02.759909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:02.759903: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: 2023-03-16 09:04:02.759921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759930: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 09:04:02.759921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:02.759922: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759923: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:02.759917: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 09:04:02.759930: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:02.759914: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: 2023-03-16 09:04:02.759927: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759923: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:02.759936: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 9: 2023-03-16 09:04:02.764701: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764702: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764708: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764699: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764709: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 9: 2023-03-16 09:04:02.764699: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764703: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:02.764704: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862501: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862508: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 09:04:02.862512: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862510: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862507: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:02.862508: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 09:04:02.862506: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879086: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879090: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 09:04:02.879102: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879106: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:02.879101: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.880969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 09:04:02.880981: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.880981: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.880979: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.880986: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.880974: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 09:04:02.880993: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:02.881005: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943361: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943361: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943366: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 09:04:02.943359: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943370: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943372: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:02.943374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 09:04:03.021955: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021962: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021959: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021956: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 09:04:03.021965: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021970: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:03.021965: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 09:04:04.349715: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349727: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349729: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:04.349917: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349920: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349925: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349928: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349928: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 09:04:04.349933: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349933: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:04.349936: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.411956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411954: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.411973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:04.412399: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412404: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412408: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412414: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412412: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 09:04:04.412415: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412417: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:04.412420: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429091: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429094: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429321: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 09:04:04.429098: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429091: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429100: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:04.429328: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429327: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429337: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429335: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429340: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 09:04:04.429343: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:04.429345: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438733: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 09:04:04.438503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:04.438740: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438743: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438741: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438746: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 09:04:04.438748: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438753: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:04.438754: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.472885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472896: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472907: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472904: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.472902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:04.473309: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473311: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 09:04:04.473320: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473319: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:04.473324: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475291: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475298: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475312: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475313: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:04.475734: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475740: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475741: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475743: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475744: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 09:04:04.475744: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475746: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:04.475748: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.478688: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478695: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.478695: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:04.479204: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479207: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479209: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479210: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479211: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 09:04:04.479211: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479214: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:04.479217: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.504865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.504882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:04.505302: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505305: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505307: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505311: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505312: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 09:04:04.505314: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:04.505318: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568098: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568099: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568087: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:04.568517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568518: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568523: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568528: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568527: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 09:04:04.568528: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568526: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:04.568534: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.581929: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581939: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.581947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:04.582139: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582142: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582142: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582143: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582145: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 09:04:04.582145: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582151: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:04.582155: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607022: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:04.607442: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607445: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607455: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 09:04:04.607456: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607459: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:04.607461: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.609566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609579: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609584: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609582: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.609584: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:04.610027: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610033: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610039: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610041: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610044: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 1: 2023-03-16 09:04:04.610046: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610045: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:04.610047: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628265: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628267: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628263: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:04.628655: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628655: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628662: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628660: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628666: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 09:04:04.628669: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:04.628668: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671590: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671602: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:04.671826: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671829: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671832: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671830: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671834: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 09:04:04.671836: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671840: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:04.671846: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.693716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693725: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693722: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.693738: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:04.694157: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694158: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694163: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694164: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694166: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 09:04:04.694166: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694166: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:04.694170: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.892640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.892656: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:04.893093: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893095: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893097: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893100: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893101: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 09:04:04.893103: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893105: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:04.893108: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:07.643801: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643811: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643813: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643809: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: 2023-03-16 09:04:07.644014: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643816: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.643818: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644024: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644025: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.644028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645608: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645609: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645621: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645625: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645626: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645626: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645629: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:07.645629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:07.645646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645347: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645354: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645355: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645363: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 09:04:07.645358: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645368: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645372: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645372: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645376: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:07.645414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:07.645416: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 8: 2023-03-16 09:04:07.715404: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715410: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715417: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.715424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717348: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717355: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717357: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717366: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717365: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717367: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717373: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717374: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717374: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717397: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:07.717410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:07.717411: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+15: 2023-03-16 09:04:07.747342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.747355: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748550: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748552: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748554: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.748556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749766: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749780: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749781: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749780: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749789: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749789: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749792: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749798: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749801: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:07.749812: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:07.749813: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 09:04:07.750259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750261: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750261: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750265: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750276: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750280: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 09:04:07.769527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769532: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769543: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769544: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769548: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.769550: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771806: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771820: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771822: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771827: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:07.771835: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:07.771852: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750279: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750284: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750284: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:07.750287: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 09:04:07.750317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:07.750334: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 09:04:07.841335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841332: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841429: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 09:04:07.841677: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841348: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 09:04:07.841678: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 09:04:07.841684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 09:04:07.841345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.841686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.841356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.841686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.841453: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.841687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.841689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.841692: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843265: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843267: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843274: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843275: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843278: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 7: 2023-03-16 09:04:07.843273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:07.843283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843285: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843285: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843287: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:07.843289: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:07.843457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843456: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843456: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843558: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 09:04:07.843460: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843471: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:07.843471: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843476: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843560: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 09:04:07.843468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 09:04:07.843471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843480: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:07.843479: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:07.843487: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:07.843488: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 09:04:07.843563: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 09:04:07.843489: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843565: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843575: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843576: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843581: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843580: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843581: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843588: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843607: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:07.843614: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:07.843621: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 09:04:07.891313: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.891484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 09:04:07.891326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:07.891494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891331: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:07.891496: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.891333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:07.891496: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.891499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.891501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.891498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.891502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891814: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891823: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.891828: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.893698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 09:04:07.893696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.893707: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.893708: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:07.893713: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:07.893712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:07.893712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893719: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893720: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 09:04:07.893709: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 09:04:07.893720: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893723: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:07.893724: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:07.893721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:07.893721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:07.893726: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 09:04:07.893753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:07.893769: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:07.894402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894412: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894417: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:07.894414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894420: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:07.894422: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:07.894415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894416: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894431: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:07.894433: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:07.894435: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:07.894436: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:07.894437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:07.894456: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 09:04:07.968420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968428: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.968438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970435: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970449: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970449: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 09:04:07.970445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970448: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970459: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970463: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970465: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970467: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970468: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:07.970467: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:07.970488: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 09:04:08.021182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021187: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021206: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.021207: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023287: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023289: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023291: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023294: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023296: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023297: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023299: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:08.023339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:08.023351: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.183455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183452: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183460: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183465: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.183477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185692: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185691: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185695: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:08.185707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:08.185716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 3: 2023-03-16 09:04:08.282000: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.281998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282014: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.282016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284217: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284217: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284218: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:08.284227: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284227: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284233: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284235: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284236: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284237: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284239: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:08.284241: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: Loading extension module scaled_upper_triang_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. 
+ 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module fused_mix_prec_layer_norm_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module fused_mix_prec_layer_norm_cuda... + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. +12: Successfully preprocessed all matching files. +12: Successfully preprocessed all matching files. +12: Successfully preprocessed all matching files. + 1: Successfully preprocessed all matching files. + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 1: Building extension module utils... 
+ 1: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 1: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: + 2: + 2: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: + 4: + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 6: + 6: + 6: + 6: + 6: + 6: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: + 5: + 5: + 5: + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: + 7: + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: +10: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: +13: +13: +13: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+14: +14: +14: +14: +14: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: +15: +15: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 0: Building extension module utils... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module utils... + 9: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 0: + 0: + 0: Loading extension module utils... + 0: Loading extension module utils... 
+ 0: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 1: + 1: + 1: Loading extension module utils...Loading extension module utils... + 1: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 9: Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... + 9: + 9: + 9: + 9: Loading extension module utils... + 3: Loading extension module utils... + 9: Loading extension module utils... + 3: Loading extension module utils... + 9: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 6: Loading extension module utils... + 4: Loading extension module utils... + 6: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 8: Loading extension module utils... 
+ 6: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... +12: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 8: Loading extension module utils... +12: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 7: Loading extension module utils... + 8: Loading extension module utils... +12: Loading extension module utils... + 7: Loading extension module utils... + 8: Loading extension module utils... +12: Loading extension module utils... + 8: Loading extension module utils... + 7: Loading extension module utils... +12: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... + 7: Loading extension module utils... +12: Loading extension module utils... + 8: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +10: Loading extension module utils... +11: Loading extension module utils... +10: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +13: Loading extension module utils... 
+ 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +13: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Loading extension module utils... +13: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +13: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +13: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +13: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +11: Loading extension module utils... +14: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +14: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Loading extension module utils... +14: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +15: Loading extension module utils... +14: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... 
+15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... + 0: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 9: + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +12: +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +12: +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... 
+ 3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: Loading extension module utils... + 3: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 2: + 2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 2: + 2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 2: + 2: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... 
+ 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 5: + 5: Loading extension module utils...Loading extension module utils... + 5: + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... 
+ 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: + 8: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 8: + 8: + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+ 4: + 4: Loading extension module utils...Loading extension module utils... + 4: + 4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 4: + 4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 4: + 4: + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 7: + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Loading extension module utils... +11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +11: +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: Loading extension module utils... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... 
+13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Loading extension module utils... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +13: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +13: +13: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +13: +13: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... 
+15: +15: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/1b112b400m/3319358.out b/1b112b400m/3319358.out new file mode 100644 index 0000000000000000000000000000000000000000..4b225c12649fde22244e6f87fec27caacd0906b2 --- /dev/null +++ b/1b112b400m/3319358.out @@ -0,0 +1,16421 @@ +Model parameters: d_model 1792 ffw_size 7168 kv_size 128 n_heads 14 n_layers 26 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 26 --hidden-size 1792 --num-attention-heads 14 --kv-channels 128 --ffn-hidden-size 7168 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 128 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-1b112b400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --override-lr-scheduler --reset-progress --no-load-optim --log-interval 10 --save-interval 10000 --eval-interval 1 --eval-iters 100 --tensorboard-dir tensorboard_1b112b400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_1b112b400m --load checkpoints_1b112b400m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3319358.json --zero-stage 0 +START 3319358: Thu 16 Mar 2023 09:03:41 AM EET + 0: + 0: + 0: ======================= ROCm 
System Management Interface ======================= + 0: ================================= Concise Info ================================= + 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 0: 0 54.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 2 46.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 4 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 6 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: ================================================================================ + 0: ============================= End of ROCm SMI Log ============================== +10: +10: +10: ======================= ROCm System Management Interface ======================= +10: ================================= Concise Info ================================= +10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +10: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 2 42.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 4 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 6 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: ================================================================================ +10: ============================= End of ROCm SMI Log ============================== + 3: + 3: + 3: ======================= ROCm System Management Interface ======================= + 3: ================================= Concise Info ================================= + 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 3: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 2 43.0c 93.0W 800Mhz 
1600Mhz 0% auto 560.0W 0% 0% + 3: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 4 47.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 6 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: ================================================================================ + 3: ============================= End of ROCm SMI Log ============================== + 4: + 4: + 4: ======================= ROCm System Management Interface ======================= + 4: ================================= Concise Info ================================= + 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 4: 0 45.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 2 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 4 43.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 6 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: ================================================================================ + 4: ============================= End of ROCm SMI Log ============================== +14: +14: +14: ======================= ROCm System Management Interface ======================= +14: ================================= Concise Info ================================= +14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +14: 0 50.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 2 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 4 41.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 6 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 
================================================================================ +14: ============================= End of ROCm SMI Log ============================== + 1: + 1: + 1: ======================= ROCm System Management Interface ======================= + 1: ================================= Concise Info ================================= + 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 1: 0 49.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 2 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 4 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 5 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 6 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: ================================================================================ + 1: ============================= End of ROCm SMI Log ============================== +13: +13: +13: ======================= ROCm System Management Interface ======================= +13: ================================= Concise Info ================================= +13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +13: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 2 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 4 51.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 6 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: ================================================================================ +13: ============================= End of ROCm SMI Log ============================== + 7: + 7: + 7: ======================= ROCm System Management Interface ======================= + 7: ================================= Concise Info 
================================= + 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 7: 0 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 2 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 4 45.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 6 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: ================================================================================ + 7: ============================= End of ROCm SMI Log ============================== + 2: + 2: + 2: ======================= ROCm System Management Interface ======================= + 2: ================================= Concise Info ================================= + 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 2: 0 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 2 38.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 4 39.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 6 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: ================================================================================ + 2: ============================= End of ROCm SMI Log ============================== + 6: + 6: + 6: ======================= ROCm System Management Interface ======================= + 6: ================================= Concise Info ================================= + 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 6: 0 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 2 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 4 50.0c 89.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 6 46.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: ================================================================================ + 6: ============================= End of ROCm SMI Log ============================== + 8: + 8: + 8: ======================= ROCm System Management Interface ======================= + 8: ================================= Concise Info ================================= + 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 8: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 2 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 4 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 6 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: ================================================================================ + 8: ============================= End of ROCm SMI Log ============================== +15: +15: +15: ======================= ROCm System Management Interface ======================= +15: ================================= Concise Info ================================= +15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +15: 0 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 2 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 3 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 6 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: ================================================================================ +15: ============================= End of ROCm 
SMI Log ============================== + 5: + 5: + 5: ======================= ROCm System Management Interface ======================= + 5: ================================= Concise Info ================================= + 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 5: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 2 45.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 6 40.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: ================================================================================ + 5: ============================= End of ROCm SMI Log ============================== + 9: + 9: + 9: ======================= ROCm System Management Interface ======================= + 9: ================================= Concise Info ================================= + 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 9: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 2 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 6 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: ================================================================================ + 9: ============================= End of ROCm SMI Log ============================== +11: +11: +11: ======================= ROCm System Management Interface ======================= +11: ================================= Concise Info ================================= +11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +11: 0 48.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 
0% 0% +11: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 2 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 4 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 6 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: ================================================================================ +11: ============================= End of ROCm SMI Log ============================== +12: +12: +12: ======================= ROCm System Management Interface ======================= +12: ================================= Concise Info ================================= +12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +12: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 2 39.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 4 44.0c 79.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 6 41.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: ================================================================================ +12: ============================= End of ROCm SMI Log ============================== + 8: Launching on nid006717 (8/16), master nid006709 port 9999, GPUs 8, CUDA: True + 2: Launching on nid006711 (2/16), master nid006709 port 9999, GPUs 8, CUDA: True +10: Launching on nid006719 (10/16), master nid006709 port 9999, GPUs 8, CUDA: True +15: Launching on nid006724 (15/16), master nid006709 port 9999, GPUs 8, CUDA: True + 9: Launching on nid006718 (9/16), master nid006709 port 9999, GPUs 8, CUDA: True + 4: Launching on nid006713 (4/16), master nid006709 port 9999, GPUs 8, CUDA: True + 1: Launching on nid006710 (1/16), master nid006709 port 9999, GPUs 8, CUDA: True + 7: Launching on nid006716 
(7/16), master nid006709 port 9999, GPUs 8, CUDA: True +12: Launching on nid006721 (12/16), master nid006709 port 9999, GPUs 8, CUDA: True + 3: Launching on nid006712 (3/16), master nid006709 port 9999, GPUs 8, CUDA: True + 5: Launching on nid006714 (5/16), master nid006709 port 9999, GPUs 8, CUDA: True +14: Launching on nid006723 (14/16), master nid006709 port 9999, GPUs 8, CUDA: True +11: Launching on nid006720 (11/16), master nid006709 port 9999, GPUs 8, CUDA: True + 0: Launching on nid006709 (0/16), master nid006709 port 9999, GPUs 8, CUDA: True +13: Launching on nid006722 (13/16), master nid006709 port 9999, GPUs 8, CUDA: True + 6: Launching on nid006715 (6/16), master nid006709 port 9999, GPUs 8, CUDA: True + 0: using world size: 128, data-parallel-size: 128, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 + 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. + 0: using torch.bfloat16 for parameters ... + 0: ------------------------ arguments ------------------------ + 0: abort_on_unmet_fused_kernel_constraints ......... False + 0: accumulate_allreduce_grads_in_fp32 .............. True + 0: adam_beta1 ...................................... 0.9 + 0: adam_beta2 ...................................... 0.999 + 0: adam_eps ........................................ 1e-08 + 0: adlr_autoresume ................................. False + 0: adlr_autoresume_interval ........................ 1000 + 0: apply_query_key_layer_scaling ................... True + 0: apply_residual_connection_post_layernorm ........ False + 0: attention_dropout ............................... 0.1 + 0: attention_softmax_in_fp32 ....................... False + 0: bert_binary_head ................................ True + 0: bert_load ....................................... None + 0: bf16 ............................................ True + 0: bias_dropout_fusion ............................. True + 0: bias_gelu_fusion ................................ 
True + 0: biencoder_projection_dim ........................ 0 + 0: biencoder_shared_query_context_model ............ False + 0: block_data_path ................................. None + 0: checkpoint_activations .......................... False + 0: checkpoint_in_cpu ............................... False + 0: checkpoint_num_layers ........................... 1 + 0: clip_grad ....................................... 1.0 + 0: codecarbon_dir .................................. None + 0: consumed_train_samples .......................... 0 + 0: consumed_train_tokens ........................... 0 + 0: consumed_valid_samples .......................... 0 + 0: contigious_checkpointing ........................ False + 0: cpu_optimizer ................................... False + 0: cpu_torch_adam .................................. False + 0: curriculum_learning ............................. False + 0: data_impl ....................................... mmap + 0: data_parallel_size .............................. 128 + 0: data_path ....................................... None + 0: dataloader_type ................................. single + 0: DDP_impl ........................................ local + 0: decoder_seq_length .............................. None + 0: deepscale ....................................... False + 0: deepscale_config ................................ None + 0: deepspeed ....................................... True + 0: deepspeed_activation_checkpointing .............. False + 0: deepspeed_config ................................ ds_configs/3319358.json + 0: deepspeed_mpi ................................... False + 0: distribute_checkpointed_activations ............. False + 0: distributed_backend ............................. nccl + 0: embed_layernorm ................................. False + 0: embedding_path .................................. None + 0: encoder_seq_length .............................. 2048 + 0: eod_mask_loss ................................... 
False + 0: eval_interval ................................... 1 + 0: eval_iters ...................................... 100 + 0: eval_only ....................................... None + 0: evidence_data_path .............................. None + 0: exit_duration_in_mins ........................... None + 0: exit_interval ................................... None + 0: ffn_hidden_size ................................. 7168 + 0: finetune ........................................ False + 0: fp16 ............................................ False + 0: fp16_lm_cross_entropy ........................... False + 0: fp32_residual_connection ........................ False + 0: gigaflos_no_embeds .............................. 0 + 0: global_batch_size ............................... 128 + 0: glu_activation .................................. None + 0: hidden_dropout .................................. 0.1 + 0: hidden_size ..................................... 1792 + 0: hysteresis ...................................... 2 + 0: ict_head_size ................................... None + 0: ict_load ........................................ None + 0: img_dim ......................................... 224 + 0: indexer_batch_size .............................. 128 + 0: indexer_log_interval ............................ 1000 + 0: inference ....................................... False + 0: init_method_std ................................. 0.02 + 0: init_method_xavier_uniform ...................... False + 0: initial_loss_scale .............................. 4294967296 + 0: kill_switch_path ................................ kill-switch-1b112b400mval + 0: kv_channels ..................................... 128 + 0: layer_norm_fusion ............................... True + 0: layernorm_epsilon ............................... 1e-05 + 0: lazy_mpu_init ................................... None + 0: load ............................................ 
checkpoints_1b112b400m + 0: local_rank ...................................... None + 0: log_batch_size_to_tensorboard ................... True + 0: log_interval .................................... 10 + 0: log_learning_rate_to_tensorboard ................ True + 0: log_level ....................................... None + 0: log_level_replica ............................... None + 0: log_loss_scale_to_tensorboard ................... True + 0: log_num_zeros_in_grad ........................... False + 0: log_params_norm ................................. False + 0: log_path ........................................ None + 0: log_timers_to_tensorboard ....................... True + 0: log_validation_ppl_to_tensorboard ............... True + 0: loss_on_targets_only ............................ False + 0: loss_scale ...................................... None + 0: loss_scale_window ............................... 1000 + 0: lr .............................................. 0.0002 + 0: lr_decay_iters .................................. None + 0: lr_decay_samples ................................ 1 + 0: lr_decay_style .................................. cosine + 0: lr_decay_tokens ................................. None + 0: lr_warmup_fraction .............................. None + 0: lr_warmup_iters ................................. 0 + 0: lr_warmup_samples ............................... 0 + 0: make_vocab_size_divisible_by .................... 128 + 0: mask_prob ....................................... 0.15 + 0: masked_softmax_fusion ........................... True + 0: max_position_embeddings ......................... 2048 + 0: mean_noise_span_length .......................... None + 0: memory_centric_tiled_linear ..................... False + 0: merge_file ...................................... gpt2/merges.txt + 0: micro_batch_size ................................ 1 + 0: min_loss_scale .................................. 
1.0 + 0: min_lr .......................................... 2e-05 + 0: mmap_warmup ..................................... False + 0: no_load_optim ................................... True + 0: no_load_rng ..................................... None + 0: no_save_optim ................................... None + 0: no_save_rng ..................................... None + 0: noise_density ................................... None + 0: num_attention_heads ............................. 14 + 0: num_channels .................................... 3 + 0: num_classes ..................................... 1000 + 0: num_layers ...................................... 26 + 0: num_layers_per_virtual_pipeline_stage ........... None + 0: num_workers ..................................... 2 + 0: onnx_safe ....................................... None + 0: openai_gelu ..................................... False + 0: optimizer ....................................... adam + 0: optimizer_fusion ................................ True + 0: override_lr_scheduler ........................... True + 0: pad_vocab_size_to ............................... None + 0: params_dtype .................................... torch.bfloat16 + 0: partition_activations ........................... False + 0: patch_dim ....................................... 16 + 0: pipeline_model_parallel_size .................... 1 + 0: position_embedding_type ......................... PositionEmbeddingType.absolute + 0: pp_partition_method ............................. None + 0: profile_backward ................................ False + 0: query_in_block_prob ............................. 0.1 + 0: rampup_batch_size ............................... None + 0: rank ............................................ 0 + 0: remote_device ................................... none + 0: reset_attention_mask ............................ False + 0: reset_position_ids .............................. 
False + 0: reset_progress .................................. True + 0: retriever_report_topk_accuracies ................ [] + 0: retriever_score_scaling ......................... False + 0: retriever_seq_length ............................ 256 + 0: reweight_loss_based_on_position_frequency ....... False + 0: sample_rate ..................................... 1.0 + 0: save ............................................ checkpoints_1b112b400m + 0: save_interval ................................... 10000 + 0: scatter_gather_tensors_in_pipeline .............. True + 0: scattered_embeddings ............................ False + 0: seed ............................................ 1234 + 0: seq_length ...................................... 2048 + 0: sgd_momentum .................................... 0.9 + 0: short_seq_prob .................................. 0.1 + 0: skip_train_iteration_range ...................... None + 0: split ........................................... None + 0: split_transformers .............................. False + 0: sync_tp_duplicated_parameters ................... False + 0: synchronize_each_layer .......................... False + 0: tensor_model_parallel_size ...................... 1 + 0: tensorboard_dir ................................. tensorboard_1b112b400mval + 0: tensorboard_log_interval ........................ 1 + 0: tensorboard_queue_size .......................... 5 + 0: test_weighted_split_paths ....................... None + 0: test_weighted_split_paths_path .................. None + 0: tile_factor ..................................... 1 + 0: titles_data_path ................................ None + 0: tokenizer_name_or_path .......................... None + 0: tokenizer_type .................................. GPT2BPETokenizer + 0: train_iters ..................................... None + 0: train_samples ................................... 1 + 0: train_tokens .................................... 
None + 0: train_weighted_split_names ...................... ['train'] + 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] + 0: train_weighted_split_paths_path ................. None + 0: train_weighted_split_splits ..................... [['0:1']] + 0: train_weighted_split_weights .................... [['1.0']] + 0: universal_checkpoint ............................ False + 0: use_bnb_optimizer ............................... False + 0: use_checkpoint_lr_scheduler ..................... False + 0: use_contiguous_buffers_in_ddp ................... True + 0: use_cpu_initialization .......................... None + 0: use_one_sent_docs ............................... False + 0: use_pin_memory .................................. False + 0: valid_num_workers ............................... 2 + 0: valid_weighted_split_names ...................... ['validation'] + 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] + 0: valid_weighted_split_paths_path ................. None + 0: valid_weighted_split_splits ..................... [['0:1']] + 0: valid_weighted_split_weights .................... [['1.0']] + 0: virtual_pipeline_model_parallel_size ............ None + 0: vocab_extra_ids ................................. 0 + 0: vocab_file ...................................... gpt2/vocab.json + 0: weight_decay .................................... 0.1 + 0: world_size ...................................... 128 + 0: zero_allgather_bucket_size ...................... 0.0 + 0: zero_contigious_gradients ....................... False + 0: zero_reduce_bucket_size ......................... 0.0 + 0: zero_reduce_scatter ............................. False + 0: zero_stage ...................................... 
0 + 0: -------------------- end of arguments --------------------- + 0: setting number of micro-batches to constant 1 + 0: > building GPT2BPETokenizer tokenizer ... + 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) + 0: DeepSpeed general environment info: + 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] + 0: torch version .................... 1.13.0+rocm5.2 + 0: torch cuda version ............... None + 0: torch hip version ................ 5.2.21151-afdc89f8 + 0: nvcc version ..................... None + 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] + 0: deepspeed info ................... 0.7.5, unknown, unknown + 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 + 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** + 0: > initializing torch distributed ... + 0: [2023-03-16 09:04:21,508] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +15: > setting tensorboard ... + 0: > initializing tensor model parallel with size 1 + 0: > initializing pipeline model parallel with size 1 + 0: > setting random seeds to 1234 ... + 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 + 0: > compiling dataset index builder ... + 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: make: Nothing to be done for 'default'. + 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: >>> done with dataset index builder. 
Compilation time: 0.099 seconds + 0: WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations. + 0: > compiling and loading fused kernels ... + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 102 + 0: [1/1] c++ scaled_masked_softmax_hip.o scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 67 + 0: ninja: no work to do. + 0: >>> done with compiling and loading fused kernels. Compilation time: 26.965 seconds + 0: time to initialize megatron (seconds): 22.969 + 0: [after megatron is initialized] datetime: 2023-03-16 09:04:54 + 0: building GPT model ... 
+ 0: [2023-03-16 09:04:55,109] [INFO] [utils.py:827:see_memory_usage] Before Building Model + 0: [2023-03-16 09:04:55,110] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB + 0: [2023-03-16 09:04:55,110] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.42 GB, percent = 6.2% + 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None + 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi + 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, 
data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 + 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): + 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, 
model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process + 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo + 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, 
model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127} + 0: [2023-03-16 09:04:59,204] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer + 0: stage=0 layers=33 + 0: 0: _to_float16 + 0: 1: EmbeddingPipe + 0: 2: + 0: 3: ParallelTransformerLayerPipe + 0: 4: ParallelTransformerLayerPipe + 0: 5: ParallelTransformerLayerPipe + 0: 6: ParallelTransformerLayerPipe + 0: 7: ParallelTransformerLayerPipe + 0: 8: ParallelTransformerLayerPipe + 0: 9: ParallelTransformerLayerPipe + 0: 10: ParallelTransformerLayerPipe + 0: 11: ParallelTransformerLayerPipe + 0: 12: ParallelTransformerLayerPipe + 0: 13: ParallelTransformerLayerPipe + 0: 14: ParallelTransformerLayerPipe + 0: 15: ParallelTransformerLayerPipe + 0: 16: ParallelTransformerLayerPipe + 0: 17: ParallelTransformerLayerPipe + 0: 18: ParallelTransformerLayerPipe + 0: 19: ParallelTransformerLayerPipe + 0: 20: ParallelTransformerLayerPipe + 0: 21: ParallelTransformerLayerPipe + 0: 22: ParallelTransformerLayerPipe + 0: 23: ParallelTransformerLayerPipe + 0: 24: ParallelTransformerLayerPipe + 0: 25: ParallelTransformerLayerPipe + 0: 26: ParallelTransformerLayerPipe + 0: 27: ParallelTransformerLayerPipe + 0: 28: ParallelTransformerLayerPipe + 0: 29: undo + 0: 30: MixedFusedLayerNorm + 0: 31: EmbeddingPipe + 0: 32: float16_to_fp32 + 0: loss: CrossEntropy + 0: [2023-03-16 09:04:59,441] [INFO] [utils.py:827:see_memory_usage] After Building Model + 0: [2023-03-16 09:04:59,442] [INFO] [utils.py:828:see_memory_usage] MA 2.05 GB Max_MA 2.05 GB CA 2.19 GB Max_CA 2 GB + 0: [2023-03-16 09:04:59,442] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: setting training iterations to 0 + 0: > learning rate decay style: cosine + 0: DeepSpeed is enabled. 
+ 0: [2023-03-16 09:04:59,444] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown + 0: [2023-03-16 09:05:13,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False + 0: [2023-03-16 09:05:13,650] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer + 0: [2023-03-16 09:05:13,650] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer + 0: [2023-03-16 09:05:13,661] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam + 0: [2023-03-16 09:05:13,661] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer + 0: [2023-03-16 09:05:13,778] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer + 0: [2023-03-16 09:05:13,778] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.06 GB CA 2.19 GB Max_CA 2 GB + 0: [2023-03-16 09:05:13,779] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.17 GB, percent = 6.4% + 1: ninja: no work to do. + 1: Time to load utils op: 0.1542186737060547 seconds + 0: ninja: no work to do. 
+ 0: Time to load utils op: 0.1504199504852295 seconds + 9: Time to load utils op: 0.30939149856567383 seconds + 1: Time to load utils op: 0.0005350112915039062 seconds + 0: Time to load utils op: 0.0005583763122558594 seconds + 0: Time to load utils op: 0.2020092010498047 seconds + 0: Time to load utils op: 0.20209527015686035 seconds + 0: Time to load utils op: 0.2024247646331787 seconds + 0: Time to load utils op: 0.20238661766052246 seconds + 0: Time to load utils op: 0.2027587890625 seconds + 0: Time to load utils op: 0.2023153305053711 seconds + 1: Time to load utils op: 0.20245862007141113 seconds + 1: Time to load utils op: 0.20239782333374023 secondsTime to load utils op: 0.20281195640563965 seconds + 1: + 1: Time to load utils op: 0.20214176177978516 secondsTime to load utils op: 0.20317363739013672 secondsTime to load utils op: 0.2026665210723877 seconds + 1: + 1: + 9: Time to load utils op: 0.0008494853973388672 seconds + 1: Time to load utils op: 0.20223116874694824 seconds + 9: Time to load utils op: 0.20374393463134766 secondsTime to load utils op: 0.2026219367980957 seconds + 9: + 9: Time to load utils op: 0.2026207447052002 seconds + 9: Time to load utils op: 0.2027294635772705 seconds + 9: Time to load utils op: 0.2030489444732666 secondsTime to load utils op: 0.20397162437438965 seconds + 9: + 9: Time to load utils op: 0.2028803825378418 seconds + 2: Time to load utils op: 0.21309185028076172 seconds + 2: Time to load utils op: 0.21309590339660645 seconds + 2: Time to load utils op: 0.21312189102172852 secondsTime to load utils op: 0.21312332153320312 seconds + 2: + 2: Time to load utils op: 0.2131333351135254 secondsTime to load utils op: 0.21313738822937012 seconds + 2: + 2: Time to load utils op: 0.21314239501953125 secondsTime to load utils op: 0.21314573287963867 seconds + 2: + 3: Time to load utils op: 0.21308541297912598 secondsTime to load utils op: 0.2130870819091797 seconds + 3: + 3: Time to load utils op: 0.21311473846435547 seconds + 
3: Time to load utils op: 0.21312952041625977 seconds + 3: Time to load utils op: 0.21311473846435547 seconds + 3: Time to load utils op: 0.2131354808807373 seconds + 3: Time to load utils op: 0.21314406394958496 secondsTime to load utils op: 0.21311402320861816 seconds + 3: + 4: Time to load utils op: 0.21307945251464844 secondsTime to load utils op: 0.21309399604797363 seconds + 4: Time to load utils op: 0.2130875587463379 seconds + 4: + 4: Time to load utils op: 0.21309828758239746 secondsTime to load utils op: 0.21309876441955566 seconds + 4: + 4: Time to load utils op: 0.2131030559539795 secondsTime to load utils op: 0.21309757232666016 secondsTime to load utils op: 0.21309852600097656 seconds + 4: + 4: + 0: Time to load utils op: 0.0004355907440185547 seconds + 8: Time to load utils op: 0.20891571044921875 seconds + 8: Time to load utils op: 0.20610666275024414 seconds + 8: Time to load utils op: 0.20579981803894043 seconds + 8: Time to load utils op: 0.206573486328125 secondsTime to load utils op: 0.20652198791503906 seconds + 8: + 8: Time to load utils op: 0.20667719841003418 secondsTime to load utils op: 0.20657777786254883 seconds + 8: + 8: Time to load utils op: 0.20467042922973633 seconds + 0: Time to load utils op: 0.00035858154296875 seconds + 6: Time to load utils op: 0.21251392364501953 seconds + 6: Time to load utils op: 0.21252894401550293 seconds + 6: Time to load utils op: 0.2125546932220459 secondsTime to load utils op: 0.21257400512695312 seconds + 6: + 6: Time to load utils op: 0.2125711441040039 seconds + 6: Time to load utils op: 0.21257638931274414 seconds + 6: Time to load utils op: 0.21258759498596191 secondsTime to load utils op: 0.2125849723815918 seconds + 6: + 0: Time to load utils op: 0.0003807544708251953 seconds + 0: Time to load utils op: 0.0004227161407470703 seconds + 1: Time to load utils op: 0.00041174888610839844 seconds + 1: Time to load utils op: 0.00036454200744628906 seconds + 5: Time to load utils op: 
0.21213269233703613 seconds + 5: Time to load utils op: 0.21213388442993164 seconds + 5: Time to load utils op: 0.21215248107910156 seconds + 5: Time to load utils op: 0.21216964721679688 secondsTime to load utils op: 0.21160650253295898 seconds + 5: + 5: Time to load utils op: 0.21217918395996094 secondsTime to load utils op: 0.21218252182006836 seconds + 5: Time to load utils op: 0.21218132972717285 seconds + 5: + 1: Time to load utils op: 0.00036144256591796875 seconds + 0: Time to load utils op: 0.0004096031188964844 seconds + 0: Time to load utils op: 0.00037407875061035156 seconds + 1: Time to load utils op: 0.0003685951232910156 seconds +12: Time to load utils op: 0.21060657501220703 secondsTime to load utils op: 0.21057724952697754 secondsTime to load utils op: 0.2113196849822998 secondsTime to load utils op: 0.21175885200500488 seconds +12: Time to load utils op: 0.21116971969604492 seconds +12: +12: +12: Time to load utils op: 0.21065568923950195 seconds +12: Time to load utils op: 0.21085453033447266 seconds +12: +12: Time to load utils op: 0.21170330047607422 seconds + 1: Time to load utils op: 0.00044989585876464844 seconds + 1: Time to load utils op: 0.00039124488830566406 seconds + 1: Time to load utils op: 0.00038814544677734375 seconds + 7: Time to load utils op: 0.21297526359558105 seconds + 7: Time to load utils op: 0.2129809856414795 secondsTime to load utils op: 0.21298646926879883 seconds + 7: + 7: Time to load utils op: 0.2129664421081543 secondsTime to load utils op: 0.21300029754638672 seconds + 7: Time to load utils op: 0.21300125122070312 seconds + 7: Time to load utils op: 0.21299958229064941 seconds + 7: Time to load utils op: 0.21300172805786133 seconds + 7: +11: Time to load utils op: 0.21074724197387695 secondsTime to load utils op: 0.21074581146240234 seconds +11: +11: Time to load utils op: 0.20843958854675293 seconds +11: Time to load utils op: 0.20875310897827148 seconds +11: Time to load utils op: 0.20984959602355957 seconds 
+11: Time to load utils op: 0.20394396781921387 seconds +11: Time to load utils op: 0.2084958553314209 secondsTime to load utils op: 0.20815634727478027 seconds +11: +10: Time to load utils op: 0.21200108528137207 secondsTime to load utils op: 0.21199321746826172 secondsTime to load utils op: 0.2120068073272705 seconds +10: +10: +10: Time to load utils op: 0.21201443672180176 secondsTime to load utils op: 0.21201586723327637 secondsTime to load utils op: 0.21201443672180176 secondsTime to load utils op: 0.21201300621032715 seconds +10: +10: +10: +10: Time to load utils op: 0.21198201179504395 seconds + 0: Time to load utils op: 0.30383896827697754 seconds +13: Time to load utils op: 0.21091866493225098 seconds +13: Time to load utils op: 0.21093344688415527 seconds +13: Time to load utils op: 0.21095919609069824 seconds +13: Time to load utils op: 0.21097064018249512 seconds +13: Time to load utils op: 0.21097016334533691 seconds +13: Time to load utils op: 0.210982084274292 seconds +13: Time to load utils op: 0.2109835147857666 seconds +13: Time to load utils op: 0.21098685264587402 seconds +14: Time to load utils op: 0.2115476131439209 seconds +14: Time to load utils op: 0.21155810356140137 seconds +14: Time to load utils op: 0.2115793228149414 seconds +14: Time to load utils op: 0.21159887313842773 seconds +14: Time to load utils op: 0.21161246299743652 seconds +14: Time to load utils op: 0.21160674095153809 seconds +14: Time to load utils op: 0.21161127090454102 secondsTime to load utils op: 0.2116069793701172 seconds +14: +15: Time to load utils op: 0.21077585220336914 seconds +15: Time to load utils op: 0.21080589294433594 seconds +15: Time to load utils op: 0.21082735061645508 seconds +15: Time to load utils op: 0.2108306884765625 seconds +15: Time to load utils op: 0.21084380149841309 seconds +15: Time to load utils op: 0.21084094047546387 secondsTime to load utils op: 0.2108449935913086 seconds +15: +15: Time to load utils op: 0.2108597755432129 seconds + 
9: Time to load utils op: 0.0003666877746582031 seconds + 9: Time to load utils op: 0.0004267692565917969 seconds + 9: Time to load utils op: 0.0004451274871826172 seconds + 9: Time to load utils op: 0.0004248619079589844 seconds + 9: Time to load utils op: 0.00038695335388183594 seconds + 9: Time to load utils op: 0.0003788471221923828 seconds + 9: Time to load utils op: 0.000400543212890625 seconds +12: Time to load utils op: 0.0008668899536132812 seconds +12: Time to load utils op: 0.0011548995971679688 seconds +12: Time to load utils op: 0.0010800361633300781 seconds +12: Time to load utils op: 0.0011496543884277344 seconds +12: Time to load utils op: 0.0011627674102783203 seconds +12: Time to load utils op: 0.0011441707611083984 seconds +12: Time to load utils op: 0.001154184341430664 seconds +12: Time to load utils op: 0.0011911392211914062 seconds + 3: Time to load utils op: 0.0009493827819824219 seconds + 3: Time to load utils op: 0.0012094974517822266 seconds + 2: Time to load utils op: 0.0007729530334472656 seconds + 2: Time to load utils op: 0.0009183883666992188 seconds + 2: Time to load utils op: 0.0010046958923339844 seconds + 3: Time to load utils op: 0.0014197826385498047 seconds + 3: Time to load utils op: 0.001354217529296875 secondsTime to load utils op: 0.001422882080078125 seconds + 3: + 3: Time to load utils op: 0.0013422966003417969 seconds + 3: Time to load utils op: 0.001383066177368164 seconds + 2: Time to load utils op: 0.0010237693786621094 secondsTime to load utils op: 0.0011298656463623047 seconds + 2: Time to load utils op: 0.0009982585906982422 seconds + 2: + 3: Time to load utils op: 0.001279592514038086 seconds + 2: Time to load utils op: 0.0010378360748291016 seconds + 2: Time to load utils op: 0.0011289119720458984 seconds + 5: Time to load utils op: 0.0013179779052734375 seconds + 5: Time to load utils op: 0.0013620853424072266 seconds + 5: Time to load utils op: 0.001421213150024414 seconds + 5: Time to load utils op: 
0.001508951187133789 seconds + 5: Time to load utils op: 0.0015566349029541016 secondsTime to load utils op: 0.0015969276428222656 seconds + 5: + 5: Time to load utils op: 0.001577615737915039 seconds + 5: Time to load utils op: 0.0016162395477294922 seconds + 8: Time to load utils op: 0.00034737586975097656 seconds + 8: Time to load utils op: 0.0006024837493896484 seconds + 8: Time to load utils op: 0.00038814544677734375 seconds + 8: Time to load utils op: 0.00045609474182128906 secondsTime to load utils op: 0.0004513263702392578 seconds + 8: + 8: Time to load utils op: 0.00043320655822753906 seconds + 4: Time to load utils op: 0.0007638931274414062 seconds + 8: Time to load utils op: 0.00044083595275878906 seconds + 8: Time to load utils op: 0.000400543212890625 seconds + 4: Time to load utils op: 0.0010175704956054688 seconds + 4: Time to load utils op: 0.0011565685272216797 seconds + 4: Time to load utils op: 0.0010867118835449219 secondsTime to load utils op: 0.001148223876953125 seconds + 4: Time to load utils op: 0.0011472702026367188 seconds + 4: Time to load utils op: 0.0011563301086425781 seconds + 4: + 4: Time to load utils op: 0.0012745857238769531 seconds + 6: Time to load utils op: 0.0009140968322753906 seconds + 6: Time to load utils op: 0.0009081363677978516 seconds + 6: Time to load utils op: 0.0009577274322509766 seconds + 6: Time to load utils op: 0.0010030269622802734 seconds + 6: Time to load utils op: 0.0010082721710205078 seconds + 6: Time to load utils op: 0.0011107921600341797 seconds + 6: Time to load utils op: 0.0010933876037597656 seconds + 6: Time to load utils op: 0.0012478828430175781 seconds + 7: Time to load utils op: 0.0008780956268310547 secondsTime to load utils op: 0.0009453296661376953 seconds + 7: + 7: Time to load utils op: 0.0009407997131347656 seconds + 7: Time to load utils op: 0.0011615753173828125 secondsTime to load utils op: 0.0011434555053710938 seconds + 7: + 7: Time to load utils op: 0.001188039779663086 
secondsTime to load utils op: 0.0011055469512939453 seconds + 7: + 7: Time to load utils op: 0.0013303756713867188 seconds +11: Time to load utils op: 0.000591278076171875 seconds +11: Time to load utils op: 0.0004887580871582031 seconds +11: Time to load utils op: 0.00042366981506347656 seconds +11: Time to load utils op: 0.00041365623474121094 seconds +11: Time to load utils op: 0.00042366981506347656 secondsTime to load utils op: 0.0004317760467529297 seconds +11: +11: Time to load utils op: 0.0004246234893798828 seconds +11: Time to load utils op: 0.00043320655822753906 seconds +10: Time to load utils op: 0.0011551380157470703 secondsTime to load utils op: 0.0011363029479980469 seconds +10: +10: Time to load utils op: 0.00131988525390625 seconds +10: Time to load utils op: 0.0013051033020019531 seconds +10: Time to load utils op: 0.0013222694396972656 seconds +10: Time to load utils op: 0.001379251480102539 secondsTime to load utils op: 0.0012760162353515625 seconds +10: +10: Time to load utils op: 0.0013866424560546875 seconds +13: Time to load utils op: 0.0009090900421142578 seconds +13: Time to load utils op: 0.0009565353393554688 seconds +13: Time to load utils op: 0.0013301372528076172 secondsTime to load utils op: 0.0013587474822998047 seconds +13: +14: Time to load utils op: 0.0011680126190185547 seconds +14: Time to load utils op: 0.0011553764343261719 seconds +14: Time to load utils op: 0.0011429786682128906 secondsTime to load utils op: 0.0011391639709472656 seconds +14: +13: Time to load utils op: 0.001468658447265625 seconds +13: Time to load utils op: 0.0014612674713134766 seconds +13: Time to load utils op: 0.0014657974243164062 seconds +14: Time to load utils op: 0.0012974739074707031 seconds +14: Time to load utils op: 0.0013039112091064453 seconds +13: Time to load utils op: 0.001434326171875 seconds +15: Time to load utils op: 0.0009810924530029297 seconds +15: Time to load utils op: 0.0008761882781982422 seconds +14: Time to load utils op: 
0.0012552738189697266 seconds +14: Time to load utils op: 0.0013184547424316406 seconds +15: Time to load utils op: 0.0009417533874511719 seconds +15: Time to load utils op: 0.0010821819305419922 seconds +15: Time to load utils op: 0.0011479854583740234 seconds +15: Time to load utils op: 0.001138448715209961 seconds +15: Time to load utils op: 0.0011305809020996094 seconds +15: Time to load utils op: 0.0011281967163085938 seconds + 0: [2023-03-16 09:05:14,201] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 + 0: [2023-03-16 09:05:14,202] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.04 GB CA 2.19 GB Max_CA 2 GB + 0: [2023-03-16 09:05:14,202] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,315] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 + 0: [2023-03-16 09:05:14,315] [INFO] [utils.py:828:see_memory_usage] MA 4.23 GB Max_MA 4.23 GB CA 5.44 GB Max_CA 5 GB + 0: [2023-03-16 09:05:14,315] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,417] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 + 0: [2023-03-16 09:05:14,417] [INFO] [utils.py:828:see_memory_usage] MA 4.23 GB Max_MA 4.23 GB CA 5.44 GB Max_CA 5 GB + 0: [2023-03-16 09:05:14,418] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,522] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 + 0: [2023-03-16 09:05:14,522] [INFO] [utils.py:828:see_memory_usage] MA 6.16 GB Max_MA 6.16 GB CA 8.31 GB Max_CA 8 GB + 0: [2023-03-16 09:05:14,522] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,624] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 + 0: [2023-03-16 09:05:14,624] [INFO] [utils.py:828:see_memory_usage] MA 6.16 GB Max_MA 6.16 GB CA 8.31 GB 
Max_CA 8 GB + 0: [2023-03-16 09:05:14,624] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.34 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,729] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 + 0: [2023-03-16 09:05:14,730] [INFO] [utils.py:828:see_memory_usage] MA 6.16 GB Max_MA 6.16 GB CA 8.31 GB Max_CA 8 GB + 0: [2023-03-16 09:05:14,730] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,830] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer + 0: [2023-03-16 09:05:14,830] [INFO] [utils.py:828:see_memory_usage] MA 6.16 GB Max_MA 6.16 GB CA 8.31 GB Max_CA 8 GB + 0: [2023-03-16 09:05:14,831] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:14,937] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer + 0: [2023-03-16 09:05:14,937] [INFO] [utils.py:828:see_memory_usage] MA 6.22 GB Max_MA 6.22 GB CA 8.31 GB Max_CA 8 GB + 0: [2023-03-16 09:05:14,937] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:15,039] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer + 0: [2023-03-16 09:05:15,040] [INFO] [utils.py:828:see_memory_usage] MA 6.22 GB Max_MA 6.22 GB CA 8.31 GB Max_CA 8 GB + 0: [2023-03-16 09:05:15,040] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.33 GB, percent = 6.4% + 0: [2023-03-16 09:05:15,040] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam + 0: [2023-03-16 09:05:15,040] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler + 0: [2023-03-16 09:05:15,040] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = + 0: [2023-03-16 09:05:15,040] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] + 0: [2023-03-16 09:05:15,041] [INFO] 
[config.py:1007:print] DeepSpeedEngine configuration: + 0: [2023-03-16 09:05:15,041] [INFO] [config.py:1011:print] activation_checkpointing_config { + 0: "partition_activations": false, + 0: "contiguous_memory_optimization": false, + 0: "cpu_checkpointing": false, + 0: "number_checkpoints": null, + 0: "synchronize_checkpoint_boundary": false, + 0: "profile": false + 0: } + 0: [2023-03-16 09:05:15,041] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} + 0: [2023-03-16 09:05:15,041] [INFO] [config.py:1011:print] amp_enabled .................. False + 0: [2023-03-16 09:05:15,041] [INFO] [config.py:1011:print] amp_params ................... False + 0: [2023-03-16 09:05:15,041] [INFO] [config.py:1011:print] autotuning_config ............ { + 0: "enabled": false, + 0: "start_step": null, + 0: "end_step": null, + 0: "metric_path": null, + 0: "arg_mappings": null, + 0: "metric": "throughput", + 0: "model_info": null, + 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", + 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", + 0: "overwrite": true, + 0: "fast": true, + 0: "start_profile_step": 3, + 0: "end_profile_step": 5, + 0: "tuner_type": "gridsearch", + 0: "tuner_early_stopping": 5, + 0: "tuner_num_trials": 50, + 0: "model_info_path": null, + 0: "mp_size": 1, + 0: "max_train_batch_size": null, + 0: "min_train_batch_size": 1, + 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, + 0: "min_train_micro_batch_size_per_gpu": 1, + 0: "num_tuning_micro_batch_sizes": 3 + 0: } + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] comms_config ................. + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] communication_data_type ...... None + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa + 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] curriculum_enabled ........... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] curriculum_params ............ False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] disable_allgather ............ False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] dump_state ................... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] elasticity_enabled ........... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] flops_profiler_config ........ { + 0: "enabled": false, + 0: "profile_step": 1, + 0: "module_depth": -1, + 0: "top_modules": 1, + 0: "detailed": true, + 0: "output_file": null + 0: } + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] fp16_auto_cast ............... None + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] fp16_enabled ................. False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] global_rank .................. 0 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 
1 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] load_universal_checkpoint .... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] loss_scale ................... 1.0 + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] memory_breakdown ............. False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] monitor_config ............... + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] nebula_config ................ { + 0: "enabled": false, + 0: "persistent_storage_path": null, + 0: "persistent_time_interval": 100, + 0: "num_of_version_in_retention": 2, + 0: "enable_nebula_load": true, + 0: "load_path": null + 0: } + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] optimizer_name ............... None + 0: [2023-03-16 09:05:15,042] [INFO] [config.py:1011:print] optimizer_params ............. None + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] pld_enabled .................. False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] pld_params ................... False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] prescale_gradients ........... False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] scheduler_name ............... None + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] scheduler_params ............. 
None + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] sparse_attention ............. None + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] steps_per_print .............. 2000 + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] train_batch_size ............. 128 + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 1 + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] use_node_local_storage ....... False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] world_size ................... 128 + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] zero_enabled ................. False + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 + 0: [2023-03-16 09:05:15,043] [INFO] [config.py:996:print_user_config] json = { + 0: "train_micro_batch_size_per_gpu": 1, + 0: "train_batch_size": 128, + 0: "gradient_clipping": 1.0, + 0: "zero_optimization": { + 0: "stage": 0 + 0: }, + 0: "bf16": { + 0: "enabled": true + 0: }, + 0: "steps_per_print": 2.000000e+03, + 0: "wall_clock_breakdown": false + 0: } + 0: Time to load utils op: 0.0004038810729980469 seconds + 0: [2023-03-16 09:05:15,043] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=1 + 0: [2023-03-16 09:05:15,097] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=33 [0, 33) STAGE_PARAMS=1096338432 (1096.338M) TOTAL_PARAMS=1096338432 (1096.338M) UNIQUE_PARAMS=1096338432 (1096.338M) +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 4: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+11: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+13: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+14: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+10: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 0: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+13: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 09:05:15,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 09:05:15,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:15,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+14: [2023-03-16 09:05:15,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:15,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:15,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:15,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 09:05:15,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:15,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 09:05:15,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 09:05:15,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:15,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:15,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:15,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+13: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:15,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:15,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:15,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:15,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:15,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:15,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:15,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:15,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:15,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:15,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 09:05:15,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:15,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 09:05:15,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:15,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:15,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:15,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:15,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:15,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+13: [2023-03-16 09:05:15,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:15,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:15,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+12: [2023-03-16 09:05:15,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+14: [2023-03-16 09:05:15,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:15,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:15,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:15,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:15,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 09:05:15,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:15,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 09:05:15,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:15,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+13: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:15,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:15,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:15,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:15,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:15,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:15,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:15,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:15,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:15,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:15,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:15,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:15,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:15,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:15,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:15,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:15,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:15,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:15,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:15,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:15,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:15,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:15,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:15,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:15,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:15,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 09:05:16,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:16,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:16,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:16,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:16,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 09:05:16,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:16,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:16,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 09:05:16,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:16,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:16,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 09:05:16,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:16,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:16,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:16,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:16,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:16,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:16,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 09:05:16,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:16,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:16,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+15: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:16,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 09:05:16,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 09:05:16,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:16,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:16,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:16,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:16,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:16,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 09:05:16,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:16,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:16,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:16,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:16,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+14: [2023-03-16 09:05:16,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:16,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:16,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:16,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:16,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:16,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+11: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:16,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:16,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:16,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:16,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 09:05:16,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:16,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:16,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:16,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:16,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:16,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:16,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:16,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:16,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:16,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:16,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 09:05:16,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:16,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:16,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:16,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:16,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:16,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:16,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:16,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 09:05:16,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:16,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:16,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:16,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:16,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:16,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:16,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:16,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:16,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:16,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:17,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:17,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:17,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:17,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:17,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:17,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:17,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:17,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+11: [2023-03-16 09:05:17,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+11: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+14: [2023-03-16 09:05:17,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:17,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:17,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:17,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:17,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:17,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:17,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 09:05:17,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+15: [2023-03-16 09:05:17,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:17,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:17,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+11: [2023-03-16 09:05:17,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:17,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:17,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:17,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:17,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:17,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 09:05:17,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:17,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:17,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:17,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:17,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 09:05:17,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:17,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:17,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:17,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:17,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:17,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:17,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:17,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:17,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:17,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:17,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:17,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:17,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:17,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:17,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:17,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 09:05:17,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:17,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:17,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+13: [2023-03-16 09:05:17,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:17,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:17,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:17,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 09:05:17,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 09:05:17,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:17,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:17,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:17,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:17,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:17,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:17,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:17,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:17,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:17,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:17,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:17,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:17,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:17,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:17,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 09:05:17,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:17,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:17,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:17,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:18,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:18,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:18,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:18,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:18,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 09:05:18,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:18,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 09:05:18,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:18,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+15: [2023-03-16 09:05:18,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:18,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 09:05:18,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:18,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 09:05:18,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:18,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+15: [2023-03-16 09:05:18,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:18,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:18,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:18,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:18,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:18,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:18,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:18,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:18,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:18,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:18,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:18,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:18,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:18,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:18,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:18,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:18,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:18,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:18,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:18,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:18,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+12: [2023-03-16 09:05:18,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:18,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:18,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:18,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:18,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:18,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:18,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:18,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:18,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:18,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+11: [2023-03-16 09:05:18,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:18,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:18,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:18,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 09:05:18,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+10: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:18,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:18,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+10: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:18,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 09:05:18,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:18,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:18,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 09:05:18,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:18,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:18,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:18,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:18,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:18,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:18,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:18,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:18,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:18,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:18,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:18,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:18,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:18,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:18,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:18,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:18,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:18,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:18,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:18,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:18,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 09:05:18,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:18,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+12: [2023-03-16 09:05:19,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 09:05:19,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:19,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:19,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+12: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+10: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:19,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:19,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:19,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:19,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 09:05:19,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:19,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:19,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 09:05:19,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:19,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:19,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:19,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:19,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:19,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:19,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:19,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+12: [2023-03-16 09:05:19,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:19,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:19,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:19,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:19,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+11: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:19,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:19,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:19,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 09:05:19,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:19,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:19,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+11: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:19,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:19,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+13: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:19,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:19,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 09:05:19,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:19,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:19,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+15: [2023-03-16 09:05:19,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:19,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+13: [2023-03-16 09:05:19,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:19,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 09:05:19,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:19,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 09:05:19,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:19,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:19,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+15: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:19,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:19,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:19,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 09:05:19,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 09:05:19,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:19,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:19,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:19,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 09:05:19,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:19,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:19,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:19,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+13: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:19,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:19,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:19,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:19,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+15: [2023-03-16 09:05:19,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:19,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:19,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 09:05:19,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:19,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:19,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:19,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:19,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:19,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:19,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:19,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:19,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:19,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+14: [2023-03-16 09:05:19,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:19,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:19,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:19,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:19,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:19,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_15-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:19,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:19,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:19,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:19,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:19,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:19,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:19,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:19,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:19,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:19,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:19,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:19,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:19,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:19,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:19,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:19,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:19,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:19,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:19,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:19,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:19,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:19,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 09:05:19,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:19,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:19,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:19,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:19,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:19,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 09:05:19,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:19,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:20,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 09:05:20,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:20,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:20,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:20,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:20,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:20,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:20,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:20,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:20,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:20,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:20,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 09:05:20,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 09:05:20,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:20,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 09:05:20,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 09:05:20,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:20,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:20,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:20,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 09:05:20,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 09:05:20,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:20,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:20,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 09:05:20,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:20,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:20,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:20,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 09:05:20,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+10: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:20,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:20,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:20,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:20,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 09:05:20,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:20,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:20,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:20,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:20,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 09:05:20,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:20,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:20,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:20,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:20,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 09:05:20,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 09:05:20,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 09:05:20,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:20,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:20,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:20,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:20,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:20,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 09:05:20,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:20,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:20,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:20,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:20,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:20,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:20,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:20,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:20,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:20,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 09:05:20,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 09:05:20,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:20,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:20,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:20,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:20,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:20,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:20,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 09:05:20,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:20,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:20,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:20,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:20,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:20,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:20,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:20,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:20,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:20,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:20,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:20,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:20,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:20,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+12: [2023-03-16 09:05:20,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+13: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:20,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:20,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:20,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:20,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:20,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:20,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:20,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:20,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:20,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+13: [2023-03-16 09:05:20,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+14: [2023-03-16 09:05:20,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:20,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:20,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:20,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:20,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:20,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:20,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:20,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:20,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:20,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:20,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:20,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:20,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:20,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:20,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:20,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:20,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 09:05:20,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:20,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:20,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:20,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:20,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:20,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:20,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:21,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:21,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:21,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:21,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:21,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:21,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 09:05:21,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+11: [2023-03-16 09:05:21,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:21,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:21,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:21,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:21,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:21,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:21,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:21,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:21,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+13: [2023-03-16 09:05:21,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+10: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 09:05:21,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+12: [2023-03-16 09:05:21,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+13: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:21,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:21,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 09:05:21,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:21,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 09:05:21,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:21,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:21,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:21,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:21,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:21,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:21,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:21,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 09:05:21,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:21,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:21,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:21,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:21,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:21,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 09:05:21,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:21,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:21,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:21,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:21,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 09:05:21,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:21,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:21,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:21,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:21,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:21,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:21,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:21,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 09:05:21,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:21,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:21,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:21,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:21,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:21,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:21,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:21,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:21,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:21,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:21,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:21,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:21,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:21,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:21,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:22,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+10: [2023-03-16 09:05:22,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 09:05:22,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 09:05:22,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:22,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:22,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 09:05:22,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:22,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 09:05:22,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:22,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:22,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:22,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:22,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:22,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 09:05:22,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:22,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:22,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:22,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 09:05:22,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+13: [2023-03-16 09:05:22,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+13: [2023-03-16 09:05:22,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:22,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+13: [2023-03-16 09:05:22,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:22,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 09:05:22,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:22,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:22,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:22,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+15: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:22,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:22,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:22,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:22,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:22,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:22,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:22,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:22,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 09:05:22,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 09:05:22,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 09:05:22,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 09:05:22,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:22,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:22,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:22,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:22,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:22,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:22,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 09:05:22,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:22,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+15: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:22,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:22,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 09:05:22,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:22,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 09:05:22,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:22,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:22,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:22,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:22,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:22,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:22,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:22,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:22,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:22,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:22,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:22,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:22,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:22,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:22,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 09:05:22,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:22,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:22,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:22,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:22,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. 
+12: [2023-03-16 09:05:22,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:22,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:22,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:22,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:22,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 09:05:23,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+10: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:23,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+11: [2023-03-16 09:05:23,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:23,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:23,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:23,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:23,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:23,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:23,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:23,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:23,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:23,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:23,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:23,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:23,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:23,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:23,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:23,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:23,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:23,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 09:05:23,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:23,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:23,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:23,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:23,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:23,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:23,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 09:05:23,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:23,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:23,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:23,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 09:05:23,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:23,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:23,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:23,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 09:05:23,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:23,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 09:05:23,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:23,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+ 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+12: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:23,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:23,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: > overriding learning rate value to 0.0002 + 0: > overriding minimum learning rate value to 2e-05 + 0: > overriding warmup iterations value to 0 + 0: > overriding total number of iterations value to 1 + 0: > overriding decay style value to cosine + 2: [2023-03-16 09:05:23,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:23,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 
+11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+ 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:23,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:23,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:23,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:23,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 
+10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:23,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:23,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:23,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:23,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:23,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:23,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+ 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:23,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:23,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:23,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:23,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:23,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:23,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
+ 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:23,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:23,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
+ 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:23,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 
+12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:23,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 
+ 9: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
+ 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:23,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:23,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 
+ 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:23,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
+ 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:23,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 
+13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:23,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:23,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:23,873] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 88 + 4: [2023-03-16 09:05:23,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:23,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 39 +10: [2023-03-16 09:05:23,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 09:05:23,882] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 80 + 4: [2023-03-16 09:05:23,891] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 39 +10: [2023-03-16 09:05:23,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 80 + 1: [2023-03-16 09:05:23,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:23,902] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 10 +11: [2023-03-16 09:05:23,904] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 88 + 9: [2023-03-16 09:05:23,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:23,913] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 77 + 6: [2023-03-16 09:05:23,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:23,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:23,913] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 53 +10: [2023-03-16 09:05:23,914] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 83 + 8: [2023-03-16 09:05:23,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:23,918] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 64 + 2: [2023-03-16 09:05:23,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:23,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:23,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 45 + 2: [2023-03-16 09:05:23,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 20 +15: [2023-03-16 09:05:23,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:23,925] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 120 + 9: [2023-03-16 09:05:23,928] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 77 + 7: [2023-03-16 09:05:23,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 09:05:23,929] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 61 + 4: [2023-03-16 09:05:23,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:23,931] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 64 + 4: [2023-03-16 09:05:23,932] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 33 + 0: [2023-03-16 09:05:23,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:23,934] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 5 +11: [2023-03-16 09:05:23,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:23,939] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 93 + 1: [2023-03-16 09:05:23,940] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 10 + 2: [2023-03-16 09:05:23,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 20 + 0: [2023-03-16 09:05:23,944] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 5 +13: [2023-03-16 09:05:23,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 09:05:23,947] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 61 +13: [2023-03-16 09:05:23,947] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 105 +10: [2023-03-16 09:05:23,952] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 83 + 3: [2023-03-16 09:05:23,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:23,954] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 24 + 5: [2023-03-16 09:05:23,955] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 45 + 1: [2023-03-16 09:05:23,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:23,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 8 +13: [2023-03-16 09:05:23,962] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 105 +14: [2023-03-16 09:05:23,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:23,964] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 113 + 2: [2023-03-16 09:05:23,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 09:05:23,965] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 21 + 3: [2023-03-16 09:05:23,968] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 24 + 8: [2023-03-16 09:05:23,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:23,968] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 53 + 8: [2023-03-16 09:05:23,969] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 68 + 5: [2023-03-16 09:05:23,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:23,970] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 42 +15: [2023-03-16 09:05:23,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 120 +12: [2023-03-16 09:05:23,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:23,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:23,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 101 +11: [2023-03-16 09:05:23,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 94 + 9: [2023-03-16 09:05:23,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:23,978] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 74 + 7: [2023-03-16 09:05:23,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:23,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:23,980] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 59 +15: [2023-03-16 09:05:23,980] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 123 +10: [2023-03-16 09:05:23,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:23,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 09:05:23,982] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 84 + 4: [2023-03-16 09:05:23,982] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 38 + 6: [2023-03-16 09:05:23,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:23,984] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 50 + 4: [2023-03-16 09:05:23,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 33 +11: [2023-03-16 09:05:23,988] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 93 + 2: [2023-03-16 09:05:23,996] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 21 + 1: [2023-03-16 09:05:23,998] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 8 + 3: [2023-03-16 09:05:23,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:23,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 09:05:24,000] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 31 +13: [2023-03-16 09:05:24,000] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 111 + 5: [2023-03-16 09:05:24,002] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 42 + 0: [2023-03-16 09:05:24,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:24,004] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 3 +14: [2023-03-16 09:05:24,005] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 113 + 8: [2023-03-16 09:05:24,005] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 68 + 9: [2023-03-16 09:05:24,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,010] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 72 +14: [2023-03-16 09:05:24,010] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 119 + 2: [2023-03-16 09:05:24,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 09:05:24,011] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 16 +12: [2023-03-16 09:05:24,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,012] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 96 + 1: [2023-03-16 09:05:24,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:24,015] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 11 +10: [2023-03-16 09:05:24,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:24,016] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 85 + 7: [2023-03-16 09:05:24,016] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 59 +12: [2023-03-16 09:05:24,017] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 101 + 6: [2023-03-16 09:05:24,022] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 50 + 8: [2023-03-16 09:05:24,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 
+ 8: [2023-03-16 09:05:24,023] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 66 + 4: [2023-03-16 09:05:24,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:24,025] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 34 +10: [2023-03-16 09:05:24,028] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 84 +11: [2023-03-16 09:05:24,031] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 94 +15: [2023-03-16 09:05:24,031] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 123 +15: [2023-03-16 09:05:24,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:24,032] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 124 +11: [2023-03-16 09:05:24,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,033] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 91 + 6: [2023-03-16 09:05:24,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:24,035] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 48 + 3: [2023-03-16 09:05:24,037] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 31 +13: [2023-03-16 09:05:24,039] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 111 + 5: [2023-03-16 09:05:24,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:24,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:24,040] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 41 + 7: [2023-03-16 09:05:24,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:24,041] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 18 + 7: [2023-03-16 09:05:24,041] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 56 +14: [2023-03-16 09:05:24,045] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 119 + 9: [2023-03-16 09:05:24,047] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 72 + 1: [2023-03-16 09:05:24,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 09:05:24,049] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 9 + 3: [2023-03-16 09:05:24,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:24,051] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 28 + 2: [2023-03-16 09:05:24,051] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 16 +14: [2023-03-16 09:05:24,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,052] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 117 +13: [2023-03-16 09:05:24,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:24,054] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 107 +15: [2023-03-16 09:05:24,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:24,054] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 96 +15: [2023-03-16 09:05:24,055] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 127 + 1: [2023-03-16 09:05:24,060] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 11 +10: [2023-03-16 09:05:24,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,064] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 99 +10: [2023-03-16 09:05:24,065] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 81 + 8: [2023-03-16 09:05:24,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:24,066] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 67 + 5: [2023-03-16 09:05:24,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:24,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 09:05:24,067] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 47 + 0: [2023-03-16 09:05:24,068] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 0 + 6: [2023-03-16 09:05:24,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,073] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 90 + 6: [2023-03-16 09:05:24,073] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 49 + 8: [2023-03-16 09:05:24,078] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 66 + 7: [2023-03-16 09:05:24,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:24,080] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 57 + 7: [2023-03-16 09:05:24,080] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 56 + 4: [2023-03-16 09:05:24,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+ 4: [2023-03-16 09:05:24,082] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 32 + 4: [2023-03-16 09:05:24,083] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 38 + 6: [2023-03-16 09:05:24,084] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 48 + 2: [2023-03-16 09:05:24,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:24,085] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 17 + 0: [2023-03-16 09:05:24,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:24,086] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 7 + 4: [2023-03-16 09:05:24,089] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 34 +15: [2023-03-16 09:05:24,100] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 124 + 5: [2023-03-16 09:05:24,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:24,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 09:05:24,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 43 + 3: [2023-03-16 09:05:24,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 25 +14: [2023-03-16 09:05:24,106] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 117 + 8: [2023-03-16 09:05:24,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:24,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:24,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 70 +15: [2023-03-16 09:05:24,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:24,110] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 87 +15: [2023-03-16 09:05:24,110] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 121 + 6: [2023-03-16 09:05:24,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:24,114] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 51 +12: [2023-03-16 09:05:24,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,114] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 118 +12: [2023-03-16 09:05:24,114] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 97 +10: [2023-03-16 09:05:24,115] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 81 + 2: [2023-03-16 09:05:24,116] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 18 +11: [2023-03-16 09:05:24,118] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 90 +11: [2023-03-16 09:05:24,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,122] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 89 +13: [2023-03-16 09:05:24,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:24,123] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 91 +13: [2023-03-16 09:05:24,124] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 108 + 0: [2023-03-16 09:05:24,126] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 3 + 9: [2023-03-16 09:05:24,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,130] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 78 + 4: [2023-03-16 09:05:24,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:24,134] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 36 + 9: [2023-03-16 09:05:24,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,135] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 75 + 0: [2023-03-16 09:05:24,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:24,139] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 1 + 8: [2023-03-16 09:05:24,141] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 67 + 0: [2023-03-16 09:05:24,142] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 7 + 0: [2023-03-16 09:05:24,142] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 0 +13: [2023-03-16 09:05:24,144] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 107 + 0: could not find arguments in the checkpoint ... + 0: checkpoint version 3.0 + 7: [2023-03-16 09:05:24,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:24,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:24,146] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 62 + 5: [2023-03-16 09:05:24,147] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 46 + 9: [2023-03-16 09:05:24,147] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 74 +10: [2023-03-16 09:05:24,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 09:05:24,149] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 86 + 1: [2023-03-16 09:05:24,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:24,152] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 13 + 7: [2023-03-16 09:05:24,153] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 57 + 3: [2023-03-16 09:05:24,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:24,155] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 26 +15: [2023-03-16 09:05:24,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:24,159] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 126 +13: [2023-03-16 09:05:24,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:24,158] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 106 + 9: [2023-03-16 09:05:24,161] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 78 + 6: [2023-03-16 09:05:24,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:24,162] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 54 + 3: [2023-03-16 09:05:24,163] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 25 + 2: [2023-03-16 09:05:24,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:24,164] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 22 +14: [2023-03-16 09:05:24,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,165] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 115 + 9: [2023-03-16 09:05:24,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 09:05:24,165] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 73 +12: [2023-03-16 09:05:24,166] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 100 + 6: [2023-03-16 09:05:24,166] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 51 +11: [2023-03-16 09:05:24,166] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 89 +13: [2023-03-16 09:05:24,167] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 108 +10: [2023-03-16 09:05:24,168] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 85 +12: [2023-03-16 09:05:24,169] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 99 + 3: [2023-03-16 09:05:24,169] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 28 + 2: [2023-03-16 09:05:24,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:24,172] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 92 + 2: [2023-03-16 09:05:24,172] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 23 + 7: [2023-03-16 09:05:24,173] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 62 + 0: [2023-03-16 09:05:24,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:24,178] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 4 + 8: [2023-03-16 09:05:24,181] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 70 + 8: [2023-03-16 09:05:24,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:24,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+ 8: [2023-03-16 09:05:24,187] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 69 + 4: [2023-03-16 09:05:24,188] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 35 + 4: [2023-03-16 09:05:24,195] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 36 + 5: [2023-03-16 09:05:24,196] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 47 +11: [2023-03-16 09:05:24,197] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 92 + 3: [2023-03-16 09:05:24,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:24,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:24,199] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 27 + 5: [2023-03-16 09:05:24,199] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 44 + 8: [2023-03-16 09:05:24,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 09:05:24,200] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 87 + 8: [2023-03-16 09:05:24,200] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 65 + 1: [2023-03-16 09:05:24,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:24,202] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 15 +15: [2023-03-16 09:05:24,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:24,203] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 125 + 1: [2023-03-16 09:05:24,206] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 9 +14: [2023-03-16 09:05:24,207] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 115 +14: [2023-03-16 09:05:24,208] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 118 + 2: [2023-03-16 09:05:24,208] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 17 +14: [2023-03-16 09:05:24,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:24,210] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 112 +12: [2023-03-16 09:05:24,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:24,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,211] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 98 + 6: [2023-03-16 09:05:24,211] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 55 +12: [2023-03-16 09:05:24,214] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 97 +12: [2023-03-16 09:05:24,215] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 100 +11: [2023-03-16 09:05:24,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:24,217] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 95 + 1: [2023-03-16 09:05:24,219] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 13 + 2: [2023-03-16 09:05:24,221] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 23 +15: [2023-03-16 09:05:24,221] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 126 + 8: [2023-03-16 09:05:24,223] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 69 + 1: [2023-03-16 09:05:24,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:24,226] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 4 + 1: [2023-03-16 09:05:24,226] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 14 + 1: [2023-03-16 09:05:24,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:24,228] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 12 +10: [2023-03-16 09:05:24,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:24,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:24,229] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 82 + 7: [2023-03-16 09:05:24,230] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 60 + 4: [2023-03-16 09:05:24,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:24,231] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 37 + 3: [2023-03-16 09:05:24,233] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 27 +13: [2023-03-16 09:05:24,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 
+13: [2023-03-16 09:05:24,234] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 110 +15: [2023-03-16 09:05:24,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:24,234] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 122 + 0: [2023-03-16 09:05:24,234] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 1 +15: [2023-03-16 09:05:24,236] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 125 + 9: [2023-03-16 09:05:24,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,239] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 79 + 6: [2023-03-16 09:05:24,245] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 54 + 2: [2023-03-16 09:05:24,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:24,249] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 19 + 3: [2023-03-16 09:05:24,249] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 26 +14: [2023-03-16 09:05:24,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 09:05:24,251] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 43 +14: [2023-03-16 09:05:24,251] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 114 +15: [2023-03-16 09:05:24,252] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 127 + 2: [2023-03-16 09:05:24,253] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 22 + 5: [2023-03-16 09:05:24,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:24,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:24,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 40 + 0: [2023-03-16 09:05:24,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,259] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 75 + 6: [2023-03-16 09:05:24,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 52 + 0: [2023-03-16 09:05:24,259] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 2 + 9: [2023-03-16 09:05:24,263] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 73 + 3: [2023-03-16 09:05:24,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:24,264] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 49 + 3: [2023-03-16 09:05:24,264] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 30 + 6: [2023-03-16 09:05:24,265] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 55 + 4: [2023-03-16 09:05:24,271] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 35 +13: [2023-03-16 09:05:24,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:24,272] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 104 + 7: [2023-03-16 09:05:24,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:24,277] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 58 + 8: [2023-03-16 09:05:24,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:24,277] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 71 + 0: [2023-03-16 09:05:24,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:24,279] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 6 +14: [2023-03-16 09:05:24,280] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 112 + 4: [2023-03-16 09:05:24,288] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 32 + 5: [2023-03-16 09:05:24,289] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 41 + 1: [2023-03-16 09:05:24,292] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 14 + 8: [2023-03-16 09:05:24,292] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 65 + 9: [2023-03-16 09:05:24,297] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 79 + 4: [2023-03-16 09:05:24,303] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 37 +12: [2023-03-16 09:05:24,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:24,307] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 103 + 9: [2023-03-16 09:05:24,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:24,308] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 76 +14: [2023-03-16 09:05:24,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 09:05:24,309] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 116 + 7: [2023-03-16 09:05:24,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:24,311] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 63 +13: [2023-03-16 09:05:24,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:24,313] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 109 + 1: [2023-03-16 09:05:24,321] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 15 + 7: [2023-03-16 09:05:24,323] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 58 + 1: [2023-03-16 09:05:24,328] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 12 +13: [2023-03-16 09:05:24,328] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 106 + 2: [2023-03-16 09:05:24,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 19 +10: [2023-03-16 09:05:24,346] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 86 +12: [2023-03-16 09:05:24,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:24,362] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 6 +12: [2023-03-16 09:05:24,362] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 102 +13: [2023-03-16 09:05:24,369] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 110 +15: [2023-03-16 09:05:24,378] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 121 + 5: [2023-03-16 09:05:24,384] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 44 + 7: [2023-03-16 09:05:24,389] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 60 +15: [2023-03-16 09:05:24,393] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 122 +10: [2023-03-16 09:05:24,395] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 82 +11: [2023-03-16 09:05:24,404] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 95 + 7: [2023-03-16 09:05:24,407] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 63 +12: [2023-03-16 09:05:24,412] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 98 + 3: [2023-03-16 09:05:24,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b112b400m/global_step23189/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 09:05:24,428] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 29 +12: [2023-03-16 09:05:24,430] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 103 + 8: [2023-03-16 09:05:24,435] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 71 + 0: [2023-03-16 09:05:24,440] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 2 + 9: [2023-03-16 09:05:24,456] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 76 + 3: [2023-03-16 09:05:24,464] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 30 + 6: [2023-03-16 09:05:24,467] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 52 + 5: [2023-03-16 09:05:24,471] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 40 + 3: [2023-03-16 09:05:24,475] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 29 + 5: [2023-03-16 09:05:24,488] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 46 +14: [2023-03-16 09:05:24,507] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 116 +14: [2023-03-16 09:05:24,508] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 114 +13: [2023-03-16 09:05:24,541] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 109 +13: [2023-03-16 09:05:24,549] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 104 +12: [2023-03-16 09:05:24,610] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 102 + 0: successfully loaded checkpoint from 
checkpoints_1b112b400m at iteration 0 +15: time (ms) | load-checkpoint: 9522.17 + 0: estimated model parameters: 1.096338432 + 0: estimated model parameters without embeddings: 1.002523648 + 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 09:05:24 + 0: > building train, validation, and test datasets ... + 0: > datasets target sizes (minimum size): + 0: train: 1 + 0: validation: 12800 + 0: test: 12800 + 0: > building train, validation, and test datasets for GPT ... + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... + 0: > finished creating indexed dataset in 0.007679 seconds + 0: number of documents: 208931 + 0: > dataset split: + 0: train: + 0: document indices in [0, 208931) total of 208931 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.007 seconds + 0: total number of samples: 48805 + 0: total number of epochs: 1 + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... 
+ 0: > finished creating indexed dataset in 0.049582 seconds + 0: number of documents: 364608 + 0: > dataset split: + 0: validation: + 0: document indices in [0, 364608) total of 364608 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12800ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12800ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12800ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.080 seconds + 0: total number of samples: 84978 + 0: total number of epochs: 1 + 0: > finished creating GPT datasets ... + 0: [after dataloaders are built] datetime: 2023-03-16 09:05:39 + 0: done with setup ... + 0: training ... +15: time (ms) | model-and-optimizer-setup: 29905.44 | train/valid/test-data-iterators-setup: 13610.04 + 0: [after training is done] datetime: 2023-03-16 09:05:39 +15: ----------------------------------------------------------------------------------------------------------------- +15: validation loss at the end of training for val data | lm loss value: 3.547616E+00 | lm loss PPL: 3.473042E+01 | +15: ----------------------------------------------------------------------------------------------------------------- +END 3319358: Thu 16 Mar 2023 09:06:05 AM EET diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b70e26f7504e676f415fa2ec69c7cd1e544288a4 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:94f193aaa6f6a66c20c1174f6ff42ab2a7b390d472f93f995a8a7f21f1451eb7 +size 51395415 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ade690119d1f36f5037fc7e6c943c4bd3a8f1666 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd1b239775c7f8c2d56afb5d11926b5744fc2a18c24d902171df5b5f8595ea6b +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5eed33da8f68a1e38122a9b47791df7e8b132f3e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:525ba2e35da35fb9c40ca5d45dbdb861923f7cde784398cba62eddabf3d920e8 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ec8c0f6f8a73182c0d16770a63818022fd3927ff --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1c140499b367e4f035439ce1c24b8513cd6291931e93ea2006954e6d2bdf506 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbd02aa646fcd7811dd74c8832e693d19fb86978 --- /dev/null +++ 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffdbcafa32e23904e2b7575a4b6d09946d2236d7d72bd5aaa2f8cfd9497b3b57 +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6ced5a6a4ff00096537f1d6f51f8dacac220cf9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0fd0850a05887232ce6fabacd3c1e191956fe0b875e4bc0d7c7ddc1671103c87 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ddef268524021cbdbf5f91ee3f19eff8905a4098 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d65151dc734cf4a9058b121fa32031e62f5162585550b66343bfcad65b8f4590 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..748c0bb874991082793edace4223158e0d3a4b4d --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9aea0fb098b4c8ca258d38f92bd67eee96c7232d71521e698e75781846a81dd5 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt new file mode 
100644 index 0000000000000000000000000000000000000000..c5165853a83decca13739ca0342bc69bdc7b00a0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a972741111daff12b7533eb4b38b52c5e44a5a525754b1eb358c76e4081678b +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5eccbfa5d636003425443862dd32df91489ae22b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2611bb7c484496c36770c2e81fe1e43adfd3a38abfd5113ed8be88b7868ce2d +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a0421599e43f422ec0affd08087ba40a433f4f0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d072abe79ced02ae3b4f456f1c21d62d09f9781ecd533335f8613279b563bc9 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f14b07cd6c39a17507d0b36f91be275462032828 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcd127a0bcf19a431802f1235ca427ab0f44a738e5ff70064e91845bbca6e4ec +size 51395426 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e89e309201ec190d624847b3c832db90782c1449 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5e5412cf66322e6b6f52017c95ad65e9ec9656623ff3c99658cdda21823ddd1 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..43db7688e7143de6ce16fe3f95534883511140b6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a958104cbd11ef49c96c2d464d9ab3f25bc19f3dabfdc16e4f3284c42ba3e15 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2e80c5569bfe6f282b3ff325c553b2d63908b92 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4da7093438aab66d0f6738dd4bfa382d4577a6fa92c3744755e14c38199c305b +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a844874fd183492e332165f747fb918b8e1b379e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a358a66b778d81fe405d88120c1c740cea6281f4a0125df203044e077dfc23c9 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d3587fa256e0f8f6a36ab0ca8211256808f43f0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4a8826660fff3eca1c0f62a848b4e659e53df882a000bef6ff5cbc030a41484 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3cafe685eda3309d752515b49084cebba8690b7b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b7c7b9f7b53ed4ea581db9ed70ea584d84c70efbad1275c6cd15f15d0ad47f5 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..899255b72594cfe8050b9d4924e58c54b8360390 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:798656ce9904b0bbd9ad72bb8d31297969c6c90071c07b645216a97fc8166a02 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a931f30e76fa34251f32fafeda97e59a867b8b0 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:933222fdab2f82dcfe362d6f40caaad78a0dc4ae3f823bef8ebc004f13ff4a01 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cea2536874cf9e25f9a9b7b26f962f88648e5295 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1df1e73eb0262cc358e1288f3d8f576d10900eda99f15095a5595fa47572c29d +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..91635eae8f388891a5096708055c4f25ebad6741 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6dca941a8b67b6a60eea23dcd9534a4aa6cdd40e2779eeba0bfc17f53c23c2a5 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa9150bcc0c4148e84f12956939979c798cef9d9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:448565ab452b55a29e871056093ce1b67674a02d60bd8785d60399285c8b2285 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..9c38ae700eb56ef5f47820360da2b7e5f3abb00f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7340d99fef8b678e47cc291080770e30fe87d9868fa682a940e910c29d61ee2c +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aee6d8a620d9cae809cb6084b3c9cd6ba2f0e629 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b9625bf6cb40aad98eb2365036af8a0bc06096dd829d3ad25c4d9b8b19106cf +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6df2b348ed6882e1762a19014738659c6bfb139f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f61f340c7d13212d972701fb5ccee80e88076326aa1e8697beeb2ccf344a6b81 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88c001819d82e598bdd62e7aced61655fb9f5991 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d57e2f72eecfe111ad3277b92755a1291f4e65ed0273a3b3b62f0822bedbb449 +size 51395437 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c336920f3ea6ab5e038fb892c82c0302fe4e5c8a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee836ef002aa78d3222e4c5f815b80d5b38ad364491b465f9977de005d2f9b63 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83b2bf29849f6d62218fc45ad0c8030c718e54d1 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:451f36bf66968e40a3f5c909e379ec09c96fd4b6ad72598f982c493a15efa442 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4e0c8e897b135c16cb03496e60df82af8b836262 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1588c2715a83ef043410fac98ecfc4a761ea1d623ad672dc7361486bdf976723 +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..38a7c0796d5897f028f0d86d7d488e11849829a8 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:8586a96672282347cf81c7d6afc7a6bf89b73eff70df5cd353561681456e0988 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..19b5c3d5b03c69593a4fc8f1a6a52c0d6e81a17b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf853010476ecb06a346ddb540531170285a4ea813cb3d7892e58057b7071193 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb1e09cbc298ae96e30bfe72bee90ce181d5c158 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:853d21284f72a2bd95248d16518213f6f521f2dc4d4015bfc33dd7dba13f6c56 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cbad433e9009f6aa013784dfaeae80c6bd5c8fd7 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff1a755fc5f36ac36d9b10012e7bec1744034bf32eda2ae7616b422968ea9933 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..597a057d69c582df3660949b9acd1313c85a02d6 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:746ef98e806b72c3b6136d9ff82227e11f061586cc3560e5f7fbbedf61522553 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3951d19938e2afbd066ed1daa7d7941599601d38 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cbb9b0ecebaa44746f458822526b81f10785b59aca3b010b37232bc73a41d534 +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a0499bf88982a8faa967828414f6fd8e6057b2ea --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c03d101b1e3bc728ba25e2049ca6cc799b3659f7e11471f3d9c4de75a3f2b9b3 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6eb203c3b786e6a4aeef62a6894a08299512fe7d --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7dd1104d7b239a954a36f2762d6e474157378c45edcb1c2a37187f5f0536ad13 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..fc8f189a68080d9ba9e64f93ae62ae7928ffb029 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbf69c5caad3ae429c56f47bf19fcd706446983500d9a701fae1a025dc8da853 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..955056787ae74d353165be5fbdd644bb9a031845 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d86de5979d92af9d9eb74c7c8cdda4b818987248321360550ebe006916a217f +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34cca66397f8d692d4521806a372b78e084a8eb0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3ddc815b7961f2f8d7bf561d6be53d282d17ebfd1e7e10495e6db9e7895f373 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9fb0d6c8c0475e074a197cf58050035ba678af00 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:979f14330003af128515c938e7a93d3964cfba0cf891682d3f7e94a2184e6bc0 +size 51395501 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1b89e32ec3a4b9e20814c6b9f9f15130d6ec8453 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6d2c515600c8661366804f2b88b6ceb75f92d55549f9e4233c25a3e0e1e3dd4 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a74e046f45deaa598a7f405f653668af957ff880 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90d93ea21b14b669e4bbecda865ee695c8540ce5d6242a9748cc9d8fa309b729 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3928776c6c296eb998f2100cdcd18779a5aacfce --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbb80d4a1477e4edf013a39c63a2faa608e0bcde271506f2340ac6cec47eb7ae +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3def5bba0f8fe12d3a2cf1f6b8473d7165e7071 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b085596b98268b2039d764ee28981c8bf9591208a6f525bc670c9d2536fc13ff +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e44517ac97bbffa3aee2aef1d95587afced0a9d4 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e843483b9bfd3cc079690bc40dbfefba7662bd0d0284835f3fc0bca57b4a42f4 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e733369c126ac85428740d700cfeacb3caf4f159 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a63c8c542770c32fbb44404d928fea2344efcefb5c4b4488cce0a461ed54ba67 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27962e58c6876ec48e441de462a3a22903b8f9b0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6f17380e51a94e029802a0fce244dcee12372c43f516daba15a4998f514f265 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f497921e896c17e174a50f42f4cb01666de96859 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bce8c3daee58afdd5c777db5cc11deab0bdf5d54e54fdc8cd30a48eff55ee536 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6fc900dbd4ed71bd057ea5c424d09bfc24470e0f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3505fcf33f91e7a0f7d663a1029cc8b8199ccf98519880c8a47bd507d461a97 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3afdf14932c1b5f6fde57137132b4b9a4f28adb0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:055ed3e98506a1b8c44b28daad81979e139774c1046e7b60576d314fdddab21e +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97e9dfdd5f11764e88667ed6853f8d3247835907 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8143186fa357ae8c227ae7c5c9fc0e7e92c9a518cf31131194268adcfddbfaf +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..9ea665fbbd300ccddde75f4b68309bcc111732db --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8de2dc98bef9052fcfda8f640ab7272399c4f0810c19ae699b05a401c7eb5de +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab86eaf927c4c3b48f4e1a2d0def961714373bf9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1cf47abb542836e6d1ae6ed4e642df498e6ce2f9c21ea46a76e7476bd5c593d2 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0b3fa6b34f09f5dd966aaa92cbea5cfe0ea9645 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cf64d2408622fcef2fb9618559ea0195c1d48d5c38674c79223c574ef8f158f +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cc9393627dc543f8aae041f901f2e51ea48a490 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59bba875e9ae1f53d99aca6f680c91697ebeed7f08f480236d4d07b049f6e756 +size 51395437 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..970978dab1b7c818bfccd8543f14b185b1564668 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd142e026ff4bedb195a3b51883dab4177f0e54abbec7c411b9fca1bc9e82a10 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a66e4559f5efa969d72a1a70685e41be0abf8394 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0e653950d4ec7b1a2e1d8e8d42e372bdea562f53e21438dbdac29d2921aed1c +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f812fc309665275c799374bd7187808f41559b11 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d68f4861467dd5eba7f2813f72b6cebdc5b36cf4cea683e9bc62dcd2803a28e +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e3eb5aa9295887f3651992624311c2c2f0b0c5f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:70bb49a8b6beb11737c8bca410c6ace37f9c7e8379d9482a32f955c80adc1d11 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..451a96751bf9b54cc65e61dfa779cb298683e4c5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:334c19f332ba8830a98164ccbc69e7585d69ab139c88b52f177b6f5f5e8d053e +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..945333911349b3ff36b3dae04020977f4a33b0ab --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bab7aa191d48b4f4d43aee79cb66a5946bfd55b0750eb1b73de77d0b014b1af8 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5be1dfe1f54b29102652d4f4fd40f008e9bf4166 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ac2db03009c3c5efe77c257353c1a6435a26a13df9ca575efa9df943c1ab33a +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3954d17e7ca6605fe68c76a9dc19afa166d829e5 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b106095b182fdf4b72f194cd54189637de4fe951d0d838c6e83ea415e6770ded +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1afea29297c3fb8c85216c70053e1aeac35ef072 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb613026752679f14220f33f506dd0d390a9c5580ecd3a43b82c52f72de6045b +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee59ed5565400f1ac57b8f3399fa19fe8cc82a54 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a8f1a199a66685cb87f7c42f57c3a5f148d4c6908cd7d080e256ed00d722c89 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..91613e9b8407cf96ee6df7cbbffcb9c69dd9f119 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c03456e4bafda0dcb9cb54b579d450167228b225ba7c354243915cf06cf1a057 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..345219c42f443b24184970cb5fafd82346c81502 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1358b56bf8f66778c4736f44430339a18c7c19bf0fa4c5bff29998116f4fe15 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aaca528644a12041dc5760f30bc8b0da2b720286 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0f9df67495d0c35cdfd98cb86541748d040fed763dce5ed79dfef28cbd4126a +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ab5745784c548f36387e078e1db62c6b9d47115 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b245e876408d982309fcfc1137dec00ba1023fd7e695ab39ab9e3a67e2fb695b +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..08ccc405643b93b8b8ab9a95a234b2b7b6b95adb --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8af60262c2b6bdabd6460f13f85e4057f53f22198586cbc7ad0de13dc5af624e +size 51395501 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57de31b6b063f546bee00bc575c536312bca7428 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2281741d83e39a7941795b836b0d2f6cccfae2fd7a445634352decf07a5e85b6 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4ec6955e2ef63d744fa493206fd027994e25300 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bd4efc9464a87236771ea886d8fe6bc403094f6fc550fb8ade97b946cd49a62 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..405770c33f824bddd2d01be988d69b65d6427c35 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20a5ec143f28f5b995d45da2fe16e29154d5c26377e642a24a5cfd5ec590b9b5 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e18fcd81c8fbadeb6ccae1cfe77e73c9303742c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:c7239ea8fa9b42e9a2804b93c147d77f7a3ec715b4aab065333c5b4f23e608f4 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1a382c603dc9bf6877065641fc3466bb127ae703 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f618a7d634fe506a55c16f7f385090daadf50a54d9a4a024ee7575cd02d9d538 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f34a51d2d60784df2f98c1de2fe85b0e5477f8e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:423afa0907681d37a3e1811683778a2496254533060f105178c595d66700da4b +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d88644a2e060db447ccb0aa0522af443fc822088 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bdb239943a101df3fbaa945e1c929580507840767b5a27821e31e47f9088506 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f400f7e683d2d76e9c4193b5b093a74a15ba0cca --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e098240abf861619ca67781154618fa81c72d6f469560f17677991c71ce30289 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f61114e4ed2a513dfee901660812264a18d12269 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c08e2cce38011d4999ef3856691d1e2d865c7206e49a439e8907506fee07a20 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..72d4a2282dcb9a8eb2bab5eac09de77a580fd650 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c333fe518ae310988edb152721b2a33cdc70b83b8a8b17eaf255412c9d6e2515 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..223dc56ad07ab854a0c9f77a96bd8a6bce671d9c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:401dfaa8247df31138cd714bcc1845b33320d672ffe1fe0d8cdec783a98f0f98 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..ec89882714fcdad2d17421aecff454bfd8196c1a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4970dc3fe3116ecffc955d3e5caa48ff3407ceb455236ff207a68c75cc855677 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c055c9e5a1af5dfce4759fae5ad32b8b58346176 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83049693ce257fb8e9f076545e08f87417335c116e044ea7da037b1385223e03 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d19826e3a7e811bf2b03306949c82dce67de6bf1 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:662e613f36c47a5386abccb7546e042f60837e28ce92f33194f1c34ce976ebc4 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1fb7d32814e13b375bed198f2ae7325ca662a82b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5073ce7f1e41f1cd5dcba28987e053d70fdc7fcfc7b3f4019a7a32f87dc0d1ae +size 51395373 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5ce84eab4122f70faff58ee30c461a9ee49603c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86aa77af873d7e190188d233706f3c64d0bb1446c3e19dd1bb34210a075d6aef +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..757c817d3726e7896d515a861430674af075ec85 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:485ef11cd54be66ad60b2bd772ef5105746c5f40199b1ed60222bbb1c2a921cf +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9b62bdfef2c398cc68c9966df5aac4e5ccfc59f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23cab52ff7de7f0115e75dbdc40cfb3ef6654f737aac3b495be349125ecedc58 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f09c44bff201cc23c151cfedb492811c164d6e19 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:806966959285c23926df7690275f502de6c630f7c97940fa4a232cf1f133f120 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a26655d84481f001c2a425b95bbc1cb05fcb4f0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9382f74283d34753eaab254aee04c5e5ebdc25f41225659b5f7e8ff9f6c1b4c +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca5dd33f921f5553cc2b5e62d8791576f635e77a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72aa927e0989cb00a9dad8a381b0b627cef57e5e5147bb5e6dbffb552824c149 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1ddbe2034f7ec837c3f26a95758fce9635f7a270 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b8aedcb11de0a788f387626dfab9d02ee64c11779fab62c67b69306ce0e5225 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd8676a8c7ca44a9b83c5541cc127e0c58095f2a --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8155c4351872b14eca1856c324927800126aebd97bdc048516a5c3b4c7b7e97f +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..21964aa53d213f683f05f91b09fa5572294f5fca --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f92ca5b8148268e24149661f06d936f0acd28337721c6023cdb1417e94013878 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d29c4055cca4775f1d880f7cae1472193cb8e733 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb58df5a8ee858964335dfac64a5b242a0e7b7b822eca0fbcd80297077005a84 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52da1e746ac91b82327fbb049e521ca5a81a0733 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:716256cfd1311105e8caf892c0d75ae0e460ff013f970b0829040bdebd83fe5c +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..3db59fced164156310eccf7e76d0dc5383d1dbb0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3860cb37f8f9c79fd1dc16dd3d5e2e6cda665923add21a8892e7eeae6753aeda +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..288f5ef6f9773cde5898498681fde448e6eddba7 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:465c4c38017362bb784738f7fb3bf8aa2bddd01e4c22419211342bb7c49e57de +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57476e25309bfa670e8920f83c2f8cdc36f4bbf6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:232ef6363465da904df66eef30a4d0e93a79189e952370fe84f5a51158391659 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ceb45e00957e0f105800e85a66da63027ed3906 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3f9c52f81d02ab5b1b59b67f393657160a051d66c8f61797808c807a184d299 +size 51395437 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f03dae22aaf3854565d3f00723d1ff80ccf5e574 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:52dafad2aa3f9a63d946ef258a97b0e4f131164d73d81769c7ae5e7ea20a3481 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..77a062133812b2f41a1bb124f0613301c664fcb5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89069e76977c12b516a0bb5c399bcc5823434d34b8c4342b63362511c879ce35 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..162171145c5912430f7979791742a85129aa2089 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de105c49ff35f3fae6cde439ac2b445bc5f8eb8baaa8c860442f373dbdd2a871 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..309051b1569165e56e32437075446b60492e7c92 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:e3663209f5d6b69b40ac0e12431a56fc15a2174e77537826fe3878fd4c665757 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..104b4400eb1555bc60e71e698087d335dbe69020 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51ad0ad6200449f0fbe8444e2ef0171b930d366289ac7eed7467e7a8d8c051c1 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..336eb79297b16e041ebfd9d1b9e72a7244a28859 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fdc67b950890888daa6b865e6a93778436db98d0b266eb04f97048e0cfc2afb2 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45954d35f46c49d1c1299499df60af9c69493b28 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bc31522726bc5f906583bd5c950b7cff86845fd4afe111c5b42e51875a4a111 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73fc59a4c163b61677dacb0d7c78a0bb27cc375e --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d0647b493ec4b9217f60601c24cdc67b8e3813eb3619b04d746fe354749b4960 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f4c1f52afa510b500d2529c09880bf943853d95f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1117ca7a95177d8986eb7d06d8e9e36ee0b3c456cfd03b4bfff8c62eeac0dfe +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..33eddbaa0e58f037824ea2d3e54efcfab3d71f41 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0e1633923b2016586a1dfa50119e28bd4fb090d2cf343539b83b3ccd2e53008 +size 51395415 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..653892d0394c54de2cec637d84e64d902c53c642 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:354ec2ea67c86c12261becd487d04b2e37cde250bfa4bd125954768c8f0d256a +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt new file 
mode 100644 index 0000000000000000000000000000000000000000..426ba5e0073b8d08451d37f077e08362a17ba2e8 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e468d5d8c5ea9e567505ae4937bc65470b834980681c59a2b83419efa521c054 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f0cfc561300f978e901df8ca363e23873f2baab --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d33c7dcae15e67ca56abb761601ab96ca9da2dd2932efdf416ca74523ca7dd8f +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..06c56a16694f1837db9d4c65b6736680bde6d449 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a0a763d674be2b5928e05595b2beeee8c90eaa6d97358c19b4edd6b843dfb6c +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d56dbe0ce452f253f22721343a68bcf905e83720 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82501a5b81be88fedba3ce43f1b6879bd57460db2979a94bb7ffb0c4e6201502 +size 51395437 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bb73d010f49d092649f878b4b5c9f13c004c10e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61d6388fcf54f0640af9df1e5a289c31e104d27c03e5fc53d57e2f72e807619d +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5cee5f504f27c3f271b1d94ce23e742b933bed2 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3793bfd8536d53f960c1539b6bb27c2d4a97c7c52b5ba417a9bdfe02bc9001f +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48800c1bc5b166419b02df11b41bb9f0acebdad4 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fef34f4b0a2d6b57235362199a65491c82bd6ea59d22c675e6e1709303a58603 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20b337c94f3df30a80cd93d4d64aebd4d9124e58 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:c6c3b0c8fa0c47441b1d3a2497439f8e735fff951f5e66c08a3c4c3cbf63d93b +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f49a469cfc524af9275b46cdf0e3c0a9da24160e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8265d46096cb1bfd3926a04e7eb9ea2f4465269fd6c0e313da4e74a53c471015 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c9d19ce27fa5a3792b901d9c3a505a7f0a90327 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3524c13691b96857b2e6cd4e3ff958251907a4d0e6723158f6f15caa4810caf +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c86d5b6b3ebc745ed42381d48dbb121ef52232ed --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:81e663e4593d4cac8887b412814af47c65e01ad99f3dec36bd8f9b45aee5b817 +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7007e7c8de574c8faa2ec33a2ed621e34d9dc1d9 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3b7da42b151274134b1105c754f3e7bd078f64b04720ebaa597cfcc04247af9 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..204bbcaceffbd06e510b1ca3845853ee4741cefc --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b161b24c0eae2556e36ad0f4c5a3f5b067c76ad2aeca09164e57cc288cdf2b7 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d39cd9be7cb08e1e556f79f9dd57a028690f8a89 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9b83c065b5c98f2a7972cb5c1b51c0392e0ff53ac81ddeeba627bea071e80e2 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e46c4c5c3e315a9124583894f0ac014ced757ab9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5239ba55b21a1d268405e9bfc4aa25ab8537c1e75c2a9af0935ea79060930aa8 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..eee4110a34ccb362ce4b74aebc35adeb4bbda37a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43fee0bdb10905aabfa9710d1ae866dd72568bb84f36826c423393ef6701c764 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97a1b2795a852da148db57b93436a70296b5c504 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8c68dba63d059b39e68c27d18530a43f0c9229863f1f6493a23be8bbd44e45f +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c469966be7030d29c521db509fa7a7fc7a224183 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:580f4c3a8b9f80f4f722cb81e58799af426689bca6c2a3104e3c764ab1779d2c +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e85fa87e4eb16c21c7e41ceef6c3362ef66f2ac0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86f6b2434a0f627d4bfcd6be30308e06657e9f01da285c4b69999bf223cdbea3 +size 51395373 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..020e46cdfea830d6ea9096bffb38af73b1490673 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9393d424d6565d2b610d400e060991c2b67cc213df08546e78ccb29573b5ebd5 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..219204112a923c4023531f763c78948d0db7137b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e1ff3250ea711c9812b482fadb4cbce8cc7a11cee2a480c93693099ba348ac9 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7661f84e700b3523bc56b2d47b89150ab02366b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80a1b2c1fb1ee21e448762f24767d2b005b784fe4f4451d0728372aa92e601ec +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea68cc3edd8c3b1082006afa5d8c7e672f003301 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:17e1972f14acb598df56b08fbce2973271d54e7b89f5f05246eba5f549cffbde +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e5674367b416f4d8fc2b275263f565e5c887687 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:193e18d348eba13db216b1f68c19e968d1279654a24b9bfdd87292cadf376731 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d8f5fc84f54136f39ab5c9494e48089f4484233a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c5d2afdda2718ef22c9840453849e58a5ddebc0ce0dbc5022becc62c111f352 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aca63c9345c5444d439e0d12fabe0c20f8557329 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16ef56be57d0fa85ddbe7906e9948da946d9c0b5db3a9051f9a48cfe1569f636 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c9ee086ab9c92f32d8ca7265f25e2fc5c1627ac --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4be356067f6c20914761547bc5ecd47e2303f3f7c0bfd2e0d732d36bed4ef9dd +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b782c8f73665f72d959d59d56e62ad428f734c22 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7d80f60af4af7c1ecb3f73bfd61d9aabbf22e571894e3f73c124758b53c4f65 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ef9f5c50621b97000a3931ffa0d88960399f701d --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34f1643203bff30f02b83c32ecf7810c0719285bc475c4ce9f091e51cf57b61f +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..11d9a4d4e8dd2d81dc1f66dd1e1da24924114621 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:256fa772f40d12809ef9b32903fc46a61d9688d938248cddd71f5b67c1884089 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..3dc5b29318462b05eaaf96d4a12e7fedd34a569e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:260aa268451d248ebf43dc1b867055bd5d628a2f17c3ced752e0da1dc759fe32 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0842936824427b631847b97c5245c3b497dd8ddf --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cbf981ae079654d03f4af3744e02e6610f5623ac4f0539c149a7584e39788136 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..015b1665e790bbde20bc3502b3607bfb9ddbb9a4 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:422666124a10feb5955b5001e5d4b58a94053342ed45c534ca89f5d3dfb4c46b +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e485a854299340996bd0c1c89f20366c2d2225bf --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f53ebf084e534e03d0d329b2727978ef17f6b9ff30e751522df7834a65965259 +size 51395501 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1328d63969d63b377951aec31f7f4e52ad239b3 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74c3babaebb661bbfed7e3221b5f526c40ee38a7f9772184f5c65bcc596b26b5 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..820713b71f568baa664c49ab93f13fedc4d0acd8 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8754647e493cd041c152f78be2e8d2d2264aab940bc03535df727c614e973f28 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e8f21f41a87b6b5d2ae8d6dc5b6cb22af4e8b1b7 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e925395de9d674da11a2ddfb3805aa385d11d8c5dc76d69294258c9ea3f9bc5e +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4815737d1923472205028a0349402bf1d07ebd73 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:83ff8f3fe3143a1005e5534a5483ebefd7bf341667311e71528b05d8a33fa4bb +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8499f414e84d446042840590358112cde63ec4e5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aedd99745cfbe4e959c44514ca5acefd5326e8e7f2b04184c7d2411fe4414969 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a444e6758859246183c8808df3daba5071032cdf --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05d31373b35b4d32e568bb64fb6749073519038304513e4d43974e5060e1463e +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba3cfd16a7d38ff5b37041d6ae8c11e72cdb223e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c931575a6232b7708e95210c538dd8aa011caba585cf62604467c75e06455670 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..967ae812ad1e77208ffacf467e48c7f36ca76a4c --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff538eeeae31756ff74e01ffa8d48504d6f2bdd8c26312cfd10cdd8e4a749891 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f19961bcb791080eef3a8939e2c9b0919aec4c3a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:443b85a46f978b1aa7c3e52bc14d2465e3316bf398102f0496ed942654f88c32 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f6c18a44cc7783c8d03e5f659daa7614fe5c417b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1521ebc556bd4753d6c64172f47ac8b8b202474e7b8f28220405b30bcda581b0 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3ddb6f5a3afbffa1bf2dc8cbe429295d183d0622 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6485a52e96696e18d102b1b648e3afc768c58ba29ec45c23b359b4fab00e07db +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..04100433fcef39d2e427d7a89efdd670ef4db90e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02ba6fceea9d86c48a4c8c24d9a51f23a5b9f30ce86d1716214c32abaac07d5f +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1463811accbc75a8a682e3f432657398e7ff4556 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b776b0729311aa7ab08d1102c38929bc8fe640ef7daef17b6ca4f4517e7028d +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be9e860ca95d269c09eccd875bd67a2e039592d5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad15657ae70d1cc019d378d93e31d2e4c3a012ba59c0c3dc5112fd9feaec480f +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d83fe67fa01d1a361cb889f09e72ae89c6b0e970 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a3b60fd767485c8b72e6962e213410b555d132f219bac2d0e1ab0ab8775a146 +size 51395437 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a3d829c2b0b86bb09442d0eeef1c3a9c916aa6f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:083e93a63f964d673a6d9b02c816a47a358804a487e0076729bec957aeadc754 +size 51395437 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0c0e8a5772560b5cee1e2826ba1ef05c783cf53 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e7815cf327a82a7736a97a2e166eb5c70b8a29e0729f70019e2db39bcbb6b4c +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2743f6b9ed96785925a97f8e37cb09a5847071d0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9b12c4eff505ede485d7791dfc55889fff693699eb50ac03dfae7a296162aad +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1a55a3e7d3f2bc58f0ad1fbb0a8fcf269fbddf98 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:81bf07cde773de9cf4389615c9bbba13273aeebf826b8fe1f6568c331805f56a +size 51395565 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d7fa9e4b1bc5b6ce3dda4c271c5c9331f681985 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e0f3d31fb8fe38cf2d85c6dab9c51919d1f0812c274f3105004a66baaa02c71 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e25d08d9fffe192f346e91484caa716c1340c507 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3604c84b72cf82ffd40db0cd1e6ed617983595a45c021d33690df686d7614995 +size 51395501 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1cc41adf9a7ade8d0b98a6a33f0fde79c49b4aca --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6728475958205f532b44d1dc1c8d9729c07fedad5b5dceab743e26bfbe6013a0 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5868e2fe8c4077fe48e8f5cbf859fd5572ef2f1 --- 
/dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8471c76c1f65f6ab50330bc309272d939882164afb907bf8b1b3ae75d2052067 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8def38cd90a601191349d0ea7354c2434ddbd2b2 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0c4e87ec798c50d4cb56878387ef2a96b9718072754735b81f926b9b51dc5413 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16a5bd054316dda3dff65e3b24c97a9b9d40d8a5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:205779ef3674f60882653c1e9edf9381da01631332aebeca0caa655e9676efe6 +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..584e0974b6b81c82f3e68e19441fa005b29da1f6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c0c513bb051336f44da1d616fbd911647ec8946b8024993dadd34ccb586c6cf +size 51395373 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..d7c997e1238f9b1469858d8e93412a9e6465971c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f07e3a0431b49c37997568e5bc31311ad774d2fd3558ec4549562ffd58d51a6e +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e4bab33f211814c50d1c1a142a6ca96039eb23c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26ad461642e6fbf80969567723cf457777101d6b670b3a02a8f3c0951452494f +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2fe5a9c89253956da2d6af7e4bed64bd2322d11e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61713c5817df92897db7cc2c396a3a2e965cdd65af4513c41b4cff21f58fbb13 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..efd01fddd31902ac65d1878b7feb0e4401edfb93 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:71a8d67d81163ee0b7393e63e1ffae51fdcd25ae9c62805aa2482543027ea8bd +size 51395426 diff --git 
a/1b112b400m/global_step23189/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..09864f4b0a4760c63364bcd20db652f11a623f67 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d00a0b453beefb4a56b41c8e743f513811f07c9cbcd2cf2532adda9a8bb34acf +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2e161dd87b5f4bbb3adcf13990aafc89465ce742 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e8f0080a67ede294b22344dbc28f0c5b400b320effcfa5c82bae0f3d7ec3939 +size 51395351 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a21ad227ed8650aab0d3ed278aa73beb5d14e8b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d319b35651bcd2ea29f65c8a56c1df9edd094e20f85e562cc2a2b7d407743c9f +size 51395554 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6e6478599283cd52cb2d3dfe9fafaded629fc90 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:29cf44e28fce67f3b62718eb712fc2d7256e71a39957031708413afce131ebde +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc1a85515bf8cb6cf73333dd671910dbaceabbb0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dea8e250fa1716c3dc88426ea2ac16d14926406648df7d36ec747cc6e3e84da9 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..41eaf20738d0b33a0ea55c20862e956bd531769a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39da0434ed9f293887350d1c64f2d052d6ada4f014da476449434a6cbee53bc2 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b7d926ee04362910d2f4fd73e27f9d96fd2f038 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32255d9825cb42fef899d9e2121e929303294929f1db3cb36da2a0623cd37e82 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d74d9ba65abdba455e994bd15e97e289398a8ca8 --- /dev/null +++ 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10018324e89eaa6ace5941579e73a91752f46e730430f31c3e15b19cc361c81a +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..abf374e913c441847937efbf8ec19cbfb7f2da2b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b67a29fac8de1ae4027e0fd340033136ba12fbe2e08bb7eca1cf245ec14862d +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e111016f9b36fe3274d61c9e86bf7515eb6e674 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:788a2948b34caf2ebbf807e8dded34edbf2bb3f491a6e616d306dddea1a84c5d +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb07b38d55e7953046ffd0ff76984b9010eed4a3 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72f3d5a1bdb3451ff68124c0be1edac662001cac94eacc86e07888a44ca3bf94 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6f6242b8ee4fd614126117b6dfde54da615e6868 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9f9ba77236a05c913b9bab79d6662f690cf34314376822c2d42ee230041f055 +size 51395554 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6adffe05bfe919b7c8c1c0f832e53d39148da707 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e71a266b6f8f4577c13a1a28c27c526583e8338d95e847993800fab989c07b48 +size 51395415 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6981452334a5fd6fc0ed397f344e22583d7926ff --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f9699420cabb143cec4deb783f76074d01fca23a1c06f412843d13d636b104a +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3adb9c97867bb3a5bf570756cc45adb4cf114a9d --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ee040cdfc03fd86801935fdb8812cf5d7c05d6374ea5d7daedf031654dbde97 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..babcf3b79421111ad6e4db12369f30136090dc12 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e06ce3dee2b0ffb20c2b7fd954cf738b32a69faf90563cf2b9506a302c15dbf +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ee3e38e5b25008b37a8746447a37c90ee0d50ce --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5938e09800bb9a73e0528d9ddfe70fd89ef074ebe32c847fc92765784aec778c +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a7673bffcde8d2363e49f82b0d3c785ff6c7d54 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59d45adc8ba15a15df3db86c0f2999569fa724027eab12819a306481a9d5c3ff +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ce96384c380220c05828886f4154c6964fcef25 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d400fcce103b56216c54e3f6dff8611712007a80d5ed128fa7950f6a0541890 
+size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4ae8faf13c093a804bcf5a7df3a56efadf351e07 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f1a40eac64b3de3cfcad49fbefe8249de63d437b10fd7c55679a1a948e4a124 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f98121c2c7a572a1df7ac9c5ea0c95aa818b9e48 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:080f3188e1f5cd229691d4ccffc42994d7b63fcd70ee1747db1fd58a12d82de5 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8aa0674bc6b5a0cc7646b688392c6085da5e978e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4dd31561b3d158a0ac27c12581b23edbf63008240fb0e1d3cf0c0d538b5c1df9 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d144297940a2008d47ad42c3a80599da2039ac3 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:d74555d5dca0bc0e9026f0ac4484e4fc00fddc5ea45270582d43a7ad625213c5 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff23781a694a9924eead09088f0ab9febdff95a9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4849c7ae50f1620df4c3f7eb3db997260c0c86f330fa70fe2e69a6fdec8601f +size 51395415 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88861fbb013145408ba7f0a44513691d637c4f13 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8668caf1f488b1b143847e481ae7ab07d3ceed5ccd2a987e2a643c3a45d849f +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f90534fe210f51ac98dd0094fd774979f58d5b5b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:702d6a15f8a89dd693ee4d0b5460c350c5e5c3ef21529c1e6487f325142fcc57 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..035ac4c5ecd1ed9a16262f0b35dea2be6296c7f6 --- /dev/null +++ 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4aac115b811147b6df14548c0bc8f1263bbadc4f43593e0a8cc2b72273f451f2 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..59ad3f1e02f623c8f7eddff101f8397893fcf37a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24a0d68e76b447f6668e41f67e6cca1d4c70502ef2d2c74f2ca772d772a29f77 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a8316250f6a59978d7227feb516916b6316ceca --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84faad1950b49d1a4a8987e12147a694ea3f8f6588240dc5b1e4c4cba2f084a3 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cad8cbfd371be619a807133da4dd405941cb36c1 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:badd5efb35431ff5139d2b3ac79fe72fb670615bebb00533f991bbfef0b865b2 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..56151b561975fa9051cf16007a9ffe475fe0bb31 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc24cde475c019dd8df29ede968609219987a9305dedb928c4968ef933a08b86 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20fcc91b3c864274032ed9ff2984ac9a51d0e527 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54b97912d13647f4892825eb98582bd9c72271da11f08380aa6d20c4d0668aa5 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbc05859fc0cd3b6cd8ec4cc0e6ebd00b7fe21bb --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d40ec1ae9280ff5476ca38bb8238fdfd5a505cb48eb27ce00e7e244d6b9c4a86 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1473fc65e203d5ba08982c24f93c56ed4aff458b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:736cf9f55669ddfe809168f2d4e3d72e502e3c52430571fb2cc335cf1163450c +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6f40cbd729d4756efe95a123f4d979d2498a008 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b30e27156f8d0d5cc00094c48b65cc466e5e7f48f80b3e0b28039af0d62773e9 +size 51395415 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ee95de8452a52906d1f020f9fe90d16a106c7e6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:830b35132daababb2a0f7a5597dbc23986ec65411e42e96df95a23ce5112468e +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4a45b3005f91672ced7618e076932792ce599844 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f55d689d6482110a9009736313f2a1c870014a2c2bb5d48b0421e368fcb3637 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57ea4f9f0977d3058a0b174ab2da684dd5ad452f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:434096ccfc350701174b38082510f39544307863218014e643ca22420b45ff82 +size 
51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b6472ab1ec2ea149502b7fcc7692d1fdc2eecea --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0672a7e93fa9b661261fcd8bbaa86ebf67d30838f7d947d73d6072c7a06dcb9f +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe48735fde38f8dc8c4bfa8e1fdfb6fd57cea027 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf3caa00cca3f49f001c2b451e870758a18dd927bbeded77f8ba02066774b55a +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d8a6db4a0f9e35eeecccdf93cc7483190a77b37b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fd83227763a32aad3f34253685f61cc26ebc3111fa86123983e2cf5f21c053d +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..05cbf65429dd6e3b4277df765ef9eff7e7c9f8cc --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:9dde2ba5c0fc77629a9c38f3dd2de1fb04e4261adcdc07e378de0a4dd62a96a3 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57061c69d4df9be33c157f1b756e8f923e262c08 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d765a057e9bbded4f58f98fd03701cf076c6d5e9716dc0393b88b0000a5ea05 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..406364e3f5deacd9c8c2679894ea584dee139cba --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:108d7fe493e02a54322f3fed565b4ec803c631049ff492d1be9fcdeef22a0704 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f4ad825973df62fa4469ede396265f7046ce1036 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:083fa3747e5a3fa0f468b3330fcea1059ca44e80a6e6f3b7fad462de2725bae6 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..581cf22a0a719858668cd6bd341b994ff438451f --- /dev/null +++ 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d43578a7d2a4ba6f02500ff18db820179de6608c9872af93f16b771c2433c0b +size 51395479 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9be2240794a5a1d22bb1c0a7f39b39f7db2c6d4a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d16390cc290e7a5077870af63eb9f277c246c7d8885ad5e77f1565daf995152f +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ce8a52e144acca1076946a0a7db607352445f75 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8a7e63d91f1b28a0820c3c2478000aeef1b8be38af85ba1c25aca2ed0f2dca02 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7d0505f10df3a0f94a41aaf5090b52134325e91 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bd3e8e33ab374efebff33f052229ffa0c0a822ca4aef4058c3b0f1ea1bd6749 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..a7c22bbb82780079c6d552639958117ff264656c --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:731e8ce2c37b36db8d991bcf3ace6fc63309970f4ede1904eea7faa26908d12c +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0f60c26c581a9f69451c95fff10ab034c8c5ba5 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:135346ce1bf1e9dd934f2df2c76c618d8f43607807a87bb08feb265d944d0835 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..420bf0950f1661966517aeaac4264a0705a5f61e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daba09e71a3ffd941205feff70570745f5d54b2336167bbbbf287029774a63e0 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbba2bc3176cd144fb69dd1c013f39bb4e38b365 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef563b92987014540deea5e1d55d71e93ac64b8b58d7c7efb24aa7bbd3bb71cc +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fedcdeba58a26d9d9f5c37c8a430426476d6840f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3df90c9637c7af9789e9049fe7c38a8c39f8052b9241c35d7a5f076c26a35858 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4954fd495533952e80bc144c03f3ffd1d7363c40 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9286eeb1873b07c33d465995ef8e54414998de47cb21ede4f9940e1970022d5 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b048648b23e0bfb5a75b4132f05e5cf84ce6b9eb --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6da3c9afccfc56255e5bd662f88813acf762d096a52c2066b8aa5721529d98aa +size 51395554 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9cfa31d033b96455ff83c69b8f6152e95ae0732f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67dae4758df9e3f4f92144a9b67a3d9de01b25733c4de8a5c304119911b9b79a +size 
51395351 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3bf13cd0b4506231ab04fc27b81c49d5d09e5e1f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a4679a458e9b63bd3d1291f4a0ddd68772d8833c73cd00a92e47e80e65a991d +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..17c3896a3b040cc062e486e9477ec8279a606893 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c8178f6956f64808439d7128b02c76afe501af89448708cbe8d1c15bd19a0d3 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76ea5291e0703e7ba8d8702dfb8d97ad78edc0e0 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2384bb9f0c3f4f9a3b648e02c36f1091ecf9cb27d675b45189b53c791c7c6a81 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7875f73df2dbede085918964d2b8e4cc693825a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a7b48a0379761142265a1abea6dd2bcdb6e57ab4d12bc260820aa9dfd3f03d9e +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d10e885e8a957baab5f22ad9dd2ac235b9b826f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:657da6defd61f8b30ed2f40ebf900215c89133634868c7f04a188e9877d2c299 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9e274dbca4d859a78e4a61873f3eda0351aed210 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8b87e923cb5122dd3859f82933291681a43237f44d366bdb52dd129c4fe0ea9 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d0efad967c8af0fd548931ea70f1d386d249c74 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9d937fa281ebc8a2b21f32fa38c0bb2ae11616ba9180a926177039f10788722 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..906067007881d08079fb2de6a237e50de6df0d9d --- /dev/null +++ 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4885679a7c67a395166150197e9d4417bafc600db49bb2a63e312164ead2c8e4 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..995d00558e5fb3ec9689cfd3b09ca85c7ea74c6b --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e40e9b6d96687414066bcb0eee024d2ae4c47a80a623b79fd95a79f147e1e0ca +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2efa203c9b8558e884883842c0f241ce5dd9eb2e --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:681ccb760f828a2f9dddb14f761a0710e9f9737a44d9ff4ea0f8f76e0f9bfc0a +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1691dbf55349cd4f26291f79606df102afb1d809 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee4aca4269c52bdfa0fe7bd16e917aa6cfb9d39096cf032637d2c142559e9deb +size 51395351 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..11a0953c7bd89f9066e98cd0ce43730fa9d3e9e9 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61e9789b5a541ac548997824d55fce254104a4c24758228b3e223cf7c86d7efc +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8728ad1f44b0b1d9bf2e9533bd5f6430163f53a7 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:482fb34ac5fbf02b34f18f36332060cfa97a113c0b7eb038a331dfb3fc272990 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3ed24b54f2a937a763d3e1e5fd5c1087a355320 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b068303599e90e2f803f107879f3e4e17a71f2c80fafdbdbdcf60ae116e4713 +size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f0da9f16114faa5200d15c0458a22fa0e568ba6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cbdf2b9a9244257a3b87f76e2880ff38186d159782e286420fc45d27c3f9163a +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 
b/1b112b400m/global_step23189/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1066ff83305977c40822cc66354c74d912e0e114 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:191bd97dc2e4de9686dd7f12c45bd0867386e345825a951596868a2cf57b8111 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..badabeeb2e7dceb843a51d2b4cb0ca0839df1261 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cd59c7b88ca4b96fa50d8779b13940b91118220a0faca3059d5ed2838fb6d93 +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83d74479acd6ec16995d1ff8406388e70e766d85 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0af5b992c4cca4047a8d7a83a01f86a32cf0fbc769c8a59ed8317fb4edb98858 +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b81f3c530d2bf025bc7bfb6a211fe1a263cc6ed6 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a577cd55e605afa4d5dc2248cfccce93cb9100da7975678e38aa6fea9cd00b3a 
+size 51395490 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2dcd6cc05802b60da8ead253c843497681038f6f --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a0b733fb566141682bafc3d09b60fe52e596dd670fc2e3bbabb876291b18daf +size 51395362 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1745e8431ed269744afc2e0e9bb2c0d7dcfb7c69 --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e70830352a7dcbac9b9d32838dd93d93b9afcba4fe0bdd4fb2ad8b3b72f2475c +size 51395426 diff --git a/1b112b400m/global_step23189/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/1b112b400m/global_step23189/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..58537a9657f340e316a27ec2b8a4e64e4101c47a --- /dev/null +++ b/1b112b400m/global_step23189/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0882efeefc103d01d578907a56b53d94430c4c0639cc321e7d5c08e2631a6e82 +size 51395479 diff --git a/1b112b400m/global_step23189/layer_01-model_00-model_states.pt b/1b112b400m/global_step23189/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0623f22cdf7a7200909d1f04683d7c0d514419d5 --- /dev/null +++ b/1b112b400m/global_step23189/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:10e570cf580ec4f9dedcacb2ecb2b676854306c017f4e7ffa40615945160b8da +size 187630851 diff --git a/1b112b400m/global_step23189/layer_03-model_00-model_states.pt b/1b112b400m/global_step23189/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3f23c4cc87cfa9527738c176cde5bcf673e194d --- /dev/null +++ b/1b112b400m/global_step23189/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef451ad590420f6b67e98f0a63efb50f0488aff338d6008bd368e64dcf04f085 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_04-model_00-model_states.pt b/1b112b400m/global_step23189/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c827c3736d21b915955a9354a32f807b9b13a1b --- /dev/null +++ b/1b112b400m/global_step23189/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7acd2fcbbeb89dc7c5c67185adc0d7b888dc02f9132379edb4b48ed86198fab +size 77121283 diff --git a/1b112b400m/global_step23189/layer_05-model_00-model_states.pt b/1b112b400m/global_step23189/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f534dc9e7f609cfd44e5e99bcd3890ed7f99ae11 --- /dev/null +++ b/1b112b400m/global_step23189/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e45ad9aeee4a207166e8a47fc208ee6716af5081fa1e1359fa984320db23084 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_06-model_00-model_states.pt b/1b112b400m/global_step23189/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c8ab8b24daac217c67e69c26d169dd5ac8ce0831 --- /dev/null +++ b/1b112b400m/global_step23189/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:ba124506d361c48cecb5c7853d1f4d790b42b7c850c4378285b378ecd55b4d98 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_07-model_00-model_states.pt b/1b112b400m/global_step23189/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..508c283ff62bcd3b5e9c5b4299ad17addb193ecb --- /dev/null +++ b/1b112b400m/global_step23189/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95a3c114222443455a88898b766a9cc9f9e0478702ea99aa780b37624c89aab4 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_08-model_00-model_states.pt b/1b112b400m/global_step23189/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5f09f01c3311cec7e5f6d5fe15f8d71e7389c5b --- /dev/null +++ b/1b112b400m/global_step23189/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bce8d154ebf2555ccfdaaa2d9ab52949ee6b3ad002b41fc328cfb12c205cc30 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_09-model_00-model_states.pt b/1b112b400m/global_step23189/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3c50a4621af40794359c380f329e45600cd07ae --- /dev/null +++ b/1b112b400m/global_step23189/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0571bcb98e98831aad56b22de97dd1c240993e244b5dd3a68894f6325564c299 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_10-model_00-model_states.pt b/1b112b400m/global_step23189/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0516d684bc5d11aacf58e8c915b70517286daa3d --- /dev/null +++ b/1b112b400m/global_step23189/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:958d6cc1544427cc287eb0963ac21a47275ac1e40d58d875e36cecb6a987c221 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_11-model_00-model_states.pt b/1b112b400m/global_step23189/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9f58f52cb6c91dc040a41ae4d79b53b0bc9a4eb --- /dev/null +++ b/1b112b400m/global_step23189/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb31b360d607910d1305976c20568a5d848c11f210edd95dbadb2cc59ebff528 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_12-model_00-model_states.pt b/1b112b400m/global_step23189/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60f285e9419750377f16a9b72c3ce2790ebc541b --- /dev/null +++ b/1b112b400m/global_step23189/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4234efcd999283fb053ab83c6be29c33a7a80f8ad4c29d038870897ee597de5 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_13-model_00-model_states.pt b/1b112b400m/global_step23189/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b45e87f08044647ed5614d17d854d36509a77fe4 --- /dev/null +++ b/1b112b400m/global_step23189/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5a8a8bd5392627b6561cfaab683a7938957afb65d5fe2f08dead10f84276212 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_14-model_00-model_states.pt b/1b112b400m/global_step23189/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7aa7f776824014128832bcb3ff655a8c6c4bae9d --- /dev/null +++ b/1b112b400m/global_step23189/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:b26be7354460aa9cdb656a851a81f1839a61d2da44aaf0b48e91f2302fd98938 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_15-model_00-model_states.pt b/1b112b400m/global_step23189/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc2c096d7f0e18898e8a82a0cc3dcafee427f550 --- /dev/null +++ b/1b112b400m/global_step23189/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0cb849098c2295f9f18b3b9da0748ffc6fa44ef10f279f4daba3d0390f40acc +size 77121283 diff --git a/1b112b400m/global_step23189/layer_16-model_00-model_states.pt b/1b112b400m/global_step23189/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..717f2211ea9e80e989561ee2fa060a026a4954a7 --- /dev/null +++ b/1b112b400m/global_step23189/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0842751595844e3f787912462f63095d91a58f6f80c14819142c986299ab38fc +size 77121283 diff --git a/1b112b400m/global_step23189/layer_17-model_00-model_states.pt b/1b112b400m/global_step23189/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1f9da092b6c3f9d528e44b26bcf83d90650db5a --- /dev/null +++ b/1b112b400m/global_step23189/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:697ddb134b82534da1561ba1a96e22126161c2a30447c7b01b679eff8b9107c1 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_18-model_00-model_states.pt b/1b112b400m/global_step23189/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80f5b33dd318e6c9138e3bbf2df58f87a522005a --- /dev/null +++ b/1b112b400m/global_step23189/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:86f2eca028e27dc611c7e051220db2cdc81bf2d52048ff0be119f2789e39eaaf +size 77121283 diff --git a/1b112b400m/global_step23189/layer_19-model_00-model_states.pt b/1b112b400m/global_step23189/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4914031ea32c647c8af8b465a52dfe4ea269fa13 --- /dev/null +++ b/1b112b400m/global_step23189/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffe65c347bbe085dd797fc2b1f0b83dd9f6ed739b31ed76ca410a7150439130b +size 77121283 diff --git a/1b112b400m/global_step23189/layer_20-model_00-model_states.pt b/1b112b400m/global_step23189/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2140d15cd58c072fc8c941a059b6c933c9b4fe01 --- /dev/null +++ b/1b112b400m/global_step23189/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b28d81eed8e383668c8588c5231c0d786673c5ed5b6c2a84c34956db6f27c932 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_21-model_00-model_states.pt b/1b112b400m/global_step23189/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c515c34872de064b4eed94271653d6f9856dc553 --- /dev/null +++ b/1b112b400m/global_step23189/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b0e631370cfea956d8095a85502ebd74d23a2db19c96d554fcce01682e820f6 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_22-model_00-model_states.pt b/1b112b400m/global_step23189/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d2ff713b7599bccf4864305ae0b1d5c8497726b --- /dev/null +++ b/1b112b400m/global_step23189/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:13689e7189c15a54e70a2a68e9894e95c2ec4f0805220a624c9d01eeaae8dac0 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_23-model_00-model_states.pt b/1b112b400m/global_step23189/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ee8c7bb1e96bc322e6b1d8322e2cf324e22baf2 --- /dev/null +++ b/1b112b400m/global_step23189/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:968b8e2df3f706c3000eb83babd052c21b7635940082a08e068fe7380bc1dca8 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_24-model_00-model_states.pt b/1b112b400m/global_step23189/layer_24-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3ce8ffd05f525e0351626eb8219b08e505a18e2 --- /dev/null +++ b/1b112b400m/global_step23189/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b507cc65b06e9d087cf252826046eb9a916fadcd96ed9aaaa735719e08f99636 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_25-model_00-model_states.pt b/1b112b400m/global_step23189/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..382484b20b595d89af1a2c6a44c2f90fbbdb4389 --- /dev/null +++ b/1b112b400m/global_step23189/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfb2150257ae573477434cd183d465e438aead0cd30c656636a21fdb7f071c51 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_26-model_00-model_states.pt b/1b112b400m/global_step23189/layer_26-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2585b012dfe638c4cec64c4f88ad49d6fb035948 --- /dev/null +++ b/1b112b400m/global_step23189/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:7de26bd5706466ffab52a5886ce0f474f4792d4972dcafce28ac7c17d1e9a150 +size 77121283 diff --git a/1b112b400m/global_step23189/layer_27-model_00-model_states.pt b/1b112b400m/global_step23189/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b19a383cd1ddc122e4b8503b376e05ceb3c5811d --- /dev/null +++ b/1b112b400m/global_step23189/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fd2298459a33bdde66ead86ee96f815b6ea34476ca133597587923f4fafde7d +size 77121283 diff --git a/1b112b400m/global_step23189/layer_28-model_00-model_states.pt b/1b112b400m/global_step23189/layer_28-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d820ef1e87f54eaea8159a9c5b1ae12112761f40 --- /dev/null +++ b/1b112b400m/global_step23189/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9441b8478327fa83271f116a26cf51685cfc9abc594ddd0400e233aeff857aec +size 77121283 diff --git a/1b112b400m/global_step23189/layer_30-model_00-model_states.pt b/1b112b400m/global_step23189/layer_30-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a6da2b1d16b5680952325657d4674555011c8ab --- /dev/null +++ b/1b112b400m/global_step23189/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5b4743fc54633bc47daad96fd278abdda878ca4af4a0b81308c6d419c68846d +size 8387 diff --git a/1b112b400m/global_step23189/mp_rank_00_model_states.pt b/1b112b400m/global_step23189/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7caa0b2e853ac7c1ccbe77b7d5b8908589b4fb98 --- /dev/null +++ b/1b112b400m/global_step23189/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55ecbd7d4c3c13cd6849827b1b723e7d364bb006920da0e83329bd989539d7e2 +size 43827 diff 
--git a/1b112b400m/sbatch_1b112b400m.sh b/1b112b400m/sbatch_1b112b400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..ce7d0ff154093d64ecdda19ca8cf573a7636d6cf --- /dev/null +++ b/1b112b400m/sbatch_1b112b400m.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=32 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b112b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) 
+GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 12157780000 +# -> Samples: 5936416 +TRAIN_SAMPLES=5_936_416 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 59_364 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, 
"wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b112b400m/sbatch_1b112b400mval.sh b/1b112b400m/sbatch_1b112b400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..9bcdfc8ff8037d1db780809585feafed91266e44 --- /dev/null +++ b/1b112b400m/sbatch_1b112b400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=16 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b112b400mval +VARIANT_CKPT=1b112b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT 
+CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 12157780000 +# -> Samples: 5936416 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --override-lr-scheduler \ + --reset-progress \ + --no-load-optim \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 
\ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b112b400m/tensorboard_1b112b400m/events.out.tfevents.1678910032.nid006911.12519.0 b/1b112b400m/tensorboard_1b112b400m/events.out.tfevents.1678910032.nid006911.12519.0 new file mode 100644 index 0000000000000000000000000000000000000000..de6ecdee7250b6c6956aa558727fe0f663da948e --- /dev/null +++ b/1b112b400m/tensorboard_1b112b400m/events.out.tfevents.1678910032.nid006911.12519.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ef8d4a55e8cb99d26209d3797bf46d520e11e162ea7b688a03d3eebba747214 +size 41363912 diff --git 
a/1b112b400m/tensorboard_1b112b400mval/events.out.tfevents.1678950261.nid006724.72965.0 b/1b112b400m/tensorboard_1b112b400mval/events.out.tfevents.1678950261.nid006724.72965.0 new file mode 100644 index 0000000000000000000000000000000000000000..32adef10593a28f1811db36c63f98f8e27e15491 --- /dev/null +++ b/1b112b400m/tensorboard_1b112b400mval/events.out.tfevents.1678950261.nid006724.72965.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59b32fc4b03f2085da5580a27dca48809a15784079573a55b33968097ce1f70b +size 980 diff --git a/1b11b5400m/3318671.err b/1b11b5400m/3318671.err new file mode 100644 index 0000000000000000000000000000000000000000..88f56bdd16dc72b281b61900d8dc4c1ef1409d03 --- /dev/null +++ b/1b11b5400m/3318671.err @@ -0,0 +1,1101 @@ +4: 2023-03-15 23:28:18.875583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875572: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875588: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 23:28:18.875571: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875609: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875623: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875629: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-15 23:28:18.875946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:28:18.875957: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-15 23:28:18.875956: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:28:18.875609: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:28:18.875945: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:28:18.875992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:28:18.875992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-15 23:28:18.876000: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:28:18.875991: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.875984: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.875996: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.875995: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-15 23:28:18.876003: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.876014: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.876019: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.876027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:28:18.876034: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: 2023-03-15 23:28:18.876779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-15 23:28:18.876779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:28:18.876809: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:28:18.876820: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:28:18.876815: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:28:18.876818: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-15 23:28:18.876829: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:28:18.876831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876589: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876588: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876597: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-15 23:28:18.876619: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876625: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876631: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876637: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:28:18.876631: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-15 23:28:18.876534: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876548: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876540: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-15 23:28:18.876423: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:28:18.876418: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:28:18.876444: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-15 23:28:18.876558: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876563: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876584: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:28:18.876429: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-15 23:28:18.876453: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:28:18.876602: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:28:18.876461: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:28:18.876481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876607: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-15 23:28:18.876614: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876630: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-15 23:28:18.876495: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876617: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876612: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-15 23:28:18.876641: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876656: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:28:18.876658: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-15 23:28:32.707112: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707131: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707315: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707668: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:28:32.707726: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:28:32.707140: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; 
dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707686: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707160: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:28:32.707742: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-15 23:28:32.707167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707696: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707343: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:28:32.707750: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-15 23:28:32.707169: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707175: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-15 23:28:32.707810: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-15 23:28:32.707367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:28:32.707716: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:28:32.707723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:28:32.707723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:28:32.707741: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:28:32.707759: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:28:32.707767: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:28:32.707770: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:28:32.707776: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:28:32.707755: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:28:32.707778: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707826: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707843: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.707853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:28:32.708695: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708720: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708738: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708741: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-15 23:28:32.708751: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708757: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:28:32.708765: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:28:32.709225: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709263: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709288: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709829: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-15 23:28:32.709330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709874: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-15 23:28:32.709379: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.709908: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:28:32.709923: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:28:32.709945: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-15 23:28:32.710351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:28:32.709382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:28:32.710003: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:28:32.710012: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:28:32.710016: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.709978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:28:32.710377: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.709996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710392: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.710017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.710041: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710395: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.710027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.710047: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.710373: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:28:32.710053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.711096: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:28:32.711116: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:28:32.711130: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-15 23:28:32.711135: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:28:32.711142: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710587: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:28:32.711165: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:28:32.711175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:28:32.711177: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710605: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710626: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710633: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710651: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710657: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-15 23:28:32.710664: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:28:32.710671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-15 23:28:32.710280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710747: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710746: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-15 23:28:32.710327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:28:32.710763: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710766: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710777: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:28:32.710788: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:28:32.715287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715590: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:28:32.715316: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715328: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715608: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:28:32.715340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715616: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:28:32.715347: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:28:32.715628: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:28:32.715632: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:28:32.715633: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:28:32.715638: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:28:32.715648: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:29:00.317240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317260: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317307: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317315: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317410: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.317413: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.317968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.317993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.317998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.318038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.318055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.318075: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.318088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.318092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318864: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.318936: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321474: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321554: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321564: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.321577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322091: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322094: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322098: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322103: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322121: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:29:00.322122: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:29:00.322123: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:29:00.322126: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:29:00.322126: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:29:00.322939: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-15 23:29:00.322127: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:29:00.322129: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322131: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-15 23:29:00.322941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322583: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:29:00.322159: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro 
+1: 2023-03-15 23:29:00.322615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:29:00.322955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322666: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:29:00.322967: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.322966: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-15 23:29:00.322685: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-15 23:29:00.322969: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:29:00.322972: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.323476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-15 23:29:00.322996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.322755: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:29:00.323011: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.323513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.323529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-15 23:29:00.322864: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.323549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-15 23:29:00.322866: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322866: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322868: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:29:00.323817: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322867: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:29:00.323820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322870: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:29:00.323842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:29:00.323850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322873: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:29:00.322884: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322880: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322887: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322890: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322894: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:29:00.322895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 23:29:00.324158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324205: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.324262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.325310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331218: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331220: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331221: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331223: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:29:00.331266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:29:00.331280: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:29:00.332262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.332266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.331503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331513: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331511: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 23:29:00.332269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:29:00.331512: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331513: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:29:00.331511: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:29:00.331518: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 23:29:00.332269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331936: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-15 23:29:00.332012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 23:29:00.331944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 23:29:00.331946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:29:00.331957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:29:00.331959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:29:00.331957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:29:00.331961: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 23:29:00.331962: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-15 23:29:00.331964: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:29:00.331965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:29:00.332032: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332032: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332035: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332037: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332037: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332040: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332038: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:29:00.332040: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 23:29:00.332270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.332270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.332272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.332282: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332278: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 23:29:00.332273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:29:00.332284: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332289: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332291: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332289: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:29:00.332295: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... 
+4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: +3: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: +5: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +0: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +0: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +3: Loading extension module utils... 
+6: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +6: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: +0: Loading extension module utils...Loading extension module utils...Loading extension module utils... +0: +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +3: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils...Loading extension module utils... +2: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+1: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: +1: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils...Loading extension module utils... +7: +6: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: +7: Loading extension module utils...Loading extension module utils...Loading extension module utils... +7: +7: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... 
+5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/1b11b5400m/3318671.out b/1b11b5400m/3318671.out new file mode 100644 index 0000000000000000000000000000000000000000..ded06d51ce0d939919a0b6310806ab502cba596c --- /dev/null +++ b/1b11b5400m/3318671.out @@ -0,0 +1,8468 @@ +Model parameters: d_model 1792 ffw_size 7168 kv_size 128 n_heads 14 n_layers 26 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 26 --hidden-size 1792 --num-attention-heads 14 --kv-channels 128 --ffn-hidden-size 7168 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 
--clip-grad 1.0 --kill-switch-path kill-switch-1b11b5400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-only true --eval-iters 100 --tensorboard-dir tensorboard_1b11b5400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_1b11b5400m --load checkpoints_1b11b5400m --train-weighted-split-paths-path train400m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3318671.json --zero-stage 0 +START 3318671: Wed 15 Mar 2023 11:27:16 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 45.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 45.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 38.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 45.0c N/A 800Mhz 
1600Mhz 0% auto 0.0W 0% 0% +5: 2 42.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 44.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 35.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 38.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 45.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 49.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 
================================================================================ +1: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 44.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 39.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 40.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 49.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +6: Launching on nid006945 (6/8), master nid006939 port 9999, GPUs 8, CUDA: True +0: Launching on nid006939 (0/8), master nid006939 port 9999, GPUs 8, CUDA: True +3: Launching on nid006942 (3/8), master nid006939 port 9999, GPUs 8, CUDA: True +1: Launching on nid006940 (1/8), master nid006939 port 9999, GPUs 8, CUDA: True +4: Launching on nid006943 (4/8), master nid006939 port 9999, GPUs 8, CUDA: True +5: Launching on nid006944 (5/8), master nid006939 port 9999, GPUs 8, CUDA: True +2: Launching on nid006941 (2/8), master nid006939 port 9999, 
GPUs 8, CUDA: True +7: Launching on nid006946 (7/8), master nid006939 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 
0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3318671.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 7168 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ 
False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 1792 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-1b11b5400mval +0: kv_channels ..................................... 128 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_1b11b5400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... 
True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... 12.0 +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 26 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 
2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_1b11b5400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 
0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_1b11b5400mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 
2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... 
torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-15 23:30:09,099] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.096 seconds +0: > compiling and loading fused kernels ... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 25.166 seconds +0: time to initialize megatron (seconds): -2.854 +0: [after megatron is initialized] datetime: 2023-03-15 23:30:37 +0: building GPT model ... +0: [2023-03-15 23:30:37,279] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-15 23:30:37,280] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-15 23:30:37,280] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.95 GB, percent = 6.7% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, 
ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, 
ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-15 23:30:39,263] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=33 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: ParallelTransformerLayerPipe +0: 22: ParallelTransformerLayerPipe +0: 23: ParallelTransformerLayerPipe +0: 24: ParallelTransformerLayerPipe +0: 25: ParallelTransformerLayerPipe +0: 26: ParallelTransformerLayerPipe +0: 27: ParallelTransformerLayerPipe +0: 28: ParallelTransformerLayerPipe +0: 29: undo +0: 30: MixedFusedLayerNorm +0: 31: EmbeddingPipe +0: 32: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-15 23:30:39,659] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-15 23:30:39,659] [INFO] [utils.py:828:see_memory_usage] MA 2.05 GB Max_MA 2.05 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-15 23:30:39,660] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.01 GB, percent = 6.8% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-15 23:30:39,662] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-15 23:30:53,064] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-15 23:30:53,064] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-15 23:30:53,064] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-15 23:30:53,076] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-15 23:30:53,076] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-15 23:30:53,196] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-15 23:30:53,196] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.06 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-15 23:30:53,196] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.7 GB, percent = 6.9% +0: ninja: no work to do. 
+1: Time to load utils op: 0.2989840507507324 seconds +1: Time to load utils op: 0.2989959716796875 seconds +1: Time to load utils op: 0.29903459548950195 seconds +1: Time to load utils op: 0.2990419864654541 seconds +1: Time to load utils op: 0.2990410327911377 seconds +1: Time to load utils op: 0.2990589141845703 secondsTime to load utils op: 0.29904747009277344 seconds +1: +1: Time to load utils op: 0.29906582832336426 seconds +4: Time to load utils op: 0.29752516746520996 seconds +4: Time to load utils op: 0.29755353927612305 seconds +4: Time to load utils op: 0.2975649833679199 seconds +4: Time to load utils op: 0.297579288482666 seconds +4: Time to load utils op: 0.2975776195526123 seconds +4: Time to load utils op: 0.29758429527282715 secondsTime to load utils op: 0.2975790500640869 seconds +4: +4: Time to load utils op: 0.2975931167602539 seconds +3: Time to load utils op: 0.29764533042907715 secondsTime to load utils op: 0.29764580726623535 seconds +3: +3: Time to load utils op: 0.29767847061157227 seconds +3: Time to load utils op: 0.29769206047058105 seconds +3: Time to load utils op: 0.2976958751678467 seconds +3: Time to load utils op: 0.2976956367492676 seconds +3: Time to load utils op: 0.2977135181427002 seconds +3: Time to load utils op: 0.2977292537689209 seconds +0: Time to load utils op: 0.3064448833465576 seconds +0: Time to load utils op: 0.30655646324157715 seconds +0: Time to load utils op: 0.22873234748840332 seconds +0: Time to load utils op: 0.3069572448730469 seconds +0: Time to load utils op: 0.30646467208862305 seconds +0: Time to load utils op: 0.3071177005767822 seconds +0: Time to load utils op: 0.30649852752685547 secondsTime to load utils op: 0.30650901794433594 seconds +0: +5: Time to load utils op: 0.2983818054199219 secondsTime to load utils op: 0.2983853816986084 seconds +5: +5: Time to load utils op: 0.29842638969421387 seconds +5: Time to load utils op: 0.29844188690185547 secondsTime to load utils op: 0.2984466552734375 
secondsTime to load utils op: 0.2984457015991211 seconds +5: +5: +5: Time to load utils op: 0.2984426021575928 seconds +5: Time to load utils op: 0.29845714569091797 seconds +6: Time to load utils op: 0.29843783378601074 secondsTime to load utils op: 0.2984347343444824 seconds +6: +6: Time to load utils op: 0.2984597682952881 seconds +6: Time to load utils op: 0.298473596572876 seconds +6: Time to load utils op: 0.29848766326904297 seconds +6: Time to load utils op: 0.298490047454834 seconds +6: Time to load utils op: 0.298494815826416 seconds +6: Time to load utils op: 0.2985060214996338 seconds +2: Time to load utils op: 0.3032081127166748 seconds +2: Time to load utils op: 0.30321168899536133 seconds +2: Time to load utils op: 0.30322718620300293 seconds +2: Time to load utils op: 0.3032686710357666 secondsTime to load utils op: 0.30325841903686523 secondsTime to load utils op: 0.3032686710357666 seconds +2: +2: Time to load utils op: 0.30327725410461426 seconds +2: Time to load utils op: 0.3032693862915039 seconds +2: +7: Time to load utils op: 0.29552698135375977 seconds +7: Time to load utils op: 0.28885889053344727 seconds +7: Time to load utils op: 0.2818877696990967 secondsTime to load utils op: 0.28235912322998047 seconds +7: +7: Time to load utils op: 0.28992462158203125 seconds +7: Time to load utils op: 0.28806185722351074 seconds +7: Time to load utils op: 0.2902200222015381 seconds +7: Time to load utils op: 0.29041504859924316 seconds +0: [2023-03-15 23:30:53,533] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-15 23:30:53,534] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.04 GB CA 2.19 GB Max_CA 2 GB +0: [2023-03-15 23:30:53,534] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.7 GB, percent = 6.9% +0: Time to load utils op: 0.0006723403930664062 seconds +0: Time to load utils op: 0.00067901611328125 seconds +0: Time to load utils op: 0.0006639957427978516 seconds +0: Time to load 
utils op: 0.0006649494171142578 seconds +0: Time to load utils op: 0.0007016658782958984 seconds +0: Time to load utils op: 0.0006761550903320312 seconds +0: Time to load utils op: 0.0005435943603515625 seconds +3: Time to load utils op: 0.0010330677032470703 seconds +3: Time to load utils op: 0.0009410381317138672 secondsTime to load utils op: 0.0009872913360595703 seconds +3: +2: Time to load utils op: 0.0012981891632080078 seconds +3: Time to load utils op: 0.0012180805206298828 seconds +3: Time to load utils op: 0.0013735294342041016 seconds +3: Time to load utils op: 0.0012767314910888672 secondsTime to load utils op: 0.0012960433959960938 seconds +3: +3: Time to load utils op: 0.0013954639434814453 seconds +2: Time to load utils op: 0.001486063003540039 seconds +2: Time to load utils op: 0.0016531944274902344 seconds +2: Time to load utils op: 0.0016438961029052734 seconds +2: Time to load utils op: 0.0016078948974609375 seconds +2: Time to load utils op: 0.0016570091247558594 secondsTime to load utils op: 0.0016679763793945312 seconds +2: +2: Time to load utils op: 0.0016832351684570312 seconds +4: Time to load utils op: 0.0009243488311767578 seconds +4: Time to load utils op: 0.0011074542999267578 seconds +4: Time to load utils op: 0.0013718605041503906 seconds +1: Time to load utils op: 0.0012657642364501953 seconds +1: Time to load utils op: 0.0012860298156738281 seconds +6: Time to load utils op: 0.0008707046508789062 seconds +4: Time to load utils op: 0.0015110969543457031 seconds +4: Time to load utils op: 0.0015561580657958984 seconds +4: Time to load utils op: 0.0015420913696289062 seconds +4: Time to load utils op: 0.0015244483947753906 seconds +6: Time to load utils op: 0.0009391307830810547 seconds +4: Time to load utils op: 0.0016012191772460938 seconds +1: Time to load utils op: 0.0015418529510498047 seconds +1: Time to load utils op: 0.0015490055084228516 secondsTime to load utils op: 0.0015454292297363281 seconds +1: +1: Time to load utils op: 
0.0015370845794677734 seconds +1: Time to load utils op: 0.0015137195587158203 seconds +1: Time to load utils op: 0.0016155242919921875 seconds +6: Time to load utils op: 0.0012264251708984375 seconds +6: Time to load utils op: 0.0013747215270996094 seconds +6: Time to load utils op: 0.0013530254364013672 seconds +6: Time to load utils op: 0.0013096332550048828 seconds +7: Time to load utils op: 0.0005881786346435547 secondsTime to load utils op: 0.0006234645843505859 seconds +7: +7: Time to load utils op: 0.0006585121154785156 seconds +7: Time to load utils op: 0.0006654262542724609 secondsTime to load utils op: 0.0006551742553710938 secondsTime to load utils op: 0.0006716251373291016 seconds +7: +7: +6: Time to load utils op: 0.0013642311096191406 seconds +6: Time to load utils op: 0.0013928413391113281 seconds +7: Time to load utils op: 0.0004591941833496094 seconds +7: Time to load utils op: 0.0006210803985595703 seconds +5: Time to load utils op: 0.0007479190826416016 seconds +5: Time to load utils op: 0.001031637191772461 seconds +5: Time to load utils op: 0.0011298656463623047 seconds +5: Time to load utils op: 0.0013773441314697266 seconds +5: Time to load utils op: 0.0013492107391357422 seconds +5: Time to load utils op: 0.0012385845184326172 seconds +5: Time to load utils op: 0.0013663768768310547 seconds +5: Time to load utils op: 0.0013797283172607422 seconds +0: [2023-03-15 23:30:53,673] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-15 23:30:53,673] [INFO] [utils.py:828:see_memory_usage] MA 4.24 GB Max_MA 4.24 GB CA 5.44 GB Max_CA 5 GB +0: [2023-03-15 23:30:53,673] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:53,779] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-15 23:30:53,780] [INFO] [utils.py:828:see_memory_usage] MA 4.24 GB Max_MA 4.24 GB CA 5.44 GB Max_CA 5 GB +0: [2023-03-15 23:30:53,780] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:53,886] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-15 23:30:53,886] [INFO] [utils.py:828:see_memory_usage] MA 6.19 GB Max_MA 6.19 GB CA 8.31 GB Max_CA 8 GB +0: [2023-03-15 23:30:53,886] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:53,990] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-15 23:30:53,990] [INFO] [utils.py:828:see_memory_usage] MA 6.19 GB Max_MA 6.19 GB CA 8.31 GB Max_CA 8 GB +0: [2023-03-15 23:30:53,991] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:54,101] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-15 23:30:54,102] [INFO] [utils.py:828:see_memory_usage] MA 6.19 GB Max_MA 6.19 GB CA 8.31 GB Max_CA 8 GB +0: [2023-03-15 23:30:54,102] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:54,206] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-15 23:30:54,207] [INFO] [utils.py:828:see_memory_usage] MA 6.19 GB Max_MA 6.19 GB CA 8.31 GB Max_CA 8 GB +0: [2023-03-15 23:30:54,207] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:54,316] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-15 23:30:54,316] [INFO] [utils.py:828:see_memory_usage] MA 6.32 GB Max_MA 6.32 GB CA 8.34 GB Max_CA 8 GB +0: [2023-03-15 23:30:54,317] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:54,421] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-15 23:30:54,421] [INFO] [utils.py:828:see_memory_usage] MA 6.32 GB Max_MA 6.32 GB CA 8.34 GB Max_CA 8 GB +0: [2023-03-15 23:30:54,422] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.85 GB, percent = 6.9% +0: [2023-03-15 23:30:54,422] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-15 23:30:54,422] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-15 23:30:54,422] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-15 23:30:54,422] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-15 23:30:54,422] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-15 23:30:54,423] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-15 23:30:54,424] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-15 23:30:54,425] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-15 23:30:54,425] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-15 23:30:54,425] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-15 23:30:54,425] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0005753040313720703 seconds +0: [2023-03-15 23:30:54,425] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-15 23:30:54,437] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=33 [0, 33) STAGE_PARAMS=1096338432 (1096.338M) TOTAL_PARAMS=1096338432 (1096.338M) UNIQUE_PARAMS=1096338432 (1096.338M) +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+0: [2023-03-15 23:30:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +5: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... 
+3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +7: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +3: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+4: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+6: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +6: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt... +0: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. 
+0: [2023-03-15 23:30:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/mp_rank_00_model_states.pt. +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+4: [2023-03-15 23:30:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+4: [2023-03-15 23:30:54,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+6: [2023-03-15 23:30:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:30:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+7: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+7: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+1: [2023-03-15 23:30:54,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+2: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+7: [2023-03-15 23:30:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+2: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:30:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:30:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:30:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+1: [2023-03-15 23:30:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:30:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:30:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... 
+3: [2023-03-15 23:30:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:30:54,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:30:54,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+4: [2023-03-15 23:30:54,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+6: [2023-03-15 23:30:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+4: [2023-03-15 23:30:54,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:30:54,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+4: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+5: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:30:54,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:30:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:30:54,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:30:54,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:30:54,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+1: [2023-03-15 23:30:54,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+0: [2023-03-15 23:30:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:30:54,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:30:54,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_01-model_00-model_states.pt. 
+7: [2023-03-15 23:30:54,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:54,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:30:54,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:54,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:54,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+3: [2023-03-15 23:30:54,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+2: [2023-03-15 23:30:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:54,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+4: [2023-03-15 23:30:55,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+2: [2023-03-15 23:30:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+6: [2023-03-15 23:30:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:30:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+3: [2023-03-15 23:30:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:30:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+7: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:30:55,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:30:55,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:30:55,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+5: [2023-03-15 23:30:55,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:30:55,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+4: [2023-03-15 23:30:55,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:30:55,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+1: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+3: [2023-03-15 23:30:55,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:30:55,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+1: [2023-03-15 23:30:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:30:55,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_03-model_00-model_states.pt. 
+7: [2023-03-15 23:30:55,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+6: [2023-03-15 23:30:55,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+5: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+3: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:30:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+5: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+3: [2023-03-15 23:30:55,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+3: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+2: [2023-03-15 23:30:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:30:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:30:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... 
+0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:30:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:30:55,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:30:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:30:55,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:30:55,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+2: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+1: [2023-03-15 23:30:55,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:30:55,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+2: [2023-03-15 23:30:55,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_04-model_00-model_states.pt. 
+2: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+0: [2023-03-15 23:30:55,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+3: [2023-03-15 23:30:55,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:30:55,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+3: [2023-03-15 23:30:55,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:55,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:55,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:55,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+3: [2023-03-15 23:30:55,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:55,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:30:55,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+6: [2023-03-15 23:30:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:30:55,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+4: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+0: [2023-03-15 23:30:55,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+7: [2023-03-15 23:30:55,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:30:55,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:30:55,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... 
+6: [2023-03-15 23:30:55,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+5: [2023-03-15 23:30:55,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:30:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:55,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+1: [2023-03-15 23:30:55,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:30:55,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:55,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+1: [2023-03-15 23:30:55,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:55,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:55,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-03-15 23:30:55,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-03-15 23:30:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:30:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:30:55,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:30:55,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:30:55,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:55,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+7: [2023-03-15 23:30:55,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:55,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+2: [2023-03-15 23:30:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:30:55,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:30:55,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+0: [2023-03-15 23:30:55,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. 
+2: [2023-03-15 23:30:55,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:30:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:55,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+3: [2023-03-15 23:30:56,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+3: [2023-03-15 23:30:56,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:56,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:56,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:56,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+0: [2023-03-15 23:30:56,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+4: [2023-03-15 23:30:56,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:30:56,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:30:56,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... 
+2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+1: [2023-03-15 23:30:56,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:30:56,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:56,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:56,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:30:56,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:30:56,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:30:56,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:30:56,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:30:56,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:30:56,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. 
+6: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:30:56,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:30:56,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+2: [2023-03-15 23:30:56,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:30:56,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:30:56,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+3: [2023-03-15 23:30:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+3: [2023-03-15 23:30:56,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:30:56,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:30:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:30:56,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:30:56,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:30:56,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:30:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:30:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+4: [2023-03-15 23:30:56,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:30:56,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:30:56,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:30:56,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:30:56,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+0: [2023-03-15 23:30:56,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+3: [2023-03-15 23:30:56,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+3: [2023-03-15 23:30:56,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:30:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+3: [2023-03-15 23:30:56,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:30:56,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+4: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:30:56,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:30:56,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:30:56,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:30:56,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:30:56,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:30:56,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+4: [2023-03-15 23:30:56,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:30:56,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:30:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:30:56,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-03-15 23:30:56,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:30:56,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-03-15 23:30:56,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:30:56,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+5: [2023-03-15 23:30:56,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:56,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:56,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:56,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:56,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:56,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+5: [2023-03-15 23:30:56,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:30:56,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:56,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:30:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 23:30:56,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:30:56,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+1: [2023-03-15 23:30:56,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+4: [2023-03-15 23:30:56,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:56,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:56,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:56,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:56,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:56,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:30:56,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+2: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:30:56,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+2: [2023-03-15 23:30:56,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:56,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+0: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:30:56,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:56,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:30:56,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+1: [2023-03-15 23:30:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:56,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:30:56,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:56,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+1: [2023-03-15 23:30:56,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. 
+7: [2023-03-15 23:30:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:30:56,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:56,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+7: [2023-03-15 23:30:56,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:30:56,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:30:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:56,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+6: [2023-03-15 23:30:56,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:30:56,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:56,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+5: [2023-03-15 23:30:57,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+2: [2023-03-15 23:30:57,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+0: [2023-03-15 23:30:57,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+3: [2023-03-15 23:30:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:30:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+5: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:30:57,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:30:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+3: [2023-03-15 23:30:57,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:30:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:30:57,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:30:57,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+2: [2023-03-15 23:30:57,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:30:57,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+7: [2023-03-15 23:30:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:30:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:30:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:30:57,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:30:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:30:57,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:30:57,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:30:57,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+3: [2023-03-15 23:30:57,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+3: [2023-03-15 23:30:57,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+2: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:30:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:30:57,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... 
+5: [2023-03-15 23:30:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:30:57,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+3: [2023-03-15 23:30:57,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:30:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+7: [2023-03-15 23:30:57,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:30:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:30:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:30:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:30:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:30:57,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:30:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:30:57,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:30:57,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+2: [2023-03-15 23:30:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+7: [2023-03-15 23:30:57,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+0: [2023-03-15 23:30:57,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:30:57,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:30:57,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:30:57,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:30:57,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:30:57,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+2: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+5: [2023-03-15 23:30:57,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+6: [2023-03-15 23:30:57,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:30:57,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:30:57,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:30:57,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:30:57,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:30:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:30:57,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:30:57,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+6: [2023-03-15 23:30:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+3: [2023-03-15 23:30:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:30:57,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+3: [2023-03-15 23:30:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+0: [2023-03-15 23:30:57,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:30:57,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:30:57,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:30:57,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:30:57,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:30:57,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+5: [2023-03-15 23:30:57,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+3: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:57,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:57,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:57,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:57,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:57,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:30:57,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:57,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:57,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:57,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:57,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-03-15 23:30:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:30:57,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:57,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:30:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. 
+0: [2023-03-15 23:30:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:30:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:30:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+7: [2023-03-15 23:30:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:30:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:57,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+1: [2023-03-15 23:30:57,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:57,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:57,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:57,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:57,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:30:57,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+2: [2023-03-15 23:30:58,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+5: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:30:58,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+0: [2023-03-15 23:30:58,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... 
+3: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:30:58,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:30:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:30:58,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+1: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+4: [2023-03-15 23:30:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+5: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 23:30:58,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:30:58,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+7: [2023-03-15 23:30:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:30:58,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. 
+0: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:30:58,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+1: [2023-03-15 23:30:58,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:30:58,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:30:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 23:30:58,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:30:58,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:30:58,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:30:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+3: [2023-03-15 23:30:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+5: [2023-03-15 23:30:58,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+1: [2023-03-15 23:30:58,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 23:30:58,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:30:58,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+1: [2023-03-15 23:30:58,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+6: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:30:58,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+7: [2023-03-15 23:30:58,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:30:58,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+4: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+6: [2023-03-15 23:30:58,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... 
+7: [2023-03-15 23:30:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:30:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:30:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:30:58,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:30:58,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+4: [2023-03-15 23:30:58,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:30:58,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+7: [2023-03-15 23:30:58,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:30:58,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:30:58,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+7: [2023-03-15 23:30:58,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. 
+6: [2023-03-15 23:30:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:30:58,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:30:58,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+7: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-03-15 23:30:58,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:30:58,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:30:58,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+3: [2023-03-15 23:30:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:30:58,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... 
+1: [2023-03-15 23:30:58,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:30:58,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:58,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+6: [2023-03-15 23:30:58,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:30:58,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:30:58,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:58,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+6: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:30:58,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:30:58,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:58,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+7: [2023-03-15 23:30:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:30:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:58,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:58,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+0: [2023-03-15 23:30:58,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:58,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:30:58,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:58,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:58,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:58,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:58,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+4: [2023-03-15 23:30:58,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:58,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:58,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+1: [2023-03-15 23:30:58,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:58,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:58,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:30:58,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+1: [2023-03-15 23:30:58,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:30:58,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:30:58,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:30:58,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_16-model_00-model_states.pt. 
+0: [2023-03-15 23:30:58,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:58,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:58,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:58,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+5: [2023-03-15 23:30:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:58,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-03-15 23:30:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:59,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+0: [2023-03-15 23:30:59,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:59,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+3: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:59,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:59,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:30:59,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+0: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+1: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:30:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:30:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:30:59,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+6: [2023-03-15 23:30:59,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+0: [2023-03-15 23:30:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:30:59,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+6: [2023-03-15 23:30:59,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:30:59,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+1: [2023-03-15 23:30:59,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:30:59,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:30:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:30:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:30:59,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:30:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+7: [2023-03-15 23:30:59,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:30:59,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:30:59,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+3: [2023-03-15 23:30:59,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:30:59,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:30:59,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:30:59,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:30:59,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+1: [2023-03-15 23:30:59,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:30:59,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:30:59,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:30:59,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:30:59,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:30:59,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:30:59,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:30:59,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:30:59,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:30:59,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:30:59,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+6: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+0: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+7: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+7: [2023-03-15 23:30:59,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+0: [2023-03-15 23:30:59,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:30:59,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:30:59,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:30:59,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:30:59,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:30:59,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:30:59,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:30:59,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:30:59,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:30:59,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:30:59,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:30:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:30:59,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+4: [2023-03-15 23:30:59,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:30:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:30:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+2: [2023-03-15 23:30:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:30:59,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+7: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+7: [2023-03-15 23:30:59,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:30:59,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:30:59,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:30:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+5: [2023-03-15 23:30:59,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+7: [2023-03-15 23:30:59,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:30:59,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:30:59,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:30:59,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+2: [2023-03-15 23:30:59,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:30:59,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+1: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:30:59,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:30:59,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+7: [2023-03-15 23:30:59,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:30:59,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+4: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:30:59,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:30:59,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:30:59,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:30:59,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+5: [2023-03-15 23:30:59,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:30:59,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+1: [2023-03-15 23:30:59,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:30:59,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:30:59,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+6: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:30:59,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+3: [2023-03-15 23:30:59,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:30:59,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:30:59,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. 
+0: [2023-03-15 23:30:59,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:30:59,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:30:59,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:30:59,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:30:59,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:30:59,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:30:59,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+0: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+4: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+1: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+7: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +4: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+7: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +0: [2023-03-15 23:31:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +1: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +3: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+7: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... 
+7: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +5: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +6: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +7: [2023-03-15 23:31:00,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt... +2: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +5: [2023-03-15 23:31:00,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+4: [2023-03-15 23:31:00,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+4: [2023-03-15 23:31:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. 
+7: [2023-03-15 23:31:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +4: [2023-03-15 23:31:00,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+1: [2023-03-15 23:31:00,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+6: [2023-03-15 23:31:00,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +3: [2023-03-15 23:31:00,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +2: [2023-03-15 23:31:00,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +0: [2023-03-15 23:31:00,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +1: [2023-03-15 23:31:00,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+1: [2023-03-15 23:31:00,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +6: [2023-03-15 23:31:00,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_21-model_00-model_states.pt. +7: [2023-03-15 23:31:00,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+4: [2023-03-15 23:31:00,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+4: [2023-03-15 23:31:00,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:00,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+3: [2023-03-15 23:31:00,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:00,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+5: [2023-03-15 23:31:00,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+7: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:00,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:00,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+0: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+2: [2023-03-15 23:31:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+0: [2023-03-15 23:31:00,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... 
+6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+2: [2023-03-15 23:31:00,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:00,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:00,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+7: [2023-03-15 23:31:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. 
+0: [2023-03-15 23:31:00,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:00,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+5: [2023-03-15 23:31:00,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+6: [2023-03-15 23:31:00,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:00,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+4: [2023-03-15 23:31:00,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +4: [2023-03-15 23:31:00,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:00,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:00,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+4: [2023-03-15 23:31:00,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:00,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +4: [2023-03-15 23:31:00,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+4: [2023-03-15 23:31:00,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:00,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:00,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +1: [2023-03-15 23:31:00,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+7: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+5: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +1: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... 
+2: [2023-03-15 23:31:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +2: [2023-03-15 23:31:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +0: [2023-03-15 23:31:00,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+3: [2023-03-15 23:31:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:00,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:00,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:00,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:00,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:00,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+7: [2023-03-15 23:31:00,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:00,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:00,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:00,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +7: [2023-03-15 23:31:00,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+6: [2023-03-15 23:31:00,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+2: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:00,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +6: [2023-03-15 23:31:00,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +5: [2023-03-15 23:31:00,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt... +3: [2023-03-15 23:31:00,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. 
+0: [2023-03-15 23:31:00,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +5: [2023-03-15 23:31:00,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +3: [2023-03-15 23:31:00,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+3: [2023-03-15 23:31:00,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +7: [2023-03-15 23:31:00,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +2: [2023-03-15 23:31:00,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+0: [2023-03-15 23:31:00,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +0: [2023-03-15 23:31:00,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+6: [2023-03-15 23:31:00,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:00,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_23-model_00-model_states.pt. +6: [2023-03-15 23:31:00,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:00,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+1: [2023-03-15 23:31:00,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:00,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+1: [2023-03-15 23:31:00,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:00,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:00,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:00,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:01,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+1: [2023-03-15 23:31:01,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:01,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:01,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:01,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+3: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +3: [2023-03-15 23:31:01,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +1: [2023-03-15 23:31:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:01,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+7: [2023-03-15 23:31:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +1: [2023-03-15 23:31:01,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +2: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +7: [2023-03-15 23:31:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +4: [2023-03-15 23:31:01,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+5: [2023-03-15 23:31:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:01,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +5: [2023-03-15 23:31:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +2: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... 
+0: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +6: [2023-03-15 23:31:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +5: [2023-03-15 23:31:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+0: [2023-03-15 23:31:01,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt... +4: [2023-03-15 23:31:01,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+3: [2023-03-15 23:31:01,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+7: [2023-03-15 23:31:01,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+0: [2023-03-15 23:31:01,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +6: [2023-03-15 23:31:01,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +3: [2023-03-15 23:31:01,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +7: [2023-03-15 23:31:01,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. 
+6: [2023-03-15 23:31:01,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_24-model_00-model_states.pt. +0: [2023-03-15 23:31:01,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+3: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +3: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+6: [2023-03-15 23:31:01,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+7: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +1: [2023-03-15 23:31:01,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +2: [2023-03-15 23:31:01,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +1: [2023-03-15 23:31:01,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+2: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+3: [2023-03-15 23:31:01,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +3: [2023-03-15 23:31:01,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+3: [2023-03-15 23:31:01,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+7: [2023-03-15 23:31:01,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +4: [2023-03-15 23:31:01,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +2: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +6: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +4: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +6: [2023-03-15 23:31:01,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +7: [2023-03-15 23:31:01,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +5: [2023-03-15 23:31:01,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt... +0: [2023-03-15 23:31:01,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +7: [2023-03-15 23:31:01,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +0: [2023-03-15 23:31:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_25-model_00-model_states.pt. +5: [2023-03-15 23:31:01,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+2: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +7: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +2: [2023-03-15 23:31:01,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +4: [2023-03-15 23:31:01,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +3: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+0: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +6: [2023-03-15 23:31:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +0: [2023-03-15 23:31:01,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +5: [2023-03-15 23:31:01,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt... +1: [2023-03-15 23:31:01,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+5: [2023-03-15 23:31:01,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+7: [2023-03-15 23:31:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +5: [2023-03-15 23:31:01,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +7: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+3: [2023-03-15 23:31:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +4: [2023-03-15 23:31:01,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +6: [2023-03-15 23:31:01,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +1: [2023-03-15 23:31:01,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. 
+7: [2023-03-15 23:31:01,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +2: [2023-03-15 23:31:01,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +3: [2023-03-15 23:31:01,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+0: [2023-03-15 23:31:01,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_26-model_00-model_states.pt. +0: [2023-03-15 23:31:01,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:01,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:01,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+1: [2023-03-15 23:31:01,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+6: [2023-03-15 23:31:01,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:01,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+6: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +1: [2023-03-15 23:31:01,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... 
+3: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +6: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +1: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+3: [2023-03-15 23:31:01,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +2: [2023-03-15 23:31:01,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:01,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+7: [2023-03-15 23:31:01,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+5: [2023-03-15 23:31:01,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:01,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +7: [2023-03-15 23:31:01,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +7: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +5: [2023-03-15 23:31:01,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +3: [2023-03-15 23:31:01,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:01,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +0: [2023-03-15 23:31:01,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt... +4: [2023-03-15 23:31:01,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+2: [2023-03-15 23:31:01,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +5: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+6: [2023-03-15 23:31:01,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +3: [2023-03-15 23:31:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +2: [2023-03-15 23:31:01,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +6: [2023-03-15 23:31:01,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +4: [2023-03-15 23:31:01,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:01,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+5: [2023-03-15 23:31:01,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:01,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:01,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:01,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. 
+0: [2023-03-15 23:31:01,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_27-model_00-model_states.pt. +0: [2023-03-15 23:31:01,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:01,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+4: [2023-03-15 23:31:01,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:01,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:01,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+1: [2023-03-15 23:31:01,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:02,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+1: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:02,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:02,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+6: [2023-03-15 23:31:02,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+2: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +1: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+5: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+7: [2023-03-15 23:31:02,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +5: [2023-03-15 23:31:02,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +2: [2023-03-15 23:31:02,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+0: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +4: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +4: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +4: [2023-03-15 23:31:02,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+0: [2023-03-15 23:31:02,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +3: [2023-03-15 23:31:02,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... 
+0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:02,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+2: [2023-03-15 23:31:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +7: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +0: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+0: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt... +6: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+6: [2023-03-15 23:31:02,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+2: [2023-03-15 23:31:02,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+1: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +5: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+1: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +1: [2023-03-15 23:31:02,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+1: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+5: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+6: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+5: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +2: [2023-03-15 23:31:02,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+2: [2023-03-15 23:31:02,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +2: [2023-03-15 23:31:02,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+6: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+7: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+3: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +2: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +5: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+2: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +3: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +3: [2023-03-15 23:31:02,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. 
+7: [2023-03-15 23:31:02,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +7: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+7: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +6: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:02,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +5: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +1: [2023-03-15 23:31:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+1: [2023-03-15 23:31:02,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +6: [2023-03-15 23:31:02,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +7: [2023-03-15 23:31:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +7: [2023-03-15 23:31:02,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +1: [2023-03-15 23:31:02,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:02,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+0: [2023-03-15 23:31:02,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. 
+0: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:31:02,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_28-model_00-model_states.pt. +0: [2023-03-15 23:31:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... 
+0: [2023-03-15 23:31:02,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt... +0: [2023-03-15 23:31:02,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/layer_30-model_00-model_states.pt. +0: [2023-03-15 23:31:02,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:02,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:02,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:02,671] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +4: [2023-03-15 23:31:02,677] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +1: [2023-03-15 23:31:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:02,691] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +1: [2023-03-15 23:31:02,698] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +7: [2023-03-15 23:31:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:02,757] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +7: [2023-03-15 23:31:02,764] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +5: [2023-03-15 23:31:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +3: [2023-03-15 23:31:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:31:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +4: [2023-03-15 23:31:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:02,828] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +5: [2023-03-15 23:31:02,829] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +3: [2023-03-15 23:31:02,830] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +6: [2023-03-15 23:31:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,832] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +4: [2023-03-15 23:31:02,835] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +6: [2023-03-15 23:31:02,839] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +6: [2023-03-15 23:31:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +5: [2023-03-15 23:31:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:31:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +5: [2023-03-15 23:31:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:02,864] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +6: [2023-03-15 23:31:02,869] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +5: [2023-03-15 23:31:02,870] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +5: [2023-03-15 23:31:02,871] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +3: [2023-03-15 23:31:02,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:02,873] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +6: [2023-03-15 23:31:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +4: [2023-03-15 23:31:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:02,878] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +3: [2023-03-15 23:31:02,880] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +2: [2023-03-15 23:31:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,884] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +6: [2023-03-15 23:31:02,885] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +7: [2023-03-15 23:31:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:02,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +4: [2023-03-15 23:31:02,886] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +1: [2023-03-15 23:31:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:02,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +4: [2023-03-15 23:31:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:02,890] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +2: [2023-03-15 23:31:02,891] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +7: [2023-03-15 23:31:02,892] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +2: [2023-03-15 23:31:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,892] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +1: [2023-03-15 23:31:02,893] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +5: [2023-03-15 23:31:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:02,897] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +4: [2023-03-15 23:31:02,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +2: [2023-03-15 23:31:02,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +6: [2023-03-15 23:31:02,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:31:02,899] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +5: [2023-03-15 23:31:02,903] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +2: [2023-03-15 23:31:02,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,904] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +6: [2023-03-15 23:31:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +6: [2023-03-15 23:31:02,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,908] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +2: [2023-03-15 23:31:02,908] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +1: [2023-03-15 23:31:02,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:31:02,909] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +2: [2023-03-15 23:31:02,911] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +0: [2023-03-15 23:31:02,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:02,913] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +6: [2023-03-15 23:31:02,915] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +2: [2023-03-15 23:31:02,915] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +1: [2023-03-15 23:31:02,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:02,915] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +1: [2023-03-15 23:31:02,915] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +4: [2023-03-15 23:31:02,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:02,917] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +5: [2023-03-15 23:31:02,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:31:02,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +0: [2023-03-15 23:31:02,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +1: [2023-03-15 23:31:02,921] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +3: [2023-03-15 23:31:02,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:02,924] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +4: [2023-03-15 23:31:02,925] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +0: [2023-03-15 23:31:02,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:02,926] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +5: [2023-03-15 23:31:02,927] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +5: [2023-03-15 23:31:02,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:02,929] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +3: [2023-03-15 23:31:02,930] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +4: [2023-03-15 23:31:02,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:02,934] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +0: [2023-03-15 23:31:02,934] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: [2023-03-15 23:31:02,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:02,935] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +5: [2023-03-15 23:31:02,937] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +4: [2023-03-15 23:31:02,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:02,937] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +6: [2023-03-15 23:31:02,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,937] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +7: [2023-03-15 23:31:02,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:02,941] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +4: [2023-03-15 23:31:02,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +0: [2023-03-15 23:31:02,942] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +6: [2023-03-15 23:31:02,944] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +4: [2023-03-15 23:31:02,945] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-03-15 23:31:02,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,945] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +7: [2023-03-15 23:31:02,947] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +6: [2023-03-15 23:31:02,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,951] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +2: [2023-03-15 23:31:02,953] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +1: [2023-03-15 23:31:02,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:02,953] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +7: [2023-03-15 23:31:02,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:02,957] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +6: [2023-03-15 23:31:02,958] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +1: [2023-03-15 23:31:02,960] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +4: [2023-03-15 23:31:02,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:02,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +6: [2023-03-15 23:31:02,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:02,964] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +7: [2023-03-15 23:31:02,965] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +5: [2023-03-15 23:31:02,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:02,968] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +4: [2023-03-15 23:31:02,969] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +6: [2023-03-15 23:31:02,971] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +5: [2023-03-15 23:31:02,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:31:02,973] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +1: [2023-03-15 23:31:02,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:02,973] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +1: [2023-03-15 23:31:02,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:02,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:02,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +1: [2023-03-15 23:31:02,975] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +0: [2023-03-15 23:31:02,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:02,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +5: [2023-03-15 23:31:02,976] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +7: [2023-03-15 23:31:02,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:02,977] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +5: [2023-03-15 23:31:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +1: [2023-03-15 23:31:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +3: [2023-03-15 23:31:02,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:02,982] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +1: [2023-03-15 23:31:02,982] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +3: [2023-03-15 23:31:02,982] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +7: [2023-03-15 23:31:02,984] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +0: [2023-03-15 23:31:02,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +2: [2023-03-15 23:31:02,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,986] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +7: [2023-03-15 23:31:02,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:02,988] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +2: [2023-03-15 23:31:02,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:02,989] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +3: [2023-03-15 23:31:02,989] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +0: [2023-03-15 23:31:02,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:02,991] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +2: [2023-03-15 23:31:02,993] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +7: [2023-03-15 23:31:02,995] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +2: [2023-03-15 23:31:02,996] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +7: [2023-03-15 23:31:02,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:02,997] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +3: [2023-03-15 23:31:02,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:31:02,998] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +0: [2023-03-15 23:31:02,998] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +3: [2023-03-15 23:31:03,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:03,002] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +7: [2023-03-15 23:31:03,004] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +3: [2023-03-15 23:31:03,005] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +3: [2023-03-15 23:31:03,008] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +7: [2023-03-15 23:31:03,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:03,015] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +2: [2023-03-15 23:31:03,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:03,017] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +1: [2023-03-15 23:31:03,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:31:03,018] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +0: [2023-03-15 23:31:03,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:03,019] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +3: [2023-03-15 23:31:03,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:03,020] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +7: [2023-03-15 23:31:03,023] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +2: [2023-03-15 23:31:03,024] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +1: [2023-03-15 23:31:03,025] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +0: [2023-03-15 23:31:03,026] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +0: [2023-03-15 23:31:03,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:31:03,029] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +3: [2023-03-15 23:31:03,030] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +0: [2023-03-15 23:31:03,036] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +0: [2023-03-15 23:31:03,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b11b5400m/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:03,046] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-03-15 23:31:03,053] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +0: successfully loaded checkpoint from checkpoints_1b11b5400m at iteration 0 +7: time (ms) | load-checkpoint: 8626.35 +0: estimated model parameters: 1.096338432 +0: estimated model parameters without embeddings: 1.002523648 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-15 23:31:03 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.007557 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.016 seconds +0: total number of samples: 195101 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.033015 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.010 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-15 23:31:17 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 26159.47 | train/valid/test-data-iterators-setup: 12927.58 +0: [after training is done] datetime: 2023-03-15 23:31:17 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.704141E+00 | lm loss PPL: 4.061514E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3318671: Wed 15 Mar 2023 11:32:02 PM EET diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..160be92430688ff34fae14ef59cd9e8e5a9b40a6 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e846fd159b33464c016eea8ddbc395e044a3541d200bc5d7275437105f8870e +size 205568023 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9d1f347f1711ed8da510f1722bab268b652bc62 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1cbfdb560b5a151b447e0c8c6f383ec47f077ec4de827c679a54c1102fcfb752 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9412f3e9772854563df68a748f68dd27698fa438 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:bb0de9c578a4c38ae762e1513bd34e2f4ce3e3a7f14026934810a87f1d5f8400 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..585fd78ec3aeb12e1e4d32b51f0d54eb09a6162e --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b0f37458b96f219abb8eb04454b91ae8c9703048a5da722c9aff2b62785204c +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..65157bcfe1f40b0212662a0e22aaf1272d565a32 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60ddba273d2b157e62b541020d6cd1adacb2caac7cd881d4223d9019cd3fbff9 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3f3746f2c39f0c4b4b3dc6fba8d4847ec8cf64ab --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5b93cc0b35eeebc036c1b8ec25fd93b1896ac483a6aaea9c493523598c523f8 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0f822fe876386266a37869e037bcd606bee00855 --- /dev/null +++ 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3900db6387864eabf08b9fb5edbea190b8742ec5e6ac2dcd9db6db45fceb6c60 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..129916121907a0983ce20fe92ab7fda139805830 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f456d9a72dd21a15ddd3c0a8443b41bae1c6e497bafd9f4042ff2ab32fb7fb1 +size 205567970 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..87c3de700ba950fabd60b8b63c1be334dafe4a7d --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33a8e399ea90146ae85e106c0820aa78cb9c25127be75ca234774f024359e4b9 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3f54d8b2640526aa790c91f5f1794aeae525db9 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:395ce8c8228cab88a38d429dbcd1bd6641c7dcd484f9434ccd68673b54dfa3c4 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..728f47ec67cef0bac9c94d9b981c30c7da19ae6a --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78ccd982db639dbde58a67c597f5ac569093715cf80fee08ff72c97b685c4e86 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6753c248615efaa844e87ab2e58fe6008d9c1b83 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b5cc4a8afc0d25c14079e39296ee3132399e90be3431417cd2384893146319e +size 205568151 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ef2d7a1fc8a42c35a4e41d459afab208b7f1e9e --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:048589239ebb668214c6365f9fe3cdbc8325b504dd955628d5a96b3e888d62cb +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..820338aa719e6fb9c484c3e4eeee8ad6a262b6f4 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58850ee03ec84dd455364a0750a53b370c32142236e9afc7b959a36dbce260ef +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7e86e10a726f27d2fbba192fab43aeef2b0878c --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa0500e4069749693f75442f126f281e6a0c13ea2083d1d44f33127a5fcb1d0b +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5716a739dd19c6a974ac960b704d36f2811cbe4 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1933b314a50ab28596b6f2f2b96ea71822ccdf97d1d7f3c04d2f46551d1be695 +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cfb60e7a24416703c1a63bfdce2cd087d70f205 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4ac6e87ea8cdf8773ae62be96e603c1b33bec1855cfd0ea234e88fb4fb8ae68 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3bd663bbdc5d34ba0d59b21357c86330e32e65c --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:082c310f3de49c6fbb2a3ff0a0c7316ca8618f0d79b671b19ac9c7b76e65fb48 +size 
205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d2fdca91122732dc87ce26804558f26f7f1ba862 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a02e01090af9c836ee77362226e87715bb6ea31da9466fe6e4022b590f0b56f8 +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..127bea9e1c9509918b5fc3732ab2fe4cf3a8222c --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e22e6a859616c32603e0d8fd309a06824235bcb520f87bfcbf5e5b94867eb48 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..147cc4eccbd3a5e74a429f6dfd1affb151ff5ff7 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d0c2ffa2f99f59cba167ebef4771a3703bb406eb8eabc62f0ad3bfef7fee0941 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27269d1d4d99d0be94c684f04ee57793353a9da6 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:aca22e71b3929506a3d7dc805c7792511a9f7e35634798ecbaa77a989eb86bbe +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52af6525283728c1fb5ae6ed0e6b7771e1b988aa --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:490a918de6bb9b078565cc18acca5d734efb2842f1e7cddacb05b517886abd59 +size 205568151 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76debfe4c43fe6eca946d80890385d73af61b31d --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:87c8104c3631e85a8b2605b452a7cf2e095b0c8aab1038f996d3a3aed5181ce3 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b86807163e7b6ce6fa2bff09cf5db2c5aefd0a3 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b70b5fca77509089f549d2049d3af7425699cf470a5785f762bc1f00809e668b +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf3bf06eae8b417a48220685ba10771652d23a8f --- /dev/null +++ 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f24e7250d2d90b0115690bbcb30b15dd161a9ee430c30a3e584847536949436 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..458a65ef6a8b8dacf041854e8d79d96f98faadf1 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c95edfe9116ca3ae85334018f6c3efb2027d601a1ef74977bd539b6f3ba16db7 +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94b90b9c052b64f6fd769c6d39151e5a9fe6d034 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f23787f084ec9564bf1ed48065db7e588972a5b43e53114638d180ea77fbb9ce +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07646ac6489ced71550e1c423bb06e4006e0ccf0 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3946f4e59f7f58f1de327d2f2394ae9fab82d42b92372f71526fefd34e70be69 +size 205568290 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..99f3a5b080d1304d6331ac5478d6087e0a24043e --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e657535b5b4e8d9d7d2527f74f1bf566227248df4174e8de1bde0323807756a +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..af2be71c6f11c90fc75967e2a15caa282331c57b --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d16589d27297760f895f7e514b0ffe4fdb47b99cdff2cd16226d6c98ac0ceb91 +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c38957d9290e281ae0e1a9d485fd0e74a29a26e7 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:699a5e5fb2b5dc0057376bcbf98f046cfa4098a5ebb38b4902a8ed1e421cfffc +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cae8e94c5014af862580f7281b0e0c5b2a96c1c2 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3617232c3e747572feb2384a18ed0e964fe04a2bac6ec787a9f10319f36e44c +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db4a97a9e2062252b33509d7cc6942c139adc82c --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:accf24c40270f1c5c9a5a8ccc896d05e964c7edfd7a2cbf3b5b36084d0a6f222 +size 205568215 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c42fca1e18665852787372bb7626637c5eaccb5f --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3af16002503018cba7a1a7bd9313ad0b73afb333131c715de2403aca46d4bb12 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7513a405c3a64f5654e17a73e1928da9bb970548 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:afd754b649b26d8e9b221ed937ee15e407a44f0313d00a190aa61561c408aae1 +size 205568354 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad4e7ee7dac7df99f9286d26cbcf2a1da390cb51 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f102e39534b88ffd0c10bdff3f62004b37070e040e23e9441279f93ef45b83f +size 
205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8d706d832890fa43c904175ee6e7f7b2c0cdc00 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf1997cbe1cd69edcd8736d90463c558a6a352acabb5751b69dcca8ed4f3b861 +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3cc80ee617670e710e41208c106658caaa2f11ab --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:463fc9efcc3255b847c15e3596bd7e7216d1a5cb73e7fdd3d94399ae269ecd0f +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3fb0c2dd3472822269d4de7cd24625a4191175a0 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ce1f72eeef19387f529ad93f3eca1f9a99e662b80f2a701477fbafef0ac6e01 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4d2610198c6efdcfbeeb57cd9000baecc98baad7 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:cfe27efc5f38fe8f9f25f6cd3f7627e5fe217f85edc32d5ac9ff61beaedd4217 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..23759afabdd180ade1019fb0ab28f8bec30057b1 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a79c7fb3a7e2501cc800e3495ebd89c27cc4645bf7c75b50696e253bef59432 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..704d6e813309b319d8ba2ff91800ebccd52f040a --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:70e43caa6ad38ebf821a992d5aab1f6b849789204a08f633cb679bfbd6840e1c +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7033b513d46fe3743df4b63358e107753fb78d8 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d566e67323a1c57103dbbd054b1923afc5d921e708252112132f65f84742265b +size 205568290 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dd9586f2946c26ce72858fec5cf7c1dad08ebc5c --- /dev/null +++ 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:87299e5d151b81c5ff92be7241951f59e6c5734972cf32e0523893c748ce7ddd +size 205568023 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..121cd9aac7ae2e90bdb73ee9b9bb24220f45f9b1 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c16d34418de0ba21da98bbf41ab66326f09b904819740360b883fefbc34e5ae +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9cad355648fddd9dffdacfdcfa32c507bb4afb11 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8a7dc6e72bbcfb33f2b3318c03cb5d57b8b8e7569df31c3da16f65c66297066 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8c0eb3965763a2326b0e92d350a292dc7eafa982 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:633936f661463d82241a15d289d92189056f73e2c17e948e8477473936da78af +size 205568290 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6e6c5138ef50273f11e5c62c7e36b2f476d41767 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c40ac5b95ce472e8805fd24e9fc70bb75abd4892b749c362fd03cc89f782b02 +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..117f2915de769a9a6115da682ef75c6937e783ad --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b5111762509f976b5b1a8d938d7a8452f6afef7226fc0bdf2fb222966794f4a +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f65cd1ca845ef89e231abbf6e86efac8e454336b --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ce1c03d27e6609a33f4e085386344fd2c575da321c6db3c3f3ee5a504461608 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f5719dd705e088fce9e28c8837a5009191a8817 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56b0163a1861dfb8ecfa2cb2d51cda8c99842bf70c28b81b60c2cfb2475ba03c +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd26912932183115570a9e4ce5ca0a1a99aab632 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f50eff568cf1890d79d1b11023a810f8ecd16fbc5a585e8fb1552d253319057 +size 205568290 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27112b48a3b14df7599f9911177eb567322a2a9e --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:636d32adc5102c6614b98d91a83fe8a5fc2dc93416f1508e74a405f88d057cb8 +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6050a660eb6c1cb5bad635c460b352b58d76b702 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1500e1b67c08e271f3887a9c7f3a020c95e9d59694b41936c1bb0236308c7d2 +size 205568226 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..983e50423520c733d46b3f8335a44669e14d01ff --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e89e4bae9cc099cf6213b7aaf399eaf53d99bfed10b18fbdaacc66d888530f93 +size 
205568023 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ca3358bbdfc7dd73446d3c39a30adea926ba6d5 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa43681fbb48fca4f86fd813dbdec118d0c710f83a389fd0bd764703fc26a257 +size 205568162 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4ae0f18836401fdaa2ef6f557e983a7c542b5500 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f59f751b0bc5ddde2248fd892e1047416b7b43cd693dda1d550fca89eaf32c8 +size 205568098 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ed738bc287eb4036c25592ad6f64b36bf4e713e --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c87da675ae5922cc6620ff5466920a4dc5438da619f017d958f3cb303774d60 +size 205568290 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f77f7c699912d54e7a89db970b6a57647d965b85 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:f9bf9db69fcc71c8f3c5a4f547215a9be593d75b836be6e07c2e5b61be9cf58a +size 205568034 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b583effc7b0f65b09ebb1ff0a5298b155383792 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:18f66e2036dc806490837c0892a66e23298a122868febbd845c8b927d2a3a2b0 +size 205568151 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8effb3256c566d6360313dc0b10ef1cbe0321759 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9970c662c9290e20212ba2aff13677994dc1f2d1b275369fabb240792dc0e0d2 +size 205568151 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e1f79e7696a4c3b761806f25130660601552968 --- /dev/null +++ b/1b11b5400m/global_step2891/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37776bc4d20b66bb624e0859e7328e0175a5af3ef3a2756e9799dd3f07a71b52 +size 205568151 diff --git a/1b11b5400m/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/1b11b5400m/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f786826861d42d36071a91bf094b4a320f98c74a --- /dev/null +++ 
b/1b11b5400m/global_step2891/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0281851bb1c2cf28784269d23ae897ee20c00e4af62ea643c740702d5ef0a57 +size 205568151 diff --git a/1b11b5400m/global_step2891/layer_01-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c118cdefb6fdc46d16521ec86c1775c470de68d --- /dev/null +++ b/1b11b5400m/global_step2891/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67818612eda35a0629e249c4e32857c5fc2098e154dd00cfe0c2dc577d8707b6 +size 187630851 diff --git a/1b11b5400m/global_step2891/layer_03-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b4f18c2e61d9a82b58ca952000c06cca89198855 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f324732fc6d680910c5883071601ad423c3551f1b57b94d91a384b8a7437c64 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_04-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..50b0df331eb2db243f7f381c1b0f73e660f0d501 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ac3fa3126d56bc23d86c8588a5bb27050f6236e292dd5c77c510d4ff841dfff7 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_05-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5aac2fe563b794ae9e1f361c9231f8791ebcab1 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_05-model_00-model_states.pt 
@@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35c41af51aee31420230ca8f6023c544821bc4af9f165f65c9b6179d94f0e343 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_06-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce25597822904fc966135a7136dc44f2b12cdf18 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d07b9cd7f53e56669f226338decd721554adfbdd66353e9abaff30302917d63c +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_07-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a3709a7ee2cec80d0b3c9f10fdccca17ed9ec93 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fab1171e13dc5293e8d0894902a0e094ccb6770b0fb960cf6982531e87c56b4b +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_08-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1ee031e0a15f0a7ede955c0eb6cbed4990ff33c --- /dev/null +++ b/1b11b5400m/global_step2891/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d5a649e0b82ee6292d471c747bddd69185bb83b01f937b3d59e8ec25b4b4482 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_09-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ec31552bb51725340731ef4e3be2f4693249ccca --- /dev/null +++ b/1b11b5400m/global_step2891/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:547a6500d0b7d3b9f60576432ca52f035a0ec1194205f0363bd2ed1e8d4219bb +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_10-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9eeb8ae2181d462ea4f63dc77070f471d5c4167 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2f35b19ad7424fa9ff83518935f173a3de0fff9ee17d4260d4b9c0cbe0c6279 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_11-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d727b6da99753a3d695e8865f0bc743ee9633ed7 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af61a13604a616fc1f6fb85e8f443487445fc761bfbb9cc535749d2d762e6322 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_12-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8375710a2ab15d881503bdff8facb2b11d26c63c --- /dev/null +++ b/1b11b5400m/global_step2891/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f61258a6d0900986b93ef8bc1a5b5fb05bc1a781a8122efb94d6bef9014f46f4 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_13-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..af2ad8a414f44c3634f1036fdee07858d3d049f6 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35aeece26d4121d07f263ebcbe06a16434de1a4012abcafe6926cd991706ea4d +size 
77121283 diff --git a/1b11b5400m/global_step2891/layer_14-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..423e706ee5a290fc72c71f4b3ecd8b2ac0660f08 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c1c5a92b832bc2130a626dfab1c8a69b3342b1a76e9042c5642907277b7e820 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_15-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6de5767042a28a85264f41cb47e3829779a7f78e --- /dev/null +++ b/1b11b5400m/global_step2891/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb0c392d6838f3887b8c6a6b49eb4248cbb1db2f67933f900e4d7ffc1f93cab9 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_16-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e067d92d8cdae49c587d0daf53964ea404308a5f --- /dev/null +++ b/1b11b5400m/global_step2891/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c0084e3e978e9e556cceb73945a288a2e13d0437d6e397dd4716fbfef58a181 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_17-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cebb73a87929aefe83f762787774a8e3cfb66965 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39fee17039aad187ae918710f0b0399ef485c81fa8f07ccb0f750c761e6313a9 +size 77121283 diff --git 
a/1b11b5400m/global_step2891/layer_18-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a47e94f1c643067e561624007bb96e799a91d290 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50e28995a52589b2ccb1d14b3c5ef705ea719eaf88042fad5e5bbef65102bcd5 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_19-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3747725ddf60068812ac14cde21dba1f717fa6f5 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b65135d7e52bfeb8515b54ae56bfa561f48fd20a6ff85cd8bc93d942e6577cc +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_20-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1e98ba5887c444c81464408e89bcc06e7b690d18 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e61f0abe35d84ae2453b42541fecb21dfcce284ef01697155f52b31bdf5b7c0 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_21-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84ab0e381fb715fd5ba685445b034178dce44b61 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15c78949ebcb6fac3b890edc5c59f876479bae28fe74f0474c489335be0d961a +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_22-model_00-model_states.pt 
b/1b11b5400m/global_step2891/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b50ef5410b964dac86e32c1ed1a728fde837c5a5 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ec29171dc345316cdf044c9e83b09a26c05885d57570183a1cb59c5a60a5354 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_23-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c9fc5d2fca8f662a9a5b871bac7a1a0fe2c5c1a2 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43694d9cf8e7ecc6f691590a78533d35c5492b1a8a461f4dbb02dd991cc8b6f4 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_24-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_24-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..851beb49d836c2fd3c3a5820fc0f2b6a3d9c7fe2 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:29b5660ed269eedc7fe4bdc6e524b91975bb97bc6972e33f1e9db9e2e4b87d05 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_25-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1e08f505fdcbef771d9730fc51f2696168245066 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb1bac5d71dcea35aa4b196f73351eac52b9cd5baa374afc6b0039fc7516aa80 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_26-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_26-model_00-model_states.pt new file mode 
100644 index 0000000000000000000000000000000000000000..fa48df13c4c4d05638108807539933b4b65e1c8b --- /dev/null +++ b/1b11b5400m/global_step2891/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:330b2d35fcbb285aa8e8e4ea67c899c6d09517763444c1614c6aabf54d84d699 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_27-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc23f0c9cb87197bdc10e9e176b0b90da2312f89 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b134953d9a2b84854a60707f30b031acc2080ce377322ca7949d79e18d0de5b +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_28-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_28-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7882ef9223b61a3a338a8deede7d56f1c2fbfed8 --- /dev/null +++ b/1b11b5400m/global_step2891/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:964ec3a6e9e2b27389bf7994eea0708be5fac3324fc98d48f06915a3cfafb4f5 +size 77121283 diff --git a/1b11b5400m/global_step2891/layer_30-model_00-model_states.pt b/1b11b5400m/global_step2891/layer_30-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe531b8dbe8f41e57cb2f29ea4833385cc63e79a --- /dev/null +++ b/1b11b5400m/global_step2891/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e58aa99767c65f7d0657b2bc0305afdb2219848bc2f95278b03abe53e1d2d19a +size 8387 diff --git a/1b11b5400m/global_step2891/mp_rank_00_model_states.pt b/1b11b5400m/global_step2891/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..19907f03a0b59f20372dffddc8a3e22e82503fa3 
--- /dev/null +++ b/1b11b5400m/global_step2891/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bcf2e0a95f80b565668af2a39a64977fae58f59064117c753d6c0cb4c7e67d5 +size 43827 diff --git a/1b11b5400m/sbatch_1b11b5400m.sh b/1b11b5400m/sbatch_1b11b5400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..2b03c2b7775692e11df0056c235d54afee332d7b --- /dev/null +++ b/1b11b5400m/sbatch_1b11b5400m.sh @@ -0,0 +1,166 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b11b5400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 
/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 1516071000 +# -> Samples: 740269 +TRAIN_SAMPLES=740_269 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 7403 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ +
"train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b11b5400m/sbatch_1b11b5400mval.sh b/1b11b5400m/sbatch_1b11b5400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..1a95b55ff6ae4108b890351f81476e74c126415c --- /dev/null +++ b/1b11b5400m/sbatch_1b11b5400mval.sh @@ -0,0 +1,171 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 12:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b11b5400mval +VARIANT_CKPT=1b11b5400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + 
mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1143M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 1516071000 +# -> Samples: 740269 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + 
--micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-only true \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b11b5400m/tensorboard_1b11b5400m/events.out.tfevents.1678910449.nid005733.11844.0 b/1b11b5400m/tensorboard_1b11b5400m/events.out.tfevents.1678910449.nid005733.11844.0 new file mode 100644 index 0000000000000000000000000000000000000000..53e2ee4cbf303b051fd4af4755ea534c755795c2 --- /dev/null +++
b/1b11b5400m/tensorboard_1b11b5400m/events.out.tfevents.1678910449.nid005733.11844.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1cc0bb588a1df04b22307641015d94f232c7889be1fc5fd03b6aa59cd2e403e0 +size 5153659 diff --git a/1b11b5400m/tensorboard_1b11b5400mval/events.out.tfevents.1678915809.nid006946.123298.0 b/1b11b5400m/tensorboard_1b11b5400mval/events.out.tfevents.1678915809.nid006946.123298.0 new file mode 100644 index 0000000000000000000000000000000000000000..a681b24e6a350baa984adca7ddf94b20835c8bf1 --- /dev/null +++ b/1b11b5400m/tensorboard_1b11b5400mval/events.out.tfevents.1678915809.nid006946.123298.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d58671c70adf8439f7db883106b4784baf0164faebedd0498febd367aba0c84b +size 980 diff --git a/1b58b81b5/eval.txt b/1b58b81b5/eval.txt new file mode 100644 index 0000000000000000000000000000000000000000..7f1aa2f9e461fd946bda7e2cc7b5deed2c404a8a --- /dev/null +++ b/1b58b81b5/eval.txt @@ -0,0 +1 @@ +3.070052E+00 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d0626ba0aad83fa0a9453de77fc926bbb5f0516 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01973736516f9f58e7224b3ef88b9aa4f03eaa06fd92d76e954ed74a8104d8d3 +size 71125719 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..87f9f2cbebd001dbc270dd9c7d0d99b399098431 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:6276baec63377898b1f1ae59261f25c76558a94f13c13295187b4b9cff7effa8 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45933d87ec1e0b6cd9821ecf1e41c27e41565ba1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51a58ee16b9dd63506431a3073fcd0b0da2ce96bd104e81aa65c8bb337b98077 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..35692316f7ef256caf90921c40e666bd4e471518 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01bf0b6e6bba9d542853dd0219e5eae7d5e3132bbcd4eab69babdd78b025bf0d +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e36e5c1976ad33dfa7b578b6d14935868eb19d99 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d296a3844683c037b703038305db0362715c8977be521b0876f8cfe131b0225 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..491b2caba72301861235f5713f9c39b0d2a85a3d --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a7e50f4633734c2d7cd387b7063a4cc15292152eade40670b331ca8ce98189c +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7cfca45b47679bbfac9aab1062e560b61137254c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ba5ca2a0d67a153c39ad09607544ceb1b39f50a029cebc9bcd11335e803254e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..478426a3f5577003545a2766eb65206f98278258 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6035a744442d9d688183e8cca2ededb40fe4751d606e07fb7483a297d8e12a6 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ccf473e337a578920ca4983b626fa36f2d57b2f8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd9ef860b8cb0dd3c25cf10310cccc3bca1d6a87b190303b76cb9acc50759cff +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..fd9e667406d24a9c07c0b41bcfc751497f205a33 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b67112a5a333143758afc40c7b9039837df56cac2eb6a1ef6130cf6ef06ee226 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c045aca9877c0e655712110d9fbb80601922ce9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e304361e3033a9fd3c2933556823c41f51fe4f746112385970dc61f9fdb23051 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a67aaf8a7eaf27249fc99638138cb89c508138e9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:880aac43f8cb569deefc7cb032cdb1aa24676e43b1246b0c3ab39441b7453b1d +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3450b766d1943b1fe300f60e1a2eb878cf3b181 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90621451d9c03e37d016d1c31c6c4ae8613a34f726b8cfdc55fe3560176e967d +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96d5ee2b347f0c232511b47accaa27acbb2df974 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1983785a368d18dfab21eadb25b861423281d8c029117f5e661961d75cff6f0b +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f799945ad75f66a6a2e0a2747b1d2644d7e41cb7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d6ef8fd3e1304729845a597b3c0842c81843a014492a7bbce85b98e42e4b716 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45cd99ce32ae7a3fab3bd05e1e89ae5b9a994d28 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0c2f9313002377f2a6fd1f6a9dc3ef4ba6bac3b567acaf2a1ea65e8fee3644c2 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eda7312bd9a89ad2f07c909b1c3cd22db662d224 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e63ae3b6da20f47835180dc6ce3b0886fa6e78986024f028e2c1e70b28efa1ba 
+size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..548c4b45410be8ee6b816bf32af3933babe1e929 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb914a9bb6142e8a76d52410c9a8abe5914b202675fa5bbf1f37bf21d35e3f80 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..791b26c2ef5bab7537ef9af60d3a377ecbad0eeb --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2209a0d4b5cdd4431174f4248c680f38c734adbb2639867d967e3b72057b0451 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db8f3823af3558ed73655e8f6bcfb31c50f5795f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2641d6c86973b2a6b244b38b8d656c68b675930e9b6caab2d25436b9a9ebbc79 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea5107a6c97bac6330c978ea0547a7253fdd2eed --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:cdfdc715763fafa4eb76e69be9befee5509d982aada203bb1d9e029b9cf6f127 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d69562beac683dba53127f0b92ff5995b7b3bbff --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e383b719d588fe58e715618213f5d16cbe4ab14e6866f55bf868a8a875821558 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e763666470c7d747ba6851e26922bf3815552dd --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0309a611bdcd1ccdcc9e73b78ba83151757cdb57cf0172acc1adaa289956b805 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c209de817029b942b69146aaa5711c6e3c7e7e7f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abc92711a3dbe6f81071fd8566bb3787782e86ce1bbc70e28273d6f46529dd35 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b8aaffcf29c7b3f43a09e2d52b7a23c7bd835ea5 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58c5398f2f173352fa533dc21324930170816ebcdd62f09f25b6d546ce856baa +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..08922cde9d52665b00390302cdadc04fe40e0c46 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2888ecbd8e7bf996db53ec4c19fa62da99c6611d3203e580546267843bab2b92 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6c5e2c73d2c75825465aaf85f703b1dcb824fa7e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:137cac9f3685887daa7bd46d98546ed45662e29b3aa0baf6817d9a0417d4351a +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4968f4958942933efd40d2b24d3f4dc3757b1ce6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b2e34d7e77559b7437b8bef5208fe0cd9992fb905a11c996cea39e823255171 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4067cd1aa6855108ffd61864c33315ccaceb1fb8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17365fc2ad2a06f5383ba1242852989b74a8614ba78fecdeab1f7bc9009ccd26 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5e29926e0e92ca09d598bff5e5cf2c35af71a19 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39ad96900a9147b3003fb18e63feccb38418fad26bcb3147d04cd940533a537e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1ae8f3b33152c440c74376ad3dc6a69dda5bcc65 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1039e5bf4e93e272397f3ca7e4442232dd9ff044fd3740e368729323cf21a3f0 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f27c224dee4e3e8a6e18149124ccc533d35a3f6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01299a516c450bd22f96649253701f3e225032eb5928427ff7f12e0e1a2da516 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..293c3e437b8187977fa4cbcc8cc7a6e2a1dfea82 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6346e7bca0faa8ff79ce3ce9d8f9e1919ea7144600402bd144d5a88441ca3785 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..711b5a675be96b0841af42c3d471093eb35ebc23 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34f1fa2021322c52aac0cb78782106d785db1c37ad3e6cc03e5d78e9524d6d91 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..75ac35badcff95d5021d23496566966f92510996 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a589c50f73c8b6b7dc5401b2582f69c5737bcb1e7016dd70dc5e4cbb46c69ac +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..beacd516b8a5932f8bd76fe16b1fd9a0de6d2f34 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85fdaf5af8c3e0af535c3aa16c40306328b9534a8317b9ba78e12e234c435c58 +size 
71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3ee8d5d6cb16c432f4dd4b8742c51b8be13047a3 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d857e453def00a4a3278daf51829a267da205b837a84b54642f873610bdf674 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37e1325d474af086b84215a74899559059e76e3d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5df20903f0d141d6710ee0fd1d1199e48280d1d1a797568f041c32c02ab930b4 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b3e57fbed6411b58a0f1fc1dc3e59c8242ebac9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:38df60cd6dd3ae5c16d82e79d4ef544e31aac304a6a5da34290a7e0f5fd07673 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2b500a91a0b9a531302e6790430e3b94893e91c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a2122fd0910c27e0ff96160a6c4a7c26a9a66a794a41b1e4802f9cbae4b84a00 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1ee33c5346591043c680ef7a6d376efb3c5929d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:754b92cab88983f6db2a2bb1fe5745ae877d2fb30af1de3693ff72dc34f2ad15 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a770ae6c87c1715d97484f221876bbf7233ac4d1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4ee833196201b812ddc36eb9ddb7d51389c87a7ec44a6a6e4937d7c1e7af986 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9e65e76381701c7803803b2fcbda96aa31911e63 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9180384672f3101c10f033a4ec16b14566c986f91e9021f61c3b2a0a40daa69 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2e33e18835e3a4fa825fb4ba21d7c4db8821a221 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:059c695bff5e82315c9c5bd23c93ba05270289b823cb2612954290ab7ec1b742 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..926adeacb3d6114f033f4d09ccd41c0c28e68655 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d9ac6fa45a87aa1d302ec8bed93fcb608412955e39a449e54cf1d495ecd2723 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c7aada2790f70ce1e014323b95dbc1d3b0bd39c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06cf32cf0c701d42967f765aa577ec2663505f794fab70c59abdca183daba43e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9bf5d2e8cbe072a4c5d7a773e035cfe906104b4f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25d35e2ae135088ed65945599e5d6a2a40a86976e1b3690ea0217185a4f93673 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1a09cba84295312a68c8dda6b7e0e8b226ce7dea --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:38b5081569006015fde5a8ba81833a38f07646404fcd81146748dffd2238df28 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a3323234b917eb15496443ab1146fec3d87d6e7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e892b0d87c2037fd841b35d44d59d088d90b57209bc5ade5473fb0fe196f829 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc45b82cb6137dbd68f4d52a7ce42b52b8a65609 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6013ff91679c370dd33c2b2746ef70c99e1b305f1161a58878999fbadc0c268 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7c77a030c028dff4b2605ade7b73a55d4667a07b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d6d16cc7f1741a7737a8ee830e8e6c397964ddfc4fe89c8aaea461902c70c41 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4da31025d1f857daf75caa9d77d626cb3f2b3671 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a07cade6a02ae079036efb19e9fa56788694115d3ee4d7c663728f3d5e761db0 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab13b2a2e8ecd0549ba461faaff6e58f299d1cba --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7bea8591c00d5a9cb3de3c6d530204fd187458f3c7ff3d8eb70a090c59177c17 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d994828b0da5175c16cace566037fc27682d152 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e6e4c7167a36542507c89ad64a3c77c4302007b6fadf279d8dca63911db0466 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a50a41c47f92c5973bb6d7e617a0e5ca5f8e33e3 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99bce57e02e4715b0219b65df0fd6ca2051e184f42d279f8287ad5654cf42c60 
+size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..19b6475ed28692786c719452b1fbfc204f0d57ad --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23da93443de1ace4297769d12e86156ed172e4d2c6c408f2e1518765ace9bf55 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b67aeff23d778b56fdc04a9c40b1d0d0aa30a4ca --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e744e9825586d882fcc5cd95ebaf1196da1f97dd7d1a2ad901b356cb9f65fb88 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5060559bf8f626bfc16915e8bda45916b4c1eaf6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:766d1e4825fb6c371327a597483114edc8e92146fce081cd0542f94f77cffedc +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..93d093be4f59195daee91807770ca414d90dcbe6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:cc4787a61dad58f28b9273ac0314d2198d009db606bf2adb7bfdedb1e916558e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..71681cc80ea72eab014ae0d4630ee36a99ddb72e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd26366fde330a9a32947c26db4ffad042c4a49887e38e03e28a54dfb3b848b1 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bf189e15dd1fe31d2e02eeff9e0c80efa09ad80 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66acc7dd12679701245ccbad9b1dbb9d5dbf81336fc0d7d16847ed6bf8520ceb +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67f7e67aa9db554df58d1c37ddf8b6d08214a1be --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cae7cf355812935e3a517c576e260919913132c69335c6449d55738ec3502b4d +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6a15a56d8b47bc68e320dfad958583694d4e056 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d814aa0887fa295a7d16102e3a586bdda6ea8eb91199a8d40d063a9f8f9d4c5 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..62f0b3caa851ca408152483c0e73907d30f5dffb --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:115cdce3945e25faed96acdb624e476b9a4db267bbbcf12011430c7f5f2c5a5f +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ced415716742fec627c5dc076d823128509e8859 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8bbb886a83ae1582881cab6604c62122b98bb00dfc47ae4f57b31cebf6dd8c88 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc94d0989b05329ccb50ae552962158bc1cc8e77 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b2002434055ef84f9270c13d1b7e1b7bddfb5820091ab3e7d215b20ba04866f +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..e3481c100ecb04c5b57aa67fdf80a4ed67803600 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85c7dc4bddeb8ff512a8a4691bf30025c1b3589b4b4b7fc86225bf0246780c25 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c2609058324b361c7844ba530bd68eea417da404 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fcd08c6e7908803cc337fed307d1030b5401ec58dcc0505452b3fcfd5a13531 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c261408f60aecb16be36e786214348d308c7bddf --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed9bd718714b48f71a6f4e73452f380251d1994b03934a5d0bec18a1acc79f40 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27103e19a0119df998e70befd247637f2b25878d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d31d9ee2ec636d6ed999c636ca9f9e1cf1f0d8332018a24d3619aebfa890dc2e +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..243aa9b859383fd5816e0f8efd62afa3f222dd79 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b98504871871522cd0a1acf7504f3fcd6b2bc7370cbd6abb8dc7796af1422c05 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d09c5165e9a624b4e9bc82e053752ee272de24e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:916ac5d5ff92fe4eb041881e1c43546b5cb4a785888212806bc8d3dfe30fdafb +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3bd73b79e1ddb9326b099d9d99e39c69e29e489e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00524f80fe89f57c8ecf9f2f9dc024b6e8d51ff33dc728e1e1fd3f6ad37850fe +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e520c58ea732ffd2439ef008df8976e7403b06bb --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1181ffbb552446faa96d8ee32392b2c1b39a4cf527ec4464195226fb48614186 
+size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..700ea92d5391589f2454efe762610255c81600c6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ab5ed77a4f01fa5710c63d099e946f0e155952b9941b8b735731bbf8bb82707 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..35dbefba29f1660fe78111c47bff30d4f55d4e4a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa7c2fba37483d5e67515cc27ded091b200d763d69668fe2c040b69543be0bf9 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c57d21795c7b4a24b918983e8a8bfd7b8af02773 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:28c3f154612cf75bad34670b9c988889ff1e59464b48518220043aeb1118dc0c +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03bc2a7493664d95533b1cfbf76b5598a9df3816 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:6a046bf968b6e99ef1449a8544cab818cba1e728cafccb8c653fbc31ab005888 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a748267835f98a250548e1e013fe8bf1bacfb1a0 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b63b8fca3c8a29f2d25cb8aca666b59372cf3a02a4065796db74b3aad9e44f0f +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28419dc501fbc75ce5d72e61d9221799e369a40a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf990933dc46de08128075172fb7bb16a5b70b7b29dda5e8cc3abef98dab3f9b +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..89bb826727528f139ee8e8d052cd9bbfded1b1e6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7357a742fe48eb9cb0c1f24fe5d5e8521a6edfcdb2cbbedb4fa4d836ee0a1db6 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c26709429a04e9d8272a3fddd6878e9ba98ebe82 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96225521b8d8429e840a844792e5f3a02e5422cad6473f642b3636e1652b9a0b +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..79f04492e8669f431c34ea0c3ef4cdaee4bd96c9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ebf1d3433788451bf70d37e322553acfd8aae64e8f95460c4ccfdf2d1d4bde29 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bca89bca56a0dcb8236c8f15433242b1c148a3b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6a0e9ba10ce1911001f72e5754db8e0b7f77e2d58824695ff9f072c8668992e +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40d0cd33a6b54c10b338f4eb8c7c8e05c7638ed1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14c67fcebdf63e8e828dc7a832891476cdf03eb20ed3a014a78581188473bfb7 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..eb861ae05494bb7150b76df252c5e63e10b42e86 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07c9c11d90d2e6ca53145be1bb20ab227050e741da8b74773e7110e1d72e1fe6 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9bbda3cd0fd7e613d3fe37c46f20f3f117b75cc --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f6190ba73a7466a95d45a2debeb4f93260b2013b56215dae909361b47d0ae1b +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c85542bf77ccb1a63be1d52779a1cbbcb08a25a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a09e6a5ace22eac25ff31847ce1ac730b3ed28a6ec78e87eaaeb4e14bfc8d85a +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bd92687b18547970e87ebbd51a5c9aa235b2610 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2358da9743b1971df883d43eaf0f1c9c9d8abfce2d897d22103988a34094f74 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..298cd02e830f0e1ad38295b3073349670dd91421 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:861a80c8c44eeb1396a80cd3f78c27dfdebdcd3718d5240fe4807b43203817b4 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e140528e90224e7bf55a56ca4467a540bd97e31d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f01d61907b016ff3612d4a19785836d7c73aeaf95084ed33bed5d6d0b6d8f02 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7564426079ca403fa1082c1a81513ec86c916cd5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:350721cefaa73c96696f66b8e0f4778efc9bee4d7110cf21eff799236af6b357 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1491f8c7f184339a8911ecf64d1b040eef955f03 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:163f18db38affb3b09190705fc7e561a95c16c439976ca3c514761028b45b0a3 
+size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..39efaf426033881ca56b1e2345fd5972c84259e8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:995992aa804cda16442d178278d50edfd019912a1b8946c44ac09a510b41de3f +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d06df649c5e1c02f90fa91e6081cd4fb880c5fe --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:845e4a1942deeaebe521ea7de3c636692ec2391cd89ae38ce9c2fed1a4e91d1f +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad3200217418caa4cf0263ea51986767c7fed9d1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9f87ae9e95ec02540770ff3c699bf79caaaa84439a798b58e5928f8f9453e03 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7c4c6ca7fbf598e9da9f08b5179567823c6c8fa2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b0f12d0a24e5c505fddc46f0937eb1cf08afe595c3d7c58db9f288d8b6f04d7f +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b40ede0fdf54a8a9ff36bd349a9480b7df9d2539 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:245542a56f24d0516ea663308cf1682b8246a38d6a1623faad5fad900384d2ca +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee7492f636fb8105ca5c4aae9017c5bba532fb58 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d6b7568abad206cbf1c8a63bd7375222468295c2f2fe608da15084290beff8e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f05422b8ca8058d325d7269bbd28d08352e443b6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d007dd838062172d6e51dc219d06609b4e7bed860391b7bb75ee02283aa2849 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..886fdd31a02d345ac96dc950c5127e5872c284b4 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59ba44e57608b9d5b0b0dae85f16e3ee6d56bfb81b01492588bfac7d006bdda5 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9764b4606d5d899ac2b9cebde2196103ec98827b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25b7424336fa4b1174206bfc1350ec5df9684a69562576c70bb2cd09c5c14743 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ccd362396a834eb1dfef701ae49409a016f53399 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b0b466f545ba895b6c203f4142bfc6fd98595a582a639f38bc61ae645b8be75 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49f6c90c3111d4bdb117a1328311c89bd822bfe9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0ee90bc0177877555dc1ab094e1e521751622eb22bc4c5848542960ef9bc6bf +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..577ba74cf98c31064414034ef451495d4b9b6fb8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3a33fd46c5d0d7bcdac7bf1366eeaa324a2582c00d05d2ac5a863a53f085bd3 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f151045b40143445c8a1c7c4b7469d37140dcbe --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d39c4afb8be77684b25fd50e68b10e1bf0319af14fcbd1bdac6217399b0e3cdc +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61e654d6d7d77ea8cde6ecb24ea086cfe0dd7d75 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2324f350618ec7fc889aa8290f4dd8dff73a116d0d2ecac2b27c771d3c18b2e2 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97b862fc1affa2503a1a73489c2c10e7fa34a4bf --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab58a5b8d6ca34f60a49bcb354db9397befe1626a9cf7047f0c07e4210462c13 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16f9e563af13c9d87fd1f6f4837ffec97fd7637b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c279bab8c38a168f2cc79f59c6d2893ab65e4e1efbaa6cf881874afe821613d +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..564b84d0b34f6dcb7573f66aee6f9bb168f002c7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9bf5af42a1053b47d061ebdb6ac20bc9ad33953bfc64593aa8cc2e3f9d0bfcf +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83dafe87775fa9b61e51d28f6ccdf13b94709fd6 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:28885e98f6bfef2af1e9164ce8a4cde5afd1aa05abcf9e01d1aafd26cc88ee84 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d68ead357ceacb2c5fa017827fc26c0add9bcfc --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46d19deb984b6100a694c6c6d81ea6c3857277d740d431d47354a8936d3e8247 +size 
71125719 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..580fe3196632fa4aae75442d8ebd7bc9b7d9b144 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab9fbce7aac871ab8809675a0bb22dfc241036ca3912b802f8fef5b69abd0e3f +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f52f95d4e05678f7f5f371969f5b8ef7e20c2a79 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95dfb15792e8ee527c51f42a652f828a966b81d86619cba782356e23aa66f432 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5eb8affe59ebfa184e98f191a66746be86678584 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72484643bbc27d7c493947830dd7b2cdafd062e114eeb0ecc6bd679505dbf934 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..872394d8f11efb7ce34fe1a8e219b2d7b429a0c0 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:973ba1807a8fbdff3970d748c270c1b3ca08639f129219fb7c7e6ed029a3658d +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c062b4350c89b829621bdeed8f19b3e5d11a411f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:622a61042e256949b89a6d72c146339f154e7dba53b71327d34335ffc90100c7 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e4cd4493289e99f8924d8862cb9a13fb3218ebf --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:575faba15b7cfe180ef69d0d4a74b81e99f9d14d11f01b0828035ec315faadee +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9fb6d02b6b529111537518359a98e2761c858fb7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23e1b055967790c1b12cfbd2e8f0699aaffa5695471bafc2878e6204a17d486d +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..093520faff969ef1f9a2b92e12c405ac19de16dd --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:225e485e3927a4f9d64313a169732fe2c23d7be8a929884552ad6db7a39838b7 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9a3d3f99a2358a33dbed49bd373900e12c42d9f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cc8bcf3d14f22b5b46ae9938cc8773e4192f765b47eeece16ba63721513627b +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..94a326ab53117d9b77946bd8d202e5e7f1370cd4 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5184f2af95091a972968e4dcc4f00d0d8b132c9633397ffb7907724ed0d7c01 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c9ae514120dd68b4e317a3f3ee38cece383e9382 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dc7e82f5f5a9fcb5dd5803b6c7f58e64c36cf7c1d8a2c7fcf91e991f345f017 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..c3bc5628dc88195151b44e66692c19114c384f16 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f67c85ca205346a659e53e63c4ce7250878ba72dfe8d217d29a2f912d00509e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..792cd5c7d625f16f417808ed7431d2c97df09f0a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfc4c411855eee2de01e07718a80962143c261688f9db7d67e927f44af3c40cc +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..49b34d26f08cf9b009741de91a9c843ae4dde902 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aac9f034d4fb0d55a713c72bab2157bc7faa103c3d8a92748d348bd920c022bd +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0378dafe619895ab767cef6efcbcd4dbb49eaa1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c4449a42ffc65437a7af4da3dbd528df12ad1583bb6b405c946bd216a681aa2 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28197aa261e83ba40432d3d239c51bc530e9e415 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae57ca42119451f72d640df494f9135cd59cde494be0538cd475a36a5a0683f3 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37e9b9808ec37fa31377d7f89d760f972fc64063 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:919964aec627ad816240e68008765c0bf4e78d3a16dbf645d2fb73913d2ca2bd +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3c01b37b1df7d42a8c141ffdf93f38450e53ee0 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b48cc451953fc29b06937740f8fafc57e6c758a3e00a2cd2d106b8952a44918a +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0522e718e00e4ce432a663c0670e33ef865e5ee5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3206b51c43b5facc3b5efbb8e39c1f7444562efd051657b7643f9b8f36488f17 
+size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7f26c272c5dbee2781abe2069d115d4d888d104 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f312f38db97a9d36fdf68261b5ff60471b4b762f575d73c2e0ec887321080688 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..825c3e90cfe233cd4f060a7798de1191e7359e01 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd10a9707e14854459fdcea2cef73baa84ad0ec2b1837d8e836c6808083d1e71 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60e2e15f8993989e25cdc0ede4b349068c3039bc --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15f530c155a38143649c95cbc2d8e5550f0493d4312338a148a7ae52d6ad327e +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c7c51019d43683ed395ced051212fa685cdaa49 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:576feb0920e96999a36f10105005c99bb17d78a8d04e62c209d95791a32fb8b5 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..207153d5c182f00b7e6896a943152ab963ddd417 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35f0c49e43a0bf2427f7f3d476c2015cb1790872386603b77642145836969413 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c2eaf820f61f6e620a304b07fe259bc417308600 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69b1899ac6d7905408f647e15f250691e6dbe68af003f8bb056421cab64fbbd0 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0369234de1f478e7b883f6a0343ba01fb77cbc2c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2b8cc85f33a2b342da19818e4af49de7c7c409778559cd23a64fe7d08dbda4a +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..21e7f9da043e7e59fd872544e82773b03b30731a --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abc6dc81284658ee20531512c8977bad8dc6ddfab1ca848e00396f9ec0faee9e +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bbf1aa04214f9e2fe8f19b4731decb1c8270df7d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f5b68215f38a93fb01443b7ccb2f49488eca1abefeb63037cde2e94ab9b65b1 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76bf204796f5d99fd952d5abb0c05f00f3237779 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49a4e91ff3f81989d162c15a46aa2bfae6aacc5ddfeb1412e9ee0cb6dac0a5cb +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb5d8ef8fe9d5891eb447d88d8c463f37b8aeb3d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:279995b2cb40345740382031e0b88a57bf6d2d4c4ea04e2c3117101d6e62ab55 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1282539eee2def5e4421870b77ce26e72886538e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55cd419a4f58bac4ff8165632797ecb582f42f3148e06e2ceb2f44399751f76a +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f991fc552255cbf7568dd8a60757d0b481903265 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a40eef758d77a62899767c971043641e67ff37c49d95c7ebf0ad415ed806a18 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..351d452348a5750a6ef4c4968637865295e8c5c1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:406431bc05fb56acc19f05fdf4bbba9c90db41b61ddb4cf56eb11a6d929f3070 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..563d54ca57df917ca9acdcfe6c4688df5a7d9e6b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0939d0262398d0a4ecc47ff01406a03b678a8c2db5799f988f4efc9461ee682a +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc4edf8ae432034e271147b480c1d9f3ac7b5970 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d061e264fdcec20dfa69cca106496048cdd62aec3f316de37e3b15a572e95b0 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d1877562767db953212aade4abc7232c4a6c884 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14a010e1a00960951da7a96627ba81f93b30391483d3332de93a640efb1bde34 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..182aa50716c51b6b4107bbdaed733bf6ae2a369f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e6715a6553015d9d0a9eac5eb595ce7d2c56f07951c57b3bff16d993c91beb7c +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ed6ff404dca0d2be4421e6305de9ff9bbb8aaf2d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:955e37e9d193eab649ecca4f901ddea88d53485dbe174ef559fbee552ecdcf6d 
+size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d4350f0f17045a856e696e2a93e3803613c9258 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6ed53d2597380f826a2ffeb3923e5f6b10411dc6397bed470000939555ebfe2 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a909dddebcaf095a8f5ce8b19f549976118e56aa --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:42e8ee7519f74fbdbaae57dc08697d1831e0c7f8cf32f3090ecd9583118a530e +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45a86ead63166b5bf1d2e78e5f2724ea029ef7bc --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e3e15526aa8340e864c8e4084e8d78e127873a4018703e5e477655176114160 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..71aed5379f6a2d6b38ea66fe1848ea208e17e060 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:2c429cc737163e2bbbe7b370e33a5439680eb7820fdc09f6452315848bd4dc4c +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2f6b18f4474d6bbbd8d3eb2d7fc5519aed1a8424 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:024c9b8b5896e389d460e67c9db9f218ce0fee03d4a889fabf04279fc410025c +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba8d383b505bbc83c3feb62901ec6cb0a0fa3f71 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf31662f778d7019ddcf1a151db62cd1e7d573fae6312f9aa5369cee4cb15908 +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cd8d8cc488e3a13b5012100d18f2a1d5b6283812 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b9574629e19c1d4a95049f7a9d66d04e4e342adca727543c75186f2994678b6 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6518601091ada7dcf04657551126a7369606d38f --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f13a12879c56f9ef356f60fb124c86a968770d92a8b80cf09b1595ab44c058c9 +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..89dc8285ea3e525a38c696281856c29643414bb0 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6df70c1375b507afd066eab233ae00e2e41c39e4c02a9a09984b62272d318c5f +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d9218a153127d5a2e617536a604b91cd3a8edc5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54017ab1f5e5e77ca8e8ef7521f22be00112669cdb6d5366dfcbd1673f244061 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d76eef42c68d99f453331e93955ecc10f39f2ef9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80ad2b53a45fafda94ecf5bcfe34fc4026f61481be156c9a30739d1045a5bdab +size 71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..56beff5d4881c4d40c4be374fb27135127f4c70b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a409ed6d818f5218516731f969796557016ef168cc419ef5b34af85da7cdeda1 +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0cf110515b1e89d28899ebb9b37e3219283ee873 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98498fa930464b1e86e3508e0a5f813bff8b5fea60b234d61263aba432a818ce +size 71125805 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bcab1ae753154367535d6602a84ab0af65edaa1 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3c1de2072d7c480ef596e8e634caeb6e121f1998c7b288e6afb788c237b7971 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dfdedd8e355b8c088d18fe818dc1b88cf358f69a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:76396af366b2691b20cd44e853d4042ead0c595aad8c50617df182e52986c1eb +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..22de65885c24939d496e27fe6184b45c881500af --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d4561bf37610b1587db75dc0e4b96614a0da90435a3eeee7512d41761b7dddc +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..59eebb0ca150b6a8f95ed95afb6e227eaa00420a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:42ece984cb94c33a2582504b98fdafa40478a6b12e7146fe8eb372adc17c41c6 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aedc6e91a5498167276bba8b9eb69b593103ddd2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97f56ac3fdd171e0e2e5eba3869f783d17b78accefae477b31ad36306e29d269 +size 71125869 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c05c773c146eb5f87389695edac29c47da26202 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0808e1c1579621e0c6519bdfff2f50aea0898e55cd558718d8e2afe1f69cbcaf +size 
71125741 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74c90dd93b59392e3c42cbe046af0ee304b9de12 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:beb3602b1a81cad68d1b040f4b739d5b0d599b968b9726988064cf4bed7cbdad +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76a6651b85810a36531549f327465c5189f2d858 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e7f47123b4152ef42f6a765d0e504a6ad8f6660073290928d8b3ff85d6ad346 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df6d41b8dbee9bced5819b8d8324b8ac79429f5a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d335ec60b2a45bf367aa723f662b3ee6172033da54a0732f06d8f1e6c96ff6a +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c61f6f344efc74379f29333edd806887252849e4 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:abf650a7c5557f0fad8fe8d720de10511aacf341388aad32c27065396572ce82 +size 71125677 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f67506c11a2fd78432b3376bc3bfb40591b2cdda --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb9b5375b0737f0a744c2450fb2e2a93734911ab91ca8de9a4cc22ca664f606a +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc352384c71fc7fbdc8be9483a02948585c74f41 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:831b562cb963ffa19bc2036bb8781afdcb783758e0bc173d810d52281a0536f1 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..36c9df552ddfafd60ddd96f500a2313ab59e2f3b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4278cb8d6b615c1a0d1540043398aed6e0b4963f39e885e8810457a154236b94 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a1293bfc29d74827949c5894f7597f6d52955b9 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e687819a92fab13fd1d90b3b0da5db05c83d6605e6def6eb0864740e50c16af8 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdbe8310d5cb26b995a82d4a12d42fe296549c7b --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f07fc0f63bfec16447304f4d445b35345525c98037c3ca10863e06d5e964e1c +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..076414247cdc8125ae7e0376cd94887c00313118 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1266d0a020867f9e7554737c0d8d5926e19996c6b32f279ab0591b6e0c6d518b +size 71125655 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6717b17ed92c17ac20b6648a7b8fe5eacc938992 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9fc30e98fe3d053ed9ffdd2201a77daf57e15a1c334c378fd6b9348b84f99f9 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4d26f8eead5e1f6b5a3ad519474a50ee17d11ece --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a24a66ae491b0f60d31a5421f39c40e928b0222852f4b5cea7288ab2695e7e88 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eacb8c85769b105b6b0dfc295c4473544a034ac2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:582af41de7031280cf3850d91c7ed4a5a04aafa83d041af54b411287239869d2 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b280e7922cc32662cca9969ee60a882c2e768c56 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1bb90b887c4382b5554bf7defbe34d705e4a1241371271bd0c26c3da5ff935ba +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..087d5ca73d8cff9b5870f704288d8dabcdc348f4 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3bd995d585d91c6b04e7345c0441be65c203487237810dd423245b7e94e56de5 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ae0eb356488d6cdcda6b1b56289575502aaca612 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ecd6c679e8627c5e989224336b87cecd58bad1064b0c1d15b08931d94e9378df +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea63ed89f35c75d5056864f31fd32236601f1eb4 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b31c0672dc9a50fbf935b7f2327ec2f6561bbac79baad1c114e9c5eccd68aee3 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d4c084fdf996e0807f88c971d6aa9f821b456f7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc6ab1d13708c7da33dbeeb0d91af5179fc89a309a691dadc0022622eac9f71b +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4119d72f981b93830a74f20a2957fed848831df5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74c05f851de6e42d5dce373cd2db04802a185736e9895bfb0c526c50c08e7d4b +size 71125730 
diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45fe5fe0c5828f6ac5f9108a61ee49dfbe609c51 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21285bf049febc694f9c2650eafdb814d288802fac8b7d5be3801bd65606c2ca +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4380c06a34fd8f1bb919a0e76b32296c142cb09 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7e938a18a076e4cc97aee52a96e4431ec3f693a864462bd761314df60909c89 +size 71125719 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57321ca6b521a869a4acd7f60b5fa648128fdeb5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bf49f5559c580e3ac728bc3fbc5b01d47d553ba74f3b81a4f6f886d7a7eb01a +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..461629f6794c599d3aee8a2b06dd3451ad3b5f5f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:b78c962db6dd34343edebd4ca21a46f0f6ed6d64a2e35bf29abd5426d0700e23 +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..538491d0621009ba39ed37b77eadecb69602a677 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2966d28009f1a37e86daaea52bcd871d392c8d4fd7656ad0a09632c2e236daf +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f6db6c2ca7cb094fba162316cf97f84d57aaeae9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f005e2427c78f3f4049978e48a6d038c61996c722475276d7657db6595064ec4 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c7325883bb92de834707e066367c23c246a0d43 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3cedffafbc36cc6553f9520593e9e5a81a2de834632ce5729e8cd61c47ffc272 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..844243bff11e9775f4c58f7c302ddd831f292295 --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e8a62eb22ab68096f2080224aad272f26e5b9e0d6b959d6e6bdbb2d6e06418e +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df42bdd05d170062e9594c41331fb492dc97dff2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c0fba19944423206b671aef2e9485c4355a0769e120a37a1b6c2585617d6cf0 +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..580f49b8c179c19550914a4b9686b2b5d845e86d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab0e07964de574400f3f6ae77359521d601157a1f66a8e846c343fa34cb7dfca +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eab73891fc594617da56d9665bad3c960bedb6ab --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8bb43054ad4662a82e0d971aae2fa663a8deb9e893b9d164c38692dfbd011a43 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1762c11a6b68c27af4591b91de207a1f28548975 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:053cb1c1534caf81d2787179d68802c988959406283e5435d9cd9fed49452218 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..08d87985f5916e0c89db58a63a6f9d09f5e1d7e2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1079ee6b31b5dec07737a6b9405127bf5f94f75ed6f12e3e521fbc218dc0eb65 +size 71125783 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d7d91e33b335cbed3a21868783dad3d29d2c050 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce233226bf6abb654e4de56d5ffe38eee2dba5223682af9b403e3cfee57387d7 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1ab2f46e144273a0d6fa4f4323ddb7af04271a71 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c296a33b4afb23793cdd45f34470dd71651597b3223ca3edd9656101897fc55a +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..12f16d1ecf40339ace574bcedaf3ac5a658414ce --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c57fad0ca8dad78ebac7c1da6bbc8afaa374dd692762c7901102ae4bd49d2628 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96bd6451fe70e878e2eca702f806e78a72f74ad4 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d30d9c4774e3562bb368ac3c04066b71df2fa2f159c49248aba8790f8e456e5d +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d3ec1b50bd86a97380aa5ce738fc0dd3d166625 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:18a80d251bf10c22f6aee157368f010f3000b08d3f9c2c32b1b54ba6c5fef0db +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..54a3b5b548290ba5232d63f02a01134aa4c52a69 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:df4fcd90b900f01b3ebec5ea61d3d03773dfc3e8514d46833c08df58048ae442 +size 71125730 
diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d93552d67183e08a3e1ccafadcbb592eee3a7737 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a4d0d1b3cc1c66194d768351868e4b18d1cb151bacfbddf36b065200b4399be +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ca9280c6229d16f0fcf693f441b661992cdeded --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a05772bda11a00259d936bf242f2a4d16b2060bf9186ad116cca2a604601f2b0 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7ad76ead3601f9701151fd78ccbb6ea0bdb3f61 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33dec68fe7ae26604ccf0d3c44d3acd9c9d6f6b671dad04fa73edf53a5816e93 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bf362f70d5a508dd2593bc09e840c14c26c94ad --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:a0540e18e424f8fab441b25551f9f3027399365db746307651b6879a47160d03 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9741dfbe43e9b30f051a08e7b7f0f326a1d2c715 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0c09abeed95daab4c1b7883ae434048f19a35b8119a288b1af27fb5b72d810a7 +size 71125783 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..35ec4ddf48e5be7e826e6b0c52c6dac3b40306b5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e001943e0b7d7dc7ae5a6782e36ac4223086888c4b66ad71f5ccaeed49d72a38 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2423f60a84859b2e59a13922238c2be52abc014a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cdea88b53bfa8e201742059a1184e75a7e01ea61eb568681ffd83eb62883e7d1 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53fe973e603931e656182648f21d25773da696ae --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9640cca0c85f836e4177894d278dcf1280a079f1b09c827a2357d4fa8bdf7716 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb6e18d9caef090a20bbc268b72cdea659086e6d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:116780b0aeb4b2bb51c4b42a36a064766cfbb3ab363ea13db11429346e040ae2 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da641aa36eb4e1a36ffd498063554eada893110d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a36289d62efe5f06b8834ee375906d9f6f9e7c47b39059e999df318449a937de +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdcab93c4cff97f8ccdd75d35e79e17b6306266c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba04366dac9cefdb26b5a7484a5c05a534330073b0345063a5bbc601d9756dbd +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..f067ac24547b1beccd07e5d9e7f490cafa356762 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8245f5e7e9098f3872e80e51bf3be8bc63f208b6b9d7d3fa1fc8dfb2a5e31bbf +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..420f834a234aa3f47cfdf97c48b01d6debc9d19e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:037b1824fe47a3f8f0e76faa487769ec3a6793fb591119cd55d149504fdf6439 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..682178bf9953ce1c91b6a5bcc28b9b320af5075d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ec75f2a5d713af793b4ba3e463049744089be7b63cd667f679265ce7d3c97cb +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..89542807e09c42b7f8c56a14229bbaf79eb9567a --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2de470445fbaa3a5d9b2eebace54c35a1afe13dc9463fc08ea3408ad6bb990c +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e828a17ab1393099937c6e9fcac8740c3b081705 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:557906698d0c1bd2466e1f1e4f6abc32a6192ae3d0595d830a82661cd3ff5210 +size 71125655 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb6ae7ab976c646a143d1a626bc06e7f252aff5c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d43348ca34384e2f04b27af21750dc121f2c3cb45e0dbfce772b8d7e4f5f143 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8424dde0c40144f98820d1ca68d429d1a5c61a1f --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbaf06d8bfc36fb0aceb28a88e8dc975b8e1e5ea9889236cee4b5a2c20c63896 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0186c4753b33cf6f0badd6dfdb7dda48d60f56bb --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f40d92822705fd72af77cb4a8046d85737b125c7f2c87020d62b63f2f4efc16 +size 71125794 
diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..709d128d1390a643627e2e190df90221e146967e --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ebd1577b98f5b1e6d41b7f4700f871c11dbedd2d873aea7c8f2a1d235ee685c +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a73530cd018ade34af63b6f3550cffb6865dde0 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d687dd09ab6da60c2c113fb8784ba715a46e33ac73539425b56b66b9273ca039 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3df1050e65cf49fb0ea23df87b977fc3729835da --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2792821a807847b6f849ff0f008caa8d8e28b32275076e0867898049a72ed51 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0090efcc648b498242c0a58cc507c63cf07427e2 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:5869874ea18122ad6f3ea7a05f406fd42e6d2f6e9eff1fc4e2ffcd3402377f3a +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80120a538d7183742289f6bf03baedb52b4eb754 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a2ff5d2a182be2466d293a96e4cf36d8f28a2eaffc754a57c99ee7b6b015d52 +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b2a81a03b9e42bacefdf8769dc768ae5cc3c7c1c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:300fd1d6c1cbe8f13c69e34d733df02c2e180d4c7286c4133a9c936f1f725edb +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be50da8babfc5a2c9f05f4ac52ea47336c7c2ece --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db014dec1064154b3883931a67f90a26279a93f5dcef35fec00b6724d37742b2 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..669e63d2504f808759de831f631678cc07bf497c --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7884e046db85b4d30624a1123c7253010b8febea6fc7ac9fdd0f429c0bd2c644 +size 71125655 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b77f2b29e46b6b82d0f95270675657680d7fa103 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:839a02ef0e7ffe2cb6933a2bd5743a5de2fd75dcd512e00b543bde64823b756a +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..295a4df0c8a7f5e8a1fa5ffcd5b706c545e80eba --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e7fd3b428b01465d1fd40928e19f5958302301dbc51af6580dfc5bfa6139715 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa220713a43aa5b5083f7403165edb20d7ce86da --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c6f65027987981a851a531249b7559cf52b2cac4ba6ff96fc32d48e922b1b30 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..274117d6dcfe7231b873d074ef8f17cd7b0cc7a8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee39bb84aba79fbf715e23ec93585d761977a05408577406eaa26a05b265b411 +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe9906e01f9b30ead0530507e90b43568d06f5bf --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fb5a735ed7292c7a98098d5da51933d00b947941436b3f9da54c2e19ea058a9 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4fd0ab76051818bf600cbe640afe2a9c42d78a5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a718f0529b0f686d58c84d48710d26796bde725118efce67dd3e6907875d5008 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60dc314697b8f7efb5509f7354ad18c1ccd9c86c --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:474416eea80d61f422854f551e3b195887e6d9a6ea57e1304347d10ba9816f73 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..587ca0875e855a6f5adee6bc6627a8bd7719e899 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:930bc5344be7807abced8b8a96b0607f3e486cd92808da56788d822ccc0b3442 +size 71125794 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..479ddd97be9ae5a5522ac31e7113c0244ea45617 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:314b99e236a4b8011ea6014bbfbbf4ed2791abb7db5cc6d0e4d463c3022c0e32 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0f94631b21158b099b86d7a7383c7afe543cf22 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:77db86165bd7f1c702ae67576f6c9102f764fa595d018e81d3f86aa84394926e +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c169da80bf3b5d1806569846ed25967567ef507 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c75cae7d7be14be1d8f186e9987cba09e62aabf1d8ec73faa294fd351a21cacc +size 71125719 
diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bd293018bbb265ddfaa3c5d7f42d31cd526f4d8 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:049b476a5d957eee65ab57769aa0098c61d906e953a08d64c656151320ab2793 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..95b710049ed606b229e83bb631043459c77f5350 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2cec5f14f5bdf4fc3a3ffb9ea2260faa13b82b03578b994385d2c457e213775 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf1219844b97f3a63d17264b749e6108f7ceaab5 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e191969c9f3d587a9f0d68b8417f77e2dd5154779249cd3c768151498ca03c43 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ec6c8bb0aed5013210291c5bf27d7f3ef4442b9 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:861aed770643a9eed3184b4bfdbc2f4c2ceae9bd9e635dbec3c16051b9b8b3b5 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67c1476ea36f6277e5be3fb5b7d637c0df272b00 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b8e47ed086e8a23d82ff4fd692f852f0a6cf7eff82320f289798d20adee1729 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b6742be8c41a1127d6b31dd44f91af59ea655b7 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92ca8dc163f13951c4e5be41e8e4c394a00edd7a70f4bf9104da0a081a9c606b +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3cf66911480e178bc4993bc7adffa292c389f37d --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83d90bea0a92fb79a0160f64cc8b0fb1f03302df0d11fbbd9b22c1f7cda389a4 +size 71125730 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..42ae0822a0085635f82c26c2a9229f75512654be --- /dev/null +++ 
b/1b58b81b5/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da1335c0e3b669028f3cc5fbb691661dd49184caa359e6968a0b2d9c6c7d3494 +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..63a04ac7c2f61dad30576645765c86069253d605 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e1e19ef2100f8093d4c35943f5f72aab918e0d6caf77dec0329b6e1736cb19a +size 71125666 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b4ef408050868ca1038aedb9436745fc2ef725ff --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6035270ffe663d3ccc572ca66fb512ed0270720067a0bc16e06b1c9a1f9cdcde +size 71125858 diff --git a/1b58b81b5/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/1b58b81b5/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07229204c81105b1570b82f48734c2de530c5861 --- /dev/null +++ b/1b58b81b5/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10c00e488185ba60b81a94113471f4492c99d667e7675589880a9fbddbe7704b +size 71125783 diff --git a/1b58b81b5/global_step16765/layer_01-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_01-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..db1a45f1a95817cf1b9e192afda9e15a4eb8030d --- /dev/null +++ b/1b58b81b5/global_step16765/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:102b4e34332befc024d43c590463e138b6f581f84d2ff08fd4e06759608266f0 +size 214435075 diff --git a/1b58b81b5/global_step16765/layer_03-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d56443bf6694961708985230398d82e099ac415 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec1c5bef596a612a377fa2f55251179d9956cb5fadba0dcc4681d4050b1012aa +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_04-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0037fc45212556ad1f883b748558bb0a5c4342bb --- /dev/null +++ b/1b58b81b5/global_step16765/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d128b4e8dd72efb5fb2c0ae42ed99a2cf41c2b099c1e9cb612b2b590af33f9e +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_05-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b19f0c4cb08acdd97a767ca0ff0418f88dce14f6 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:12dd551e1cd5cbbbba2733ac0170d24f13ed72b5098ce78058a368ae9b4290f8 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_06-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_06-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4cd1129b0aa6feba76d2bcb2e5f508b727f19f7c --- /dev/null +++ b/1b58b81b5/global_step16765/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:addd0315fb812483e63f7bf369f91285b54a4218447da3ba9de3c35bae63a642 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_07-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..70eb5dd5814733db10db9d33750113826757cd76 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a13286274209656270fa803447136ded7a468e89ddddfd186e5dea816993faa +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_08-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..385da94d594592c615cbf6ec03a6308da20e2418 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8be8234059646b4c53edfa11ec565c5d685312262c3d3c914d4f9addd492cdc +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_09-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84482e41d75820ac1f25099ae60050e25dedbfb3 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:606e1bbf42365b2add3c6ee80f94ca1e5e55b485b63899c84dcdb6d9fc2a0eda +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_10-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_10-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..cea4475e924f7cd8103df3b93c69bfa7827c1802 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ecd88ad715b312fe9d9c052d2aab075b65e2ba4d8ad85ca1b85822c95100dcd3 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_11-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e7d96dcf3cd60219fba3067cc1b12785a0bd6786 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0caadd3cf8af2ec32f47186a4d4a0287e5df66611b611cd59e828c5ea37afc28 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_12-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bffbeae012fffd121bc29edda4b849c485c00ef --- /dev/null +++ b/1b58b81b5/global_step16765/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d574d7b36877d3371a97e223d4d9de0a38f011ed579eb8a23655f1f96b79535 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_13-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ab3ecc5b781bad07c2427aa84d62a4537070bee --- /dev/null +++ b/1b58b81b5/global_step16765/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfd81d1fd393bb2068ffc96bed94a48292bcf3103fdef108c411df9420006b52 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_14-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_14-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..43779b9bb27f356434d6e3f09879962c5be179df --- /dev/null +++ b/1b58b81b5/global_step16765/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1bd09fd4b3a94702b2c4090ea7ccc9d790df7d1e889b35e1a97831c3f9c72afa +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_15-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b548f32766d9ea6f019468d970d2f6611aef5fb --- /dev/null +++ b/1b58b81b5/global_step16765/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:136195883a872695c93d3a58471bf15d8be210dbcf1fff14420db8e0a6136816 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_16-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb6d8148aae80f3fe273b8c9624eca0449510d7d --- /dev/null +++ b/1b58b81b5/global_step16765/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe443cab3e8a84b896fafca0ed301f242e985730a5c0debb2091d1de538740ca +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_17-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cdd4dbe127a2edb4fc72b3d3df8a3f7ee2c571c --- /dev/null +++ b/1b58b81b5/global_step16765/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c45f4aaa26940d0615df3b88a447d434e634ca1f2f0569852cc80c556ea845a +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_18-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_18-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..f51143d914552141a975b58fbcf0b67c5adf64ad --- /dev/null +++ b/1b58b81b5/global_step16765/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48250e792deaa5c50d9c28688c701e765acaed808e4967e5e2b900f3945a3e38 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_19-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b2fd8f923450d35f9dbfbffef0efb91848ba608 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:76de37ae93dc3fefffef4351f7fd3e734e8cd1de1f6990a6b9b9d8187379330f +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_20-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3a22e39b493ec434dd48965a6470a0ff21d7af67 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23fc84b0d80b77d53bc7e2157460949f5973d09cc50a5121ce93b8031777f506 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_21-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b637bdb87b5015ccb70e2605c7c2428ab8f45e95 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8a57b343f2ed7c84b15f7977a5bb82f179a27092d3b0071cedb1858a625941e7 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_22-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_22-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..6979abbbbd6d0ad6be3f104bc28a2dc1329e2722 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51df93d84aefe8dc465b760ac79f623b7d58992aab2cc1e8e620bf3fe05cb24a +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_23-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..55a9a9b2ba0ef6f36b6208cbb5d74c9a49d36151 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:08e411961929e2df2a8431afa8203d2dce8d8bf9e558a7512ac9915dfdba8346 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_24-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_24-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..22085f1165ed153344400734e8d62eef8741f164 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:602b2d653d66b81e05fec469341141685640c84281501c9050ce62d8ed075866 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_25-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..555f956bff47d02860ec9bd49df8ffb983fc57b3 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66a4b4a6ea949b2bc3bba9809bfd439821340004a04122d68dffee4983f98af1 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_26-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_26-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..ba43055db7f49856e2d0ed79cdb839ac4c870d62 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85ae71fd1096bbeef513b7d363dda303ffb2c93af367e71eaa379c96188157f6 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_27-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cbdd53688d683ef4801396936da157c98ac8e63a --- /dev/null +++ b/1b58b81b5/global_step16765/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1d4052c3d026f70cb31689416c56afba16bdfa7deeea1973fc3886c5ef62be2 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_28-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_28-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa84117cf0bea273e36c2c57987a03f57815a234 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ba86ef9630cf7f3679a66de33c26e2dd6108e5ff798920c042a67f7e4989bb6 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_29-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_29-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5caec01caba15ee0c6cdede9c6da68f574073765 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_29-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f68be8d87d7e872355638a9cd807f5030058e636c1a652256f33201453bf513 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_30-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_30-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..7508cc2b25aefecac30a97de4641b432e71e02da --- /dev/null +++ b/1b58b81b5/global_step16765/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6854e3d3b1c5bb86aabf09437de5ce2a0c90f41ac5bab66a849901c272f2423 +size 100720899 diff --git a/1b58b81b5/global_step16765/layer_32-model_00-model_states.pt b/1b58b81b5/global_step16765/layer_32-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9163ebd037877005e35289fd5c397c43b111b4e4 --- /dev/null +++ b/1b58b81b5/global_step16765/layer_32-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b1223e68ee180b2b12d6c6402a13cd0f158c1d7456dd877fbb666bc42ece220 +size 9411 diff --git a/1b58b81b5/global_step16765/mp_rank_00_model_states.pt b/1b58b81b5/global_step16765/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad98eae03a35d93c4abb04a2acf0f297e82bb290 --- /dev/null +++ b/1b58b81b5/global_step16765/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b16087834700834bb8f59a972d25d920d52b499d03384d47be2babe25819cad9 +size 45363 diff --git a/1b58b81b5/sbatch_1b58b81b5.sh b/1b58b81b5/sbatch_1b58b81b5.sh new file mode 100644 index 0000000000000000000000000000000000000000..c6ecaf260545b0e46d05322bf632739e2df09e3e --- /dev/null +++ b/1b58b81b5/sbatch_1b58b81b5.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=32 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p 
standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b58b81b5 + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train1b5.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1593M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 8790000000 +# -> Samples: 4291992 +TRAIN_SAMPLES=4_291_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 4_292 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + 
+GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b58b81b5/sbatch_1b58b81b5val.sh b/1b58b81b5/sbatch_1b58b81b5val.sh new file mode 100644 index
0000000000000000000000000000000000000000..a1adc58ff462ccfe85219f4a8e3e5d5344921bfa --- /dev/null +++ b/1b58b81b5/sbatch_1b58b81b5val.sh @@ -0,0 +1,168 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=32 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b58b81b5val +VARIANT_CKPT=1b58b81b5 + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train1b5.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_8B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh 
+MODEL_PARAM=("${PARAM_1593M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 8790000000 +# -> Samples: 4291992 +TRAIN_SAMPLES=4_291_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 4_292 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF +
+DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677455165.nid006628.15145.0 b/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677455165.nid006628.15145.0 new file mode 100644 index 0000000000000000000000000000000000000000..48718bc8de9440d80b6d1d176ec21adcaaa124fb --- /dev/null +++ b/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677455165.nid006628.15145.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23b2a359faf754688caa0295aaca7978d6cb9f7a5eb72a799851257cf7d20351 +size 40 diff --git a/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677484350.nid006707.129185.0 b/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677484350.nid006707.129185.0 new file mode 100644 index 0000000000000000000000000000000000000000..853079681733cddc50557dd9839956a57e3b3d26 --- /dev/null +++ b/1b58b81b5/tensorboard_1b58b81b5/events.out.tfevents.1677484350.nid006707.129185.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63c72ab45ca4d69cc54788dd227dfce616b3cf08b9c0e557f1aa93fba94c47db +size 29861022 diff --git a/1b58b81b5/tensorboard_1b58b81b5val/events.out.tfevents.1677500570.nid006220.70420.0 b/1b58b81b5/tensorboard_1b58b81b5val/events.out.tfevents.1677500570.nid006220.70420.0 new file mode 100644 index 
0000000000000000000000000000000000000000..26f4839ec09b078c5cabd423dc3b87efa40c3a18 --- /dev/null +++ b/1b58b81b5/tensorboard_1b58b81b5val/events.out.tfevents.1677500570.nid006220.70420.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51a91fe5d999508c20e4a9a60c9e9818d46a62549bb6dc12c87a9f3f951237c9 +size 980 diff --git a/1b58b8400m/3319359.err b/1b58b8400m/3319359.err new file mode 100644 index 0000000000000000000000000000000000000000..32087a4a950b7110e5136215e3c30cd98cab2b69 --- /dev/null +++ b/1b58b8400m/3319359.err @@ -0,0 +1,2212 @@ + 5: 2023-03-16 09:04:29.558484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558492: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558500: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558495: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 09:04:29.558504: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558500: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558505: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 09:04:29.558504: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606669: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 09:04:29.606672: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606681: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606680: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606684: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606687: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 09:04:29.606688: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 09:04:29.606674: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:29.615118: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:29.615107: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 09:04:29.615112: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: 2023-03-16 09:04:29.615092: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615091: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:29.615115: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 09:04:29.615109: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:29.615106: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 09:04:29.615106: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 09:04:29.615116: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: 2023-03-16 09:04:29.615100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 09:04:29.615101: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615788: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 09:04:29.615780: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615795: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615787: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615778: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615782: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 09:04:29.615792: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 09:04:29.615786: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615942: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615947: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615942: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 09:04:29.615945: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615952: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615968: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615965: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 09:04:29.615964: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 09:04:29.673287: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673295: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673306: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673304: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673312: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 09:04:29.673312: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673303: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 09:04:29.673314: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681802: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 09:04:29.681805: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681820: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681816: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681823: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 09:04:29.681826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 09:04:29.681821: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681974: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681971: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681984: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681970: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 09:04:29.681969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681990: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 09:04:29.681983: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747690: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 09:04:29.747697: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747694: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747705: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747693: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747701: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 09:04:29.747691: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 09:04:29.747691: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747871: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747887: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747879: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 9: 2023-03-16 09:04:29.747885: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747895: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747898: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747901: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 09:04:29.747895: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 09:04:29.748375: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748384: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748375: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748387: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748395: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 09:04:29.748393: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748392: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 09:04:29.748396: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748538: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748549: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 09:04:29.748545: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748544: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748552: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748547: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 09:04:29.748553: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 09:04:29.748541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813237: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813242: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813234: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: 2023-03-16 09:04:29.813270: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 09:04:29.813280: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:29.813276: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:29.813272: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 09:04:29.813282: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813255: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 09:04:29.813241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: 2023-03-16 09:04:29.813282: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 09:04:29.813287: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 09:04:29.813289: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916540: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916538: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 09:04:29.916551: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916548: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916544: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 09:04:29.916544: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 09:04:31.411685: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411688: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411691: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411693: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.411690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:31.412132: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412134: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412137: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412140: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412142: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 09:04:31.412143: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412145: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 09:04:31.412150: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464284: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464283: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464290: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:31.464749: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464753: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464757: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464760: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464759: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 09:04:31.464762: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464763: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:31.464768: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494543: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494548: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494554: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494555: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:31.494762: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494762: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494767: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494768: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494769: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 09:04:31.494772: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494775: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 09:04:31.494778: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495016: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:31.495445: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495448: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495451: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495452: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 09:04:31.495455: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495457: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 09:04:31.495461: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:31.499768: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499784: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499787: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 09:04:31.499789: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499792: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 09:04:31.499794: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508336: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:31.508770: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508772: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508775: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508777: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508777: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 09:04:31.508779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508781: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 09:04:31.508783: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.508651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508656: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508659: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508665: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508662: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508666: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.508670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:31.509092: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509095: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509096: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509097: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509099: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 09:04:31.509102: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509105: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 09:04:31.509105: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620282: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:31.620723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620727: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620730: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620733: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620737: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 09:04:31.620737: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620740: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 09:04:31.620740: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.665820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665823: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665833: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.665830: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:31.666197: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666201: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666204: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666205: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666203: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 09:04:31.666205: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666210: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 09:04:31.666212: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.668963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668967: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.668976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:31.669418: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669422: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669426: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669429: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669430: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 09:04:31.669432: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669433: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 09:04:31.669436: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673316: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673311: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673324: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673324: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:31.673536: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673535: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673542: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673543: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673543: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 1: 2023-03-16 09:04:31.673546: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673549: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 09:04:31.673556: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680481: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680686: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680690: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680691: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680693: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680695: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 09:04:31.680523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680701: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 09:04:31.680522: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:31.680717: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 09:04:31.680721: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 09:04:31.683236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683244: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683253: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683272: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:31.683696: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683700: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683706: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683707: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683709: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 09:04:31.683711: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683713: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 09:04:31.683713: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726267: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726281: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726284: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726282: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:31.726663: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726666: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726670: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726674: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 09:04:31.726678: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726679: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 09:04:31.726684: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.728749: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728748: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728771: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.728774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:31.729182: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729184: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729187: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729189: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729192: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 09:04:31.729193: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729196: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 09:04:31.729198: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.732760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732768: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.732775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:31.733167: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733171: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733170: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733176: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733176: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 09:04:31.733181: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733180: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 09:04:31.733185: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 09:04:34.796484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.796508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797200: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.797192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798892: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798898: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 09:04:34.798895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798897: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798899: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 09:04:34.798905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798908: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798915: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798915: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798916: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 09:04:34.798918: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.803854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803859: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803871: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.803874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805750: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805756: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805757: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805766: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805767: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 09:04:34.805763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805762: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805766: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805774: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805777: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805784: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805785: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805786: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 09:04:34.805806: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 09:04:34.805825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 09:04:34.798540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798548: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798559: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798561: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798564: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798566: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798567: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 09:04:34.798570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 09:04:34.798584: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:34.931979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.931986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.931984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.931989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.931994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931985: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.931995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931983: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.932003: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.932004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 09:04:34.931992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 09:04:34.932441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.932448: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 09:04:34.932451: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932636: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.932644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933798: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933814: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 09:04:34.933808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933823: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933824: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 09:04:34.934072: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 09:04:34.933861: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 09:04:34.933864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 09:04:34.934074: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934082: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934085: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:34.934087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 09:04:34.934085: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934094: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:34.934094: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:34.934096: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 09:04:34.934099: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 09:04:34.934101: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 09:04:34.934139: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 09:04:34.934150: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934283: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934290: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 09:04:34.934751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934755: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934766: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 09:04:34.934764: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 09:04:34.934768: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934769: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934775: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934774: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934775: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934777: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 09:04:34.934779: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 09:04:34.935568: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935564: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935581: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.935585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937533: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 09:04:34.937529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937533: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 09:04:34.937531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:34.937544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:34.937546: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:34.937549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:34.937551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 09:04:34.937553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 09:04:34.937566: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 09:04:34.945427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945438: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945448: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.945451: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946496: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.946518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946984: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.946990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947639: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947540: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:34.947540: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:34.947544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:34.947546: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:34.947649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 09:04:34.947548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 09:04:34.947549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947561: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 09:04:34.947651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 09:04:34.947574: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.947653: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948642: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948658: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948660: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948660: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948665: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 09:04:34.948716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 09:04:34.948720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 09:04:34.948730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 09:04:34.948733: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948907: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948915: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948925: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948928: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948930: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948937: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948939: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 09:04:34.948951: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 09:04:34.948953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:34.949515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949528: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949528: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 09:04:34.949526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949534: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949537: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949537: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949540: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 09:04:34.949541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 09:04:34.949557: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.950031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950035: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950037: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950043: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.950045: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950407: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950418: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950413: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950417: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.950426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951688: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951692: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951692: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951705: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951704: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951708: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951736: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 09:04:34.951746: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 09:04:34.951752: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 8: 2023-03-16 09:04:34.951973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.951996: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.951996: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.951998: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.951999: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.952000: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.952001: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.952023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.952037: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 09:04:34.952039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 09:04:34.952056: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934291: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934292: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934294: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934295: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934298: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+15: 2023-03-16 09:04:34.934308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 09:04:34.934323: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 09:04:34.934324: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 3: 2023-03-16 09:04:35.092955: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092969: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.092981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094892: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094896: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094896: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094899: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 09:04:35.094912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 09:04:35.094926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.139674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.139697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141693: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 7: 2023-03-16 09:04:35.141704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141706: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 7: 2023-03-16 09:04:35.141705: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141714: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.141718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.141720: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.141722: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.141724: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 09:04:35.141766: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 09:04:35.141779: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_upper_triang_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_upper_triang_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module fused_mix_prec_layer_norm_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module fused_mix_prec_layer_norm_cuda... + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. 
+ 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 2: Successfully preprocessed all matching files. + 2: Successfully preprocessed all matching files. + 2: Successfully preprocessed all matching files. +10: Successfully preprocessed all matching files. + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 0: Building extension module utils... + 0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: + 7: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: +11: +11: +11: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: +12: +12: +12: +12: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: +13: +13: +13: +13: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 0: Building extension module utils... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... 
+15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 8: Loading extension module utils... + 9: Loading extension module utils... +14: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 0: Loading extension module utils...Loading extension module utils... + 0: + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Loading extension module utils... + 1: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... 
+ 2: Loading extension module utils... + 5: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 5: + 5: + 5: Loading extension module utils...Loading extension module utils... + 5: + 5: Loading extension module utils...Loading extension module utils... + 5: + 6: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 6: + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: + 6: Loading extension module utils... + 3: Loading extension module utils... + 8: Loading extension module utils... + 3: Loading extension module utils... + 8: Loading extension module utils...Loading extension module utils... + 8: + 4: Loading extension module utils... + 3: Loading extension module utils... + 8: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 8: Loading extension module utils... + 8: + 8: + 3: Loading extension module utils... + 3: Loading extension module utils... + 4: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 3: Loading extension module utils... + 9: Loading extension module utils...Loading extension module utils... + 9: + 4: Loading extension module utils... + 4: Loading extension module utils... + 9: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 9: Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... + 9: + 9: + 9: +10: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... 
+10: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils...Loading extension module utils...Loading extension module utils... +14: +14: +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +12: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: + 0: +12: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+ 0: + 0: + 0: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 0: + 0: + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +12: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +15: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... + 0: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... 
+ 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 5: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 1: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+ 8: + 8: Loading extension module utils...Loading extension module utils... + 8: + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... +14: +14: Loading extension module utils... 
+14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +14: +14: Loading extension module utils...Loading extension module utils... +14: +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... 
+15: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 2: + 2: Loading extension module utils...Loading extension module utils... + 2: + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... 
+ 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: + 1: Loading extension module utils... + 1: Loading extension module utils...Loading extension module utils... + 1: Loading extension module utils... + 1: + 1: + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: Loading extension module utils...Loading extension module utils... + 3: + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 7: + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +10: +10: Loading extension module utils... 
+10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +11: +11: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +11: +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: Loading extension module utils... 
+12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: +13: +13: Loading extension module utils...Loading extension module utils... +13: Loading extension module utils... +13: +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/1b58b8400m/3319359.out b/1b58b8400m/3319359.out new file mode 100644 index 0000000000000000000000000000000000000000..caaaa6b326e9de9191ade229328a99ee57e23cbc --- /dev/null +++ b/1b58b8400m/3319359.out @@ -0,0 +1,17457 @@ +Model parameters: d_model 2048 ffw_size 8192 kv_size 128 n_heads 16 n_layers 28 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 28 --hidden-size 2048 --num-attention-heads 16 --kv-channels 128 --ffn-hidden-size 8192 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 128 --train-samples 4_291_992 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-1b58b8400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 4_291_992 --lr-warmup-samples 4_292 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 10000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_1b58b8400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_1b58b8400m --load checkpoints_1b58b8400m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3319359.json --zero-stage 0 +START 3319359: Thu 16 Mar 2023 09:04:09 AM EET + 0: + 
0: + 0: ======================= ROCm System Management Interface ======================= + 0: ================================= Concise Info ================================= + 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 0: 0 43.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 2 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 4 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 6 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: ================================================================================ + 0: ============================= End of ROCm SMI Log ============================== + 9: + 9: + 9: ======================= ROCm System Management Interface ======================= + 9: ================================= Concise Info ================================= + 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 9: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 2 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 4 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 5 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 6 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: ================================================================================ + 9: ============================= End of ROCm SMI Log ============================== +10: +10: +10: ======================= ROCm System Management Interface ======================= +10: ================================= Concise Info ================================= +10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +10: 0 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 
0.0W 0% 0% +10: 2 40.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 4 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 6 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: ================================================================================ +10: ============================= End of ROCm SMI Log ============================== +12: +12: +12: ======================= ROCm System Management Interface ======================= +12: ================================= Concise Info ================================= +12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +12: 0 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 2 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 4 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: ================================================================================ +12: ============================= End of ROCm SMI Log ============================== +13: +13: +13: ======================= ROCm System Management Interface ======================= +13: ================================= Concise Info ================================= +13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +13: 0 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 2 42.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 4 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 6 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 
0% +13: ================================================================================ +13: ============================= End of ROCm SMI Log ============================== + 6: + 6: + 6: ======================= ROCm System Management Interface ======================= + 6: ================================= Concise Info ================================= + 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 6: 0 48.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 2 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 4 49.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 6 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: ================================================================================ + 6: ============================= End of ROCm SMI Log ============================== + 3: + 3: + 3: ======================= ROCm System Management Interface ======================= + 3: ================================= Concise Info ================================= + 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 3: 0 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 2 45.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 4 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 6 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: ================================================================================ + 3: ============================= End of ROCm SMI Log ============================== +14: +14: +14: ======================= ROCm System Management Interface ======================= +14: ================================= Concise Info 
================================= +14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +14: 0 49.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 2 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 4 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: ================================================================================ +14: ============================= End of ROCm SMI Log ============================== +15: +15: +15: ======================= ROCm System Management Interface ======================= +15: ================================= Concise Info ================================= +15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +15: 0 39.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 5 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 6 44.0c 78.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: ================================================================================ +15: ============================= End of ROCm SMI Log ============================== + 7: + 7: + 7: ======================= ROCm System Management Interface ======================= + 7: ================================= Concise Info ================================= + 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 7: 0 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 2 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 4 42.0c 94.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 6 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: ================================================================================ + 7: ============================= End of ROCm SMI Log ============================== + 8: + 8: + 8: ======================= ROCm System Management Interface ======================= + 8: ================================= Concise Info ================================= + 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 8: 0 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 2 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 3 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 4 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 6 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: ================================================================================ + 8: ============================= End of ROCm SMI Log ============================== + 4: + 4: + 4: ======================= ROCm System Management Interface ======================= + 4: ================================= Concise Info ================================= + 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 4: 0 42.0c 100.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 2 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 4 37.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 6 43.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: ================================================================================ + 4: ============================= End of ROCm 
SMI Log ============================== + 1: + 1: + 1: ======================= ROCm System Management Interface ======================= + 1: ================================= Concise Info ================================= + 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 1: 0 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 2 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 4 43.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 6 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: ================================================================================ + 1: ============================= End of ROCm SMI Log ============================== +11: +11: +11: ======================= ROCm System Management Interface ======================= +11: ================================= Concise Info ================================= +11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +11: 0 50.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 2 43.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 4 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 6 45.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: ================================================================================ +11: ============================= End of ROCm SMI Log ============================== + 5: + 5: + 5: ======================= ROCm System Management Interface ======================= + 5: ================================= Concise Info ================================= + 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 5: 0 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 
0% 0% + 5: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 2 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 4 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 6 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: ================================================================================ + 5: ============================= End of ROCm SMI Log ============================== + 2: + 2: + 2: ======================= ROCm System Management Interface ======================= + 2: ================================= Concise Info ================================= + 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 2: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 2 39.0c 79.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 4 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 6 39.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: ================================================================================ + 2: ============================= End of ROCm SMI Log ============================== + 3: Launching on nid005723 (3/16), master nid005720 port 9999, GPUs 8, CUDA: True +10: Launching on nid005730 (10/16), master nid005720 port 9999, GPUs 8, CUDA: True + 8: Launching on nid005728 (8/16), master nid005720 port 9999, GPUs 8, CUDA: True + 0: Launching on nid005720 (0/16), master nid005720 port 9999, GPUs 8, CUDA: True +13: Launching on nid005733 (13/16), master nid005720 port 9999, GPUs 8, CUDA: True +12: Launching on nid005732 (12/16), master nid005720 port 9999, GPUs 8, CUDA: True +14: Launching on nid005734 (14/16), master nid005720 port 9999, GPUs 8, CUDA: True +15: Launching on nid005735 
(15/16), master nid005720 port 9999, GPUs 8, CUDA: True + 7: Launching on nid005727 (7/16), master nid005720 port 9999, GPUs 8, CUDA: True +11: Launching on nid005731 (11/16), master nid005720 port 9999, GPUs 8, CUDA: True + 2: Launching on nid005722 (2/16), master nid005720 port 9999, GPUs 8, CUDA: True + 9: Launching on nid005729 (9/16), master nid005720 port 9999, GPUs 8, CUDA: True + 1: Launching on nid005721 (1/16), master nid005720 port 9999, GPUs 8, CUDA: True + 6: Launching on nid005726 (6/16), master nid005720 port 9999, GPUs 8, CUDA: True + 4: Launching on nid005724 (4/16), master nid005720 port 9999, GPUs 8, CUDA: True + 5: Launching on nid005725 (5/16), master nid005720 port 9999, GPUs 8, CUDA: True + 0: using world size: 128, data-parallel-size: 128, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 + 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. + 0: using torch.bfloat16 for parameters ... + 0: ------------------------ arguments ------------------------ + 0: abort_on_unmet_fused_kernel_constraints ......... False + 0: accumulate_allreduce_grads_in_fp32 .............. True + 0: adam_beta1 ...................................... 0.9 + 0: adam_beta2 ...................................... 0.999 + 0: adam_eps ........................................ 1e-08 + 0: adlr_autoresume ................................. False + 0: adlr_autoresume_interval ........................ 1000 + 0: apply_query_key_layer_scaling ................... True + 0: apply_residual_connection_post_layernorm ........ False + 0: attention_dropout ............................... 0.1 + 0: attention_softmax_in_fp32 ....................... False + 0: bert_binary_head ................................ True + 0: bert_load ....................................... None + 0: bf16 ............................................ True + 0: bias_dropout_fusion ............................. True + 0: bias_gelu_fusion ................................ 
True + 0: biencoder_projection_dim ........................ 0 + 0: biencoder_shared_query_context_model ............ False + 0: block_data_path ................................. None + 0: checkpoint_activations .......................... False + 0: checkpoint_in_cpu ............................... False + 0: checkpoint_num_layers ........................... 1 + 0: clip_grad ....................................... 1.0 + 0: codecarbon_dir .................................. None + 0: consumed_train_samples .......................... 0 + 0: consumed_train_tokens ........................... 0 + 0: consumed_valid_samples .......................... 0 + 0: contigious_checkpointing ........................ False + 0: cpu_optimizer ................................... False + 0: cpu_torch_adam .................................. False + 0: curriculum_learning ............................. False + 0: data_impl ....................................... mmap + 0: data_parallel_size .............................. 128 + 0: data_path ....................................... None + 0: dataloader_type ................................. single + 0: DDP_impl ........................................ local + 0: decoder_seq_length .............................. None + 0: deepscale ....................................... False + 0: deepscale_config ................................ None + 0: deepspeed ....................................... True + 0: deepspeed_activation_checkpointing .............. False + 0: deepspeed_config ................................ ds_configs/3319359.json + 0: deepspeed_mpi ................................... False + 0: distribute_checkpointed_activations ............. False + 0: distributed_backend ............................. nccl + 0: embed_layernorm ................................. False + 0: embedding_path .................................. None + 0: encoder_seq_length .............................. 2048 + 0: eod_mask_loss ................................... 
False + 0: eval_interval ................................... 1 + 0: eval_iters ...................................... 100 + 0: eval_only ....................................... True + 0: evidence_data_path .............................. None + 0: exit_duration_in_mins ........................... None + 0: exit_interval ................................... None + 0: ffn_hidden_size ................................. 8192 + 0: finetune ........................................ False + 0: fp16 ............................................ False + 0: fp16_lm_cross_entropy ........................... False + 0: fp32_residual_connection ........................ False + 0: gigaflos_no_embeds .............................. 0 + 0: global_batch_size ............................... 128 + 0: glu_activation .................................. None + 0: hidden_dropout .................................. 0.1 + 0: hidden_size ..................................... 2048 + 0: hysteresis ...................................... 2 + 0: ict_head_size ................................... None + 0: ict_load ........................................ None + 0: img_dim ......................................... 224 + 0: indexer_batch_size .............................. 128 + 0: indexer_log_interval ............................ 1000 + 0: inference ....................................... False + 0: init_method_std ................................. 0.02 + 0: init_method_xavier_uniform ...................... False + 0: initial_loss_scale .............................. 4294967296 + 0: kill_switch_path ................................ kill-switch-1b58b8400mval + 0: kv_channels ..................................... 128 + 0: layer_norm_fusion ............................... True + 0: layernorm_epsilon ............................... 1e-05 + 0: lazy_mpu_init ................................... None + 0: load ............................................ 
checkpoints_1b58b8400m + 0: local_rank ...................................... None + 0: log_batch_size_to_tensorboard ................... True + 0: log_interval .................................... 10 + 0: log_learning_rate_to_tensorboard ................ True + 0: log_level ....................................... None + 0: log_level_replica ............................... None + 0: log_loss_scale_to_tensorboard ................... True + 0: log_num_zeros_in_grad ........................... False + 0: log_params_norm ................................. False + 0: log_path ........................................ None + 0: log_timers_to_tensorboard ....................... True + 0: log_validation_ppl_to_tensorboard ............... True + 0: loss_on_targets_only ............................ False + 0: loss_scale ...................................... None + 0: loss_scale_window ............................... 1000 + 0: lr .............................................. 0.0002 + 0: lr_decay_iters .................................. None + 0: lr_decay_samples ................................ 4291992 + 0: lr_decay_style .................................. cosine + 0: lr_decay_tokens ................................. None + 0: lr_warmup_fraction .............................. None + 0: lr_warmup_iters ................................. 0 + 0: lr_warmup_samples ............................... 4292 + 0: make_vocab_size_divisible_by .................... 128 + 0: mask_prob ....................................... 0.15 + 0: masked_softmax_fusion ........................... True + 0: max_position_embeddings ......................... 2048 + 0: mean_noise_span_length .......................... None + 0: memory_centric_tiled_linear ..................... False + 0: merge_file ...................................... gpt2/merges.txt + 0: micro_batch_size ................................ 1 + 0: min_loss_scale .................................. 
1.0 + 0: min_lr .......................................... 2e-05 + 0: mmap_warmup ..................................... False + 0: no_load_optim ................................... True + 0: no_load_rng ..................................... None + 0: no_save_optim ................................... None + 0: no_save_rng ..................................... None + 0: noise_density ................................... None + 0: num_attention_heads ............................. 16 + 0: num_channels .................................... 3 + 0: num_classes ..................................... 1000 + 0: num_layers ...................................... 28 + 0: num_layers_per_virtual_pipeline_stage ........... None + 0: num_workers ..................................... 2 + 0: onnx_safe ....................................... None + 0: openai_gelu ..................................... False + 0: optimizer ....................................... adam + 0: optimizer_fusion ................................ True + 0: override_lr_scheduler ........................... True + 0: pad_vocab_size_to ............................... None + 0: params_dtype .................................... torch.bfloat16 + 0: partition_activations ........................... False + 0: patch_dim ....................................... 16 + 0: pipeline_model_parallel_size .................... 1 + 0: position_embedding_type ......................... PositionEmbeddingType.absolute + 0: pp_partition_method ............................. None + 0: profile_backward ................................ False + 0: query_in_block_prob ............................. 0.1 + 0: rampup_batch_size ............................... None + 0: rank ............................................ 0 + 0: remote_device ................................... none + 0: reset_attention_mask ............................ False + 0: reset_position_ids .............................. 
False + 0: reset_progress .................................. True + 0: retriever_report_topk_accuracies ................ [] + 0: retriever_score_scaling ......................... False + 0: retriever_seq_length ............................ 256 + 0: reweight_loss_based_on_position_frequency ....... False + 0: sample_rate ..................................... 1.0 + 0: save ............................................ checkpoints_1b58b8400m + 0: save_interval ................................... 10000 + 0: scatter_gather_tensors_in_pipeline .............. True + 0: scattered_embeddings ............................ False + 0: seed ............................................ 1234 + 0: seq_length ...................................... 2048 + 0: sgd_momentum .................................... 0.9 + 0: short_seq_prob .................................. 0.1 + 0: skip_train_iteration_range ...................... None + 0: split ........................................... None + 0: split_transformers .............................. False + 0: sync_tp_duplicated_parameters ................... False + 0: synchronize_each_layer .......................... False + 0: tensor_model_parallel_size ...................... 1 + 0: tensorboard_dir ................................. tensorboard_1b58b8400mval + 0: tensorboard_log_interval ........................ 1 + 0: tensorboard_queue_size .......................... 5 + 0: test_weighted_split_paths ....................... None + 0: test_weighted_split_paths_path .................. None + 0: tile_factor ..................................... 1 + 0: titles_data_path ................................ None + 0: tokenizer_name_or_path .......................... None + 0: tokenizer_type .................................. GPT2BPETokenizer + 0: train_iters ..................................... None + 0: train_samples ................................... 4291992 + 0: train_tokens .................................... 
None + 0: train_weighted_split_names ...................... ['train'] + 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] + 0: train_weighted_split_paths_path ................. None + 0: train_weighted_split_splits ..................... [['0:1']] + 0: train_weighted_split_weights .................... [['1.0']] + 0: universal_checkpoint ............................ False + 0: use_bnb_optimizer ............................... False + 0: use_checkpoint_lr_scheduler ..................... False + 0: use_contiguous_buffers_in_ddp ................... True + 0: use_cpu_initialization .......................... None + 0: use_one_sent_docs ............................... False + 0: use_pin_memory .................................. False + 0: valid_num_workers ............................... 2 + 0: valid_weighted_split_names ...................... ['validation'] + 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] + 0: valid_weighted_split_paths_path ................. None + 0: valid_weighted_split_splits ..................... [['0:1']] + 0: valid_weighted_split_weights .................... [['1.0']] + 0: virtual_pipeline_model_parallel_size ............ None + 0: vocab_extra_ids ................................. 0 + 0: vocab_file ...................................... gpt2/vocab.json + 0: weight_decay .................................... 0.1 + 0: world_size ...................................... 128 + 0: zero_allgather_bucket_size ...................... 0.0 + 0: zero_contigious_gradients ....................... False + 0: zero_reduce_bucket_size ......................... 0.0 + 0: zero_reduce_scatter ............................. False + 0: zero_stage ...................................... 
0 + 0: -------------------- end of arguments --------------------- + 0: setting number of micro-batches to constant 1 + 0: > building GPT2BPETokenizer tokenizer ... + 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) + 0: DeepSpeed general environment info: + 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] + 0: torch version .................... 1.13.0+rocm5.2 + 0: torch cuda version ............... None + 0: torch hip version ................ 5.2.21151-afdc89f8 + 0: nvcc version ..................... None + 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] + 0: deepspeed info ................... 0.7.5, unknown, unknown + 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 + 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** + 0: > initializing torch distributed ... + 0: [2023-03-16 09:04:48,860] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +15: > setting tensorboard ... + 0: > initializing tensor model parallel with size 1 + 0: > initializing pipeline model parallel with size 1 + 0: > setting random seeds to 1234 ... + 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 + 0: > compiling dataset index builder ... + 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: make: Nothing to be done for 'default'. + 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: >>> done with dataset index builder. Compilation time: 0.096 seconds + 0: > compiling and loading fused kernels ... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 87 + 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 63 + 0: ninja: no work to do. 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 67 + 0: ninja: no work to do. 
+ 0: >>> done with compiling and loading fused kernels. Compilation time: 23.049 seconds + 0: time to initialize megatron (seconds): 46.838 + 0: [after megatron is initialized] datetime: 2023-03-16 09:05:18 + 0: building GPT model ... + 0: [2023-03-16 09:05:18,981] [INFO] [utils.py:827:see_memory_usage] Before Building Model + 0: [2023-03-16 09:05:18,982] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB + 0: [2023-03-16 09:05:18,982] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.55 GB, percent = 6.1% + 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None + 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi + 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, 
ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 + 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): + 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, 
ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process + 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo + 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, 
ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127} + 0: [2023-03-16 09:05:23,039] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer + 0: stage=0 layers=35 + 0: 0: _to_float16 + 0: 1: EmbeddingPipe + 0: 2: + 0: 3: ParallelTransformerLayerPipe + 0: 4: ParallelTransformerLayerPipe + 0: 5: ParallelTransformerLayerPipe + 0: 6: ParallelTransformerLayerPipe + 0: 7: ParallelTransformerLayerPipe + 0: 8: ParallelTransformerLayerPipe + 0: 9: ParallelTransformerLayerPipe + 0: 10: ParallelTransformerLayerPipe + 0: 11: ParallelTransformerLayerPipe + 0: 12: ParallelTransformerLayerPipe + 0: 13: ParallelTransformerLayerPipe + 0: 14: ParallelTransformerLayerPipe + 0: 15: ParallelTransformerLayerPipe + 0: 16: ParallelTransformerLayerPipe + 0: 17: ParallelTransformerLayerPipe + 0: 18: ParallelTransformerLayerPipe + 0: 19: ParallelTransformerLayerPipe + 0: 20: ParallelTransformerLayerPipe + 0: 21: ParallelTransformerLayerPipe + 0: 22: ParallelTransformerLayerPipe + 0: 23: ParallelTransformerLayerPipe + 0: 24: ParallelTransformerLayerPipe + 0: 25: ParallelTransformerLayerPipe + 0: 26: ParallelTransformerLayerPipe + 0: 27: ParallelTransformerLayerPipe + 0: 28: ParallelTransformerLayerPipe + 0: 29: ParallelTransformerLayerPipe + 0: 30: ParallelTransformerLayerPipe + 0: 31: undo + 0: 32: MixedFusedLayerNorm + 0: 33: EmbeddingPipe + 0: 34: float16_to_fp32 + 0: loss: CrossEntropy + 0: [2023-03-16 09:05:23,287] [INFO] [utils.py:827:see_memory_usage] After Building Model + 0: [2023-03-16 09:05:23,287] [INFO] [utils.py:828:see_memory_usage] MA 2.83 GB Max_MA 2.83 GB CA 2.89 GB Max_CA 3 GB + 0: [2023-03-16 09:05:23,287] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.59 GB, percent = 6.1% + 0: setting training iterations to 33531 + 0: > learning rate decay style: cosine + 0: DeepSpeed is enabled. + 0: [2023-03-16 09:05:23,290] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown + 0: [2023-03-16 09:05:39,362] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False + 0: [2023-03-16 09:05:39,362] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer + 0: [2023-03-16 09:05:39,362] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer + 0: [2023-03-16 09:05:39,377] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam + 0: [2023-03-16 09:05:39,378] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer + 0: [2023-03-16 09:05:39,494] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer + 0: [2023-03-16 09:05:39,495] [INFO] [utils.py:828:see_memory_usage] MA 2.83 GB Max_MA 2.84 GB CA 2.91 GB Max_CA 3 GB + 0: [2023-03-16 09:05:39,495] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.27 GB, percent = 6.2% + 0: ninja: no work to do. + 0: Time to load utils op: 0.1469104290008545 seconds + 0: ninja: no work to do. 
+ 0: Time to load utils op: 0.13644981384277344 seconds +15: Time to load utils op: 0.10825943946838379 seconds +15: Time to load utils op: 0.10652971267700195 secondsTime to load utils op: 0.10720467567443848 seconds +15: +15: Time to load utils op: 0.10592794418334961 seconds +15: Time to load utils op: 0.10726785659790039 seconds + 5: Time to load utils op: 0.30956196784973145 seconds + 6: Time to load utils op: 0.3097412586212158 seconds + 9: Time to load utils op: 0.30931758880615234 seconds + 8: Time to load utils op: 0.312075138092041 seconds +14: Time to load utils op: 0.309873104095459 seconds + 0: Time to load utils op: 0.0005919933319091797 seconds + 0: Time to load utils op: 0.0007503032684326172 seconds + 9: Time to load utils op: 0.0005578994750976562 seconds + 6: Time to load utils op: 0.0004115104675292969 seconds + 5: Time to load utils op: 0.00046539306640625 seconds + 0: Time to load utils op: 0.2025601863861084 secondsTime to load utils op: 0.20277643203735352 seconds + 0: + 0: Time to load utils op: 0.20299005508422852 seconds + 0: Time to load utils op: 0.20305252075195312 seconds +15: Time to load utils op: 0.0004892349243164062 seconds +15: Time to load utils op: 0.0003685951232910156 seconds +15: Time to load utils op: 0.0005090236663818359 seconds +15: Time to load utils op: 0.00054168701171875 seconds +15: Time to load utils op: 0.0005810260772705078 seconds + 0: Time to load utils op: 0.20198273658752441 seconds +14: Time to load utils op: 0.00041961669921875 seconds + 8: Time to load utils op: 0.00041961669921875 seconds + 5: Time to load utils op: 0.20366978645324707 seconds + 5: Time to load utils op: 0.20387578010559082 seconds + 5: Time to load utils op: 0.2039175033569336 seconds + 5: Time to load utils op: 0.20395970344543457 seconds + 5: Time to load utils op: 0.20413756370544434 secondsTime to load utils op: 0.2041490077972412 seconds + 5: + 5: Time to load utils op: 0.2038741111755371 seconds + 6: Time to load utils op: 
0.20389842987060547 seconds + 6: Time to load utils op: 0.20373940467834473 seconds + 6: Time to load utils op: 0.2037503719329834 seconds + 6: Time to load utils op: 0.20412540435791016 seconds + 6: Time to load utils op: 0.2041018009185791 secondsTime to load utils op: 0.20418071746826172 seconds + 6: + 6: Time to load utils op: 0.203826904296875 seconds + 1: Time to load utils op: 0.20976543426513672 seconds + 1: Time to load utils op: 0.2097465991973877 seconds + 1: Time to load utils op: 0.21004366874694824 secondsTime to load utils op: 0.2100207805633545 secondsTime to load utils op: 0.20980048179626465 seconds + 1: + 1: Time to load utils op: 0.20983457565307617 seconds + 1: Time to load utils op: 0.21049094200134277 seconds + 1: + 2: Time to load utils op: 0.20913290977478027 seconds + 2: Time to load utils op: 0.21046805381774902 seconds + 2: Time to load utils op: 0.20807552337646484 seconds + 2: Time to load utils op: 0.20897126197814941 secondsTime to load utils op: 0.21082258224487305 seconds + 2: Time to load utils op: 0.21015572547912598 seconds + 2: + 2: Time to load utils op: 0.20796704292297363 seconds + 2: Time to load utils op: 0.2081143856048584 seconds + 8: Time to load utils op: 0.20321893692016602 seconds + 8: Time to load utils op: 0.20291686058044434 seconds + 8: Time to load utils op: 0.203141450881958 secondsTime to load utils op: 0.20310282707214355 seconds + 8: + 8: Time to load utils op: 0.2031412124633789 seconds + 8: Time to load utils op: 0.2031562328338623 secondsTime to load utils op: 0.20316123962402344 seconds + 8: + 9: Time to load utils op: 0.20309066772460938 secondsTime to load utils op: 0.20342612266540527 seconds + 9: + 9: Time to load utils op: 0.20312809944152832 seconds + 9: Time to load utils op: 0.2033085823059082 seconds + 9: Time to load utils op: 0.20330810546875 seconds + 9: Time to load utils op: 0.20334696769714355 seconds + 9: Time to load utils op: 0.20334196090698242 seconds + 3: Time to load utils op: 
0.2114427089691162 secondsTime to load utils op: 0.21145224571228027 seconds + 3: + 3: Time to load utils op: 0.21146607398986816 seconds + 3: Time to load utils op: 0.21147584915161133 seconds + 3: Time to load utils op: 0.2114877700805664 seconds + 3: Time to load utils op: 0.21150994300842285 secondsTime to load utils op: 0.21149945259094238 seconds + 3: + 3: Time to load utils op: 0.21150445938110352 seconds +14: Time to load utils op: 0.20390677452087402 secondsTime to load utils op: 0.20375847816467285 seconds +14: +14: Time to load utils op: 0.2038271427154541 seconds +14: Time to load utils op: 0.20375847816467285 seconds +14: Time to load utils op: 0.20344233512878418 secondsTime to load utils op: 0.20341253280639648 secondsTime to load utils op: 0.203416109085083 seconds +14: +14: + 0: Time to load utils op: 0.00045371055603027344 secondsTime to load utils op: 0.00046896934509277344 secondsTime to load utils op: 0.0004668235778808594 seconds + 0: + 0: + 0: Time to load utils op: 0.00040411949157714844 seconds + 0: Time to load utils op: 0.0004138946533203125 seconds +10: Time to load utils op: 0.21134424209594727 seconds +10: Time to load utils op: 0.21175765991210938 seconds + 7: Time to load utils op: 0.2116107940673828 secondsTime to load utils op: 0.2116832733154297 seconds + 7: +10: Time to load utils op: 0.21141409873962402 seconds +10: Time to load utils op: 0.2113499641418457 secondsTime to load utils op: 0.21140098571777344 secondsTime to load utils op: 0.2113490104675293 seconds +10: +10: +10: Time to load utils op: 0.2113971710205078 seconds + 7: Time to load utils op: 0.21169114112854004 seconds +10: Time to load utils op: 0.21135473251342773 seconds + 7: Time to load utils op: 0.21169734001159668 secondsTime to load utils op: 0.21169781684875488 seconds + 7: + 7: Time to load utils op: 0.21170568466186523 seconds + 7: Time to load utils op: 0.21170926094055176 seconds + 7: Time to load utils op: 0.2117137908935547 seconds +15: Time to load 
utils op: 0.20369338989257812 seconds +15: Time to load utils op: 0.20369505882263184 seconds +15: Time to load utils op: 0.2021496295928955 seconds + 0: Time to load utils op: 0.30265021324157715 seconds +11: Time to load utils op: 0.21156978607177734 secondsTime to load utils op: 0.21156954765319824 secondsTime to load utils op: 0.21157383918762207 secondsTime to load utils op: 0.21157526969909668 seconds +11: +11: +11: +11: Time to load utils op: 0.21158981323242188 secondsTime to load utils op: 0.2115788459777832 seconds +11: +11: Time to load utils op: 0.21158909797668457 secondsTime to load utils op: 0.21158385276794434 seconds +11: +12: Time to load utils op: 0.21120023727416992 seconds +12: Time to load utils op: 0.21121573448181152 seconds +12: Time to load utils op: 0.2112255096435547 secondsTime to load utils op: 0.2112271785736084 secondsTime to load utils op: 0.21122527122497559 seconds +12: +12: +12: Time to load utils op: 0.21123528480529785 seconds +12: Time to load utils op: 0.21125555038452148 seconds +12: Time to load utils op: 0.21125388145446777 seconds + 5: Time to load utils op: 0.00034999847412109375 seconds + 5: Time to load utils op: 0.0003044605255126953 seconds + 5: Time to load utils op: 0.00042176246643066406 seconds + 6: Time to load utils op: 0.00041294097900390625 seconds + 6: Time to load utils op: 0.00037550926208496094 seconds + 6: Time to load utils op: 0.0003559589385986328 seconds + 6: Time to load utils op: 0.00032138824462890625 seconds + 6: Time to load utils op: 0.00035858154296875 seconds + 5: Time to load utils op: 0.00034427642822265625 seconds + 5: Time to load utils op: 0.00038814544677734375 seconds + 6: Time to load utils op: 0.0003361701965332031 seconds + 5: Time to load utils op: 0.0004067420959472656 seconds + 5: Time to load utils op: 0.00037550926208496094 seconds + 6: Time to load utils op: 0.00038361549377441406 seconds +13: Time to load utils op: 0.21175289154052734 secondsTime to load utils op: 
0.2117595672607422 seconds +13: +13: Time to load utils op: 0.21176648139953613 seconds +13: Time to load utils op: 0.21177268028259277 seconds +13: Time to load utils op: 0.21178841590881348 secondsTime to load utils op: 0.21178793907165527 seconds +13: +13: Time to load utils op: 0.21179628372192383 seconds +13: Time to load utils op: 0.2116837501525879 seconds + 8: Time to load utils op: 0.0003409385681152344 seconds + 8: Time to load utils op: 0.00032639503479003906 seconds + 9: Time to load utils op: 0.00038504600524902344 seconds + 8: Time to load utils op: 0.0003457069396972656 seconds + 9: Time to load utils op: 0.0004527568817138672 seconds + 8: Time to load utils op: 0.0003936290740966797 secondsTime to load utils op: 0.0004029273986816406 seconds + 8: + 8: Time to load utils op: 0.0003600120544433594 seconds + 1: Time to load utils op: 0.4033975601196289 seconds + 8: Time to load utils op: 0.0003199577331542969 seconds + 9: Time to load utils op: 0.00036406517028808594 seconds + 9: Time to load utils op: 0.0003933906555175781 seconds + 9: Time to load utils op: 0.0003807544708251953 seconds + 9: Time to load utils op: 0.0003809928894042969 seconds + 9: Time to load utils op: 0.0003802776336669922 seconds + 4: Time to load utils op: 0.23303604125976562 seconds + 4: Time to load utils op: 0.2330489158630371 seconds + 4: Time to load utils op: 0.23306536674499512 seconds + 4: Time to load utils op: 0.233079195022583 secondsTime to load utils op: 0.2330772876739502 seconds + 4: + 4: Time to load utils op: 0.23308897018432617 seconds + 4: Time to load utils op: 0.2331066131591797 seconds + 4: Time to load utils op: 0.2331082820892334 seconds +14: Time to load utils op: 0.0003933906555175781 seconds +14: Time to load utils op: 0.0003285408020019531 seconds +14: Time to load utils op: 0.0003573894500732422 seconds +14: Time to load utils op: 0.0003523826599121094 seconds +14: Time to load utils op: 0.00036072731018066406 secondsTime to load utils op: 
0.00036334991455078125 seconds +14: +14: Time to load utils op: 0.0003440380096435547 seconds +15: Time to load utils op: 0.0003771781921386719 seconds +15: Time to load utils op: 0.0003504753112792969 seconds +15: Time to load utils op: 0.0003445148468017578 seconds + 2: Time to load utils op: 0.00048828125 seconds + 2: Time to load utils op: 0.0005171298980712891 seconds + 2: Time to load utils op: 0.0005486011505126953 seconds + 2: Time to load utils op: 0.0005638599395751953 seconds + 2: Time to load utils op: 0.0005538463592529297 seconds + 2: Time to load utils op: 0.0005571842193603516 secondsTime to load utils op: 0.0005209445953369141 seconds + 2: + 2: Time to load utils op: 0.0005178451538085938 seconds + 1: Time to load utils op: 0.0004889965057373047 seconds + 1: Time to load utils op: 0.000518798828125 secondsTime to load utils op: 0.0003914833068847656 seconds + 1: + 1: Time to load utils op: 0.00041103363037109375 seconds + 1: Time to load utils op: 0.0004374980926513672 seconds + 1: Time to load utils op: 0.0004782676696777344 seconds + 1: Time to load utils op: 0.0005059242248535156 seconds + 1: Time to load utils op: 0.0005140304565429688 seconds + 3: Time to load utils op: 0.0008232593536376953 seconds + 3: Time to load utils op: 0.0010695457458496094 seconds + 3: Time to load utils op: 0.0012254714965820312 secondsTime to load utils op: 0.0011649131774902344 seconds + 3: + 3: Time to load utils op: 0.0011379718780517578 secondsTime to load utils op: 0.0011222362518310547 seconds + 3: + 3: Time to load utils op: 0.001135110855102539 seconds + 3: Time to load utils op: 0.0011906623840332031 seconds + 7: Time to load utils op: 0.0008587837219238281 seconds + 7: Time to load utils op: 0.0008170604705810547 seconds + 7: Time to load utils op: 0.0008397102355957031 seconds + 7: Time to load utils op: 0.0008046627044677734 seconds + 7: Time to load utils op: 0.0008914470672607422 seconds + 7: Time to load utils op: 0.0008885860443115234 secondsTime to 
load utils op: 0.0008847713470458984 seconds + 7: + 7: Time to load utils op: 0.0009920597076416016 seconds +10: Time to load utils op: 0.0007531642913818359 seconds +10: Time to load utils op: 0.0008761882781982422 seconds +10: Time to load utils op: 0.0009672641754150391 seconds +10: Time to load utils op: 0.001035451889038086 seconds +10: Time to load utils op: 0.0009779930114746094 seconds +10: Time to load utils op: 0.0010673999786376953 seconds +10: Time to load utils op: 0.0011646747589111328 seconds +11: Time to load utils op: 0.0008292198181152344 seconds +10: Time to load utils op: 0.001056671142578125 seconds +11: Time to load utils op: 0.0009214878082275391 secondsTime to load utils op: 0.0009603500366210938 secondsTime to load utils op: 0.0009791851043701172 secondsTime to load utils op: 0.0009567737579345703 seconds +11: +11: +11: +11: Time to load utils op: 0.0010492801666259766 seconds +11: Time to load utils op: 0.0010128021240234375 seconds +11: Time to load utils op: 0.0009970664978027344 seconds +12: Time to load utils op: 0.0011510848999023438 seconds +12: Time to load utils op: 0.0012018680572509766 seconds +13: Time to load utils op: 0.0007646083831787109 seconds +12: Time to load utils op: 0.0013422966003417969 secondsTime to load utils op: 0.0012888908386230469 seconds +12: +12: Time to load utils op: 0.0013551712036132812 seconds +12: Time to load utils op: 0.0012786388397216797 seconds +12: Time to load utils op: 0.001373291015625 seconds +12: Time to load utils op: 0.0013914108276367188 seconds +13: Time to load utils op: 0.000972747802734375 seconds +13: Time to load utils op: 0.0011589527130126953 seconds +13: Time to load utils op: 0.0011568069458007812 seconds +13: Time to load utils op: 0.0011355876922607422 secondsTime to load utils op: 0.0011358261108398438 seconds +13: +13: Time to load utils op: 0.0010726451873779297 seconds +13: Time to load utils op: 0.0011599063873291016 seconds + 4: Time to load utils op: 
0.0009105205535888672 secondsTime to load utils op: 0.0008075237274169922 seconds + 4: + 4: Time to load utils op: 0.0009477138519287109 seconds + 4: Time to load utils op: 0.0010039806365966797 seconds + 4: Time to load utils op: 0.0010688304901123047 seconds + 4: Time to load utils op: 0.0011992454528808594 seconds + 4: Time to load utils op: 0.0012059211730957031 seconds + 4: Time to load utils op: 0.001310110092163086 seconds + 0: [2023-03-16 09:05:39,905] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 + 0: [2023-03-16 09:05:39,906] [INFO] [utils.py:828:see_memory_usage] MA 2.83 GB Max_MA 2.83 GB CA 2.91 GB Max_CA 3 GB + 0: [2023-03-16 09:05:39,906] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,019] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 + 0: [2023-03-16 09:05:40,020] [INFO] [utils.py:828:see_memory_usage] MA 5.81 GB Max_MA 5.81 GB CA 7.36 GB Max_CA 7 GB + 0: [2023-03-16 09:05:40,020] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,122] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 + 0: [2023-03-16 09:05:40,122] [INFO] [utils.py:828:see_memory_usage] MA 5.81 GB Max_MA 5.81 GB CA 7.36 GB Max_CA 7 GB + 0: [2023-03-16 09:05:40,122] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,226] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 + 0: [2023-03-16 09:05:40,226] [INFO] [utils.py:828:see_memory_usage] MA 8.52 GB Max_MA 8.52 GB CA 11.39 GB Max_CA 11 GB + 0: [2023-03-16 09:05:40,226] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,327] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 + 0: [2023-03-16 09:05:40,328] [INFO] [utils.py:828:see_memory_usage] MA 8.52 GB Max_MA 8.52 GB CA 11.39 GB 
Max_CA 11 GB + 0: [2023-03-16 09:05:40,328] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,434] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 + 0: [2023-03-16 09:05:40,434] [INFO] [utils.py:828:see_memory_usage] MA 8.52 GB Max_MA 8.52 GB CA 11.39 GB Max_CA 11 GB + 0: [2023-03-16 09:05:40,434] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,535] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer + 0: [2023-03-16 09:05:40,535] [INFO] [utils.py:828:see_memory_usage] MA 8.52 GB Max_MA 8.52 GB CA 11.39 GB Max_CA 11 GB + 0: [2023-03-16 09:05:40,535] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,641] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer + 0: [2023-03-16 09:05:40,642] [INFO] [utils.py:828:see_memory_usage] MA 8.61 GB Max_MA 8.61 GB CA 11.39 GB Max_CA 11 GB + 0: [2023-03-16 09:05:40,642] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,743] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer + 0: [2023-03-16 09:05:40,744] [INFO] [utils.py:828:see_memory_usage] MA 8.61 GB Max_MA 8.61 GB CA 11.39 GB Max_CA 11 GB + 0: [2023-03-16 09:05:40,744] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% + 0: [2023-03-16 09:05:40,744] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam + 0: [2023-03-16 09:05:40,744] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler + 0: [2023-03-16 09:05:40,744] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = + 0: [2023-03-16 09:05:40,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] + 0: [2023-03-16 09:05:40,745] [INFO] 
[config.py:1007:print] DeepSpeedEngine configuration: + 0: [2023-03-16 09:05:40,745] [INFO] [config.py:1011:print] activation_checkpointing_config { + 0: "partition_activations": false, + 0: "contiguous_memory_optimization": false, + 0: "cpu_checkpointing": false, + 0: "number_checkpoints": null, + 0: "synchronize_checkpoint_boundary": false, + 0: "profile": false + 0: } + 0: [2023-03-16 09:05:40,745] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} + 0: [2023-03-16 09:05:40,745] [INFO] [config.py:1011:print] amp_enabled .................. False + 0: [2023-03-16 09:05:40,745] [INFO] [config.py:1011:print] amp_params ................... False + 0: [2023-03-16 09:05:40,745] [INFO] [config.py:1011:print] autotuning_config ............ { + 0: "enabled": false, + 0: "start_step": null, + 0: "end_step": null, + 0: "metric_path": null, + 0: "arg_mappings": null, + 0: "metric": "throughput", + 0: "model_info": null, + 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", + 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", + 0: "overwrite": true, + 0: "fast": true, + 0: "start_profile_step": 3, + 0: "end_profile_step": 5, + 0: "tuner_type": "gridsearch", + 0: "tuner_early_stopping": 5, + 0: "tuner_num_trials": 50, + 0: "model_info_path": null, + 0: "mp_size": 1, + 0: "max_train_batch_size": null, + 0: "min_train_batch_size": 1, + 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, + 0: "min_train_micro_batch_size_per_gpu": 1, + 0: "num_tuning_micro_batch_sizes": 3 + 0: } + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] comms_config ................. + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] communication_data_type ...... None + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa + 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] curriculum_enabled ........... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] curriculum_params ............ False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] disable_allgather ............ False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] dump_state ................... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] elasticity_enabled ........... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] flops_profiler_config ........ { + 0: "enabled": false, + 0: "profile_step": 1, + 0: "module_depth": -1, + 0: "top_modules": 1, + 0: "detailed": true, + 0: "output_file": null + 0: } + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] fp16_auto_cast ............... None + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] fp16_enabled ................. False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] global_rank .................. 0 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 
1 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] load_universal_checkpoint .... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] loss_scale ................... 1.0 + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] memory_breakdown ............. False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] monitor_config ............... + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] nebula_config ................ { + 0: "enabled": false, + 0: "persistent_storage_path": null, + 0: "persistent_time_interval": 100, + 0: "num_of_version_in_retention": 2, + 0: "enable_nebula_load": true, + 0: "load_path": null + 0: } + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] optimizer_name ............... None + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] optimizer_params ............. None + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} + 0: [2023-03-16 09:05:40,746] [INFO] [config.py:1011:print] pld_enabled .................. False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] pld_params ................... False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] prescale_gradients ........... False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] scheduler_name ............... None + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] scheduler_params ............. 
None + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] sparse_attention ............. None + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] steps_per_print .............. 2000 + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] train_batch_size ............. 128 + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 1 + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] use_node_local_storage ....... False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] world_size ................... 128 + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] zero_enabled ................. False + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 + 0: [2023-03-16 09:05:40,747] [INFO] [config.py:996:print_user_config] json = { + 0: "train_micro_batch_size_per_gpu": 1, + 0: "train_batch_size": 128, + 0: "gradient_clipping": 1.0, + 0: "zero_optimization": { + 0: "stage": 0 + 0: }, + 0: "bf16": { + 0: "enabled": true + 0: }, + 0: "steps_per_print": 2.000000e+03, + 0: "wall_clock_breakdown": false + 0: } + 0: Time to load utils op: 0.0004241466522216797 seconds + 0: [2023-03-16 09:05:40,747] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=1 + 0: [2023-03-16 09:05:40,759] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=35 [0, 35) STAGE_PARAMS=1517252608 (1517.253M) TOTAL_PARAMS=1517252608 (1517.253M) UNIQUE_PARAMS=1517252608 (1517.253M) +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+12: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+10: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 4: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 3: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 9: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 9: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 5: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 0: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+13: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... 
+ 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 7: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 2: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 6: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +14: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 3: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 9: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +15: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 4: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/mp_rank_00_model_states.pt. +10: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:40,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:41,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:41,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:41,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:41,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+13: [2023-03-16 09:05:41,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +13: [2023-03-16 09:05:41,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:41,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 09:05:41,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:41,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:41,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 09:05:41,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+14: [2023-03-16 09:05:41,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 09:05:41,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:41,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +12: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:41,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:41,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +15: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:41,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +14: [2023-03-16 09:05:41,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +10: [2023-03-16 09:05:41,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:41,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 09:05:41,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... 
+11: [2023-03-16 09:05:41,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... +11: [2023-03-16 09:05:41,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 09:05:41,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+13: [2023-03-16 09:05:41,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 09:05:41,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:41,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:41,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:41,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +13: [2023-03-16 09:05:41,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:41,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:41,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +12: [2023-03-16 09:05:41,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:41,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +15: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +14: [2023-03-16 09:05:41,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+14: [2023-03-16 09:05:41,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +10: [2023-03-16 09:05:41,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:41,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. +11: [2023-03-16 09:05:41,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_01-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+12: [2023-03-16 09:05:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:41,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 09:05:41,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+15: [2023-03-16 09:05:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +15: [2023-03-16 09:05:41,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:41,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:41,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 09:05:41,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +14: [2023-03-16 09:05:41,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +13: [2023-03-16 09:05:41,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+15: [2023-03-16 09:05:41,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:41,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:41,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:41,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:41,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +15: [2023-03-16 09:05:41,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:41,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:41,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+14: [2023-03-16 09:05:41,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +11: [2023-03-16 09:05:41,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +12: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 6: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:41,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +12: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:41,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... +10: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:41,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 09:05:41,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:41,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 1: [2023-03-16 09:05:41,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:41,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:41,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 2: [2023-03-16 09:05:41,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:41,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 3: [2023-03-16 09:05:41,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 9: [2023-03-16 09:05:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:41,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:41,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 09:05:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:41,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 5: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +14: [2023-03-16 09:05:41,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 09:05:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:41,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:41,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 09:05:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +13: [2023-03-16 09:05:41,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 7: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 8: [2023-03-16 09:05:41,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 4: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. +10: [2023-03-16 09:05:41,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_03-model_00-model_states.pt. + 0: [2023-03-16 09:05:41,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:41,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:41,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:41,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:41,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:41,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:41,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:41,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:41,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 09:05:41,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:42,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+15: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 09:05:42,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+12: [2023-03-16 09:05:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 09:05:42,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:42,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +10: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +15: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +14: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 09:05:42,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +11: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +11: [2023-03-16 09:05:42,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +14: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +15: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:42,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +13: [2023-03-16 09:05:42,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... +12: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 09:05:42,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +13: [2023-03-16 09:05:42,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:42,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +12: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 09:05:42,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:42,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. +10: [2023-03-16 09:05:42,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:42,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:42,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:42,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:42,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_04-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:42,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:42,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:42,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +12: [2023-03-16 09:05:42,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:42,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +15: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +11: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +10: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +14: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:42,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... +13: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:42,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 09:05:42,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +13: [2023-03-16 09:05:42,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 09:05:42,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+14: [2023-03-16 09:05:42,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:42,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +11: [2023-03-16 09:05:42,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:42,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +12: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +10: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +15: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:42,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. +14: [2023-03-16 09:05:42,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:42,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_05-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:42,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:42,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:42,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:42,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:42,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+10: [2023-03-16 09:05:42,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +10: [2023-03-16 09:05:42,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+10: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+12: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:42,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +12: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+12: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 5: [2023-03-16 09:05:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 6: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:42,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:42,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +13: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +15: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +11: [2023-03-16 09:05:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:42,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:42,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... +14: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 8: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:42,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:42,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 9: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:42,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:42,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:42,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:42,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:42,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:42,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:42,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:42,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:42,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:42,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +10: [2023-03-16 09:05:42,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:42,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:43,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +12: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +15: [2023-03-16 09:05:43,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:43,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:43,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:43,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 09:05:43,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:43,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +11: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +13: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+11: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:43,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:43,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. +14: [2023-03-16 09:05:43,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:43,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_06-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 09:05:43,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:43,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:43,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:43,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:43,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:43,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +12: [2023-03-16 09:05:43,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:43,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 09:05:43,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +10: [2023-03-16 09:05:43,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +11: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 09:05:43,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:43,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +12: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:43,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +10: [2023-03-16 09:05:43,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +13: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 09:05:43,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +15: [2023-03-16 09:05:43,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+15: [2023-03-16 09:05:43,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... +14: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +13: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:43,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:43,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +11: [2023-03-16 09:05:43,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:43,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +14: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:43,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. +15: [2023-03-16 09:05:43,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:43,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:43,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_07-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:43,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:43,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:43,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:43,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:43,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:43,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 5: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:43,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:43,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 09:05:43,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +14: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +10: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+12: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +12: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +13: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +11: [2023-03-16 09:05:43,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 09:05:43,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:43,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +13: [2023-03-16 09:05:43,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:43,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+14: [2023-03-16 09:05:43,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:43,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+14: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:43,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 2: [2023-03-16 09:05:43,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 4: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... +15: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +11: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 9: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:43,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:43,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +10: [2023-03-16 09:05:43,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:43,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +14: [2023-03-16 09:05:43,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:43,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 09:05:43,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:43,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:43,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +15: [2023-03-16 09:05:43,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:43,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 09:05:43,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:43,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:43,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:43,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:43,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:43,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. +12: [2023-03-16 09:05:43,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. + 0: [2023-03-16 09:05:43,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:43,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:43,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:43,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:43,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:43,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:43,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:43,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+13: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:43,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:43,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +13: [2023-03-16 09:05:43,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:43,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:43,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:43,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:43,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:43,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:43,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:43,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:43,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:44,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:44,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:44,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 09:05:44,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 09:05:44,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:44,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:44,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +13: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:44,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +14: [2023-03-16 09:05:44,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 09:05:44,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +12: [2023-03-16 09:05:44,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:44,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +14: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 09:05:44,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +12: [2023-03-16 09:05:44,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +10: [2023-03-16 09:05:44,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +10: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +11: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +11: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:44,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt... +15: [2023-03-16 09:05:44,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. +15: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:44,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:44,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_09-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:44,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:44,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 09:05:44,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +11: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +13: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +12: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 09:05:44,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 09:05:44,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:44,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +13: [2023-03-16 09:05:44,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 09:05:44,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +10: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:44,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +14: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt... +15: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:44,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +12: [2023-03-16 09:05:44,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +10: [2023-03-16 09:05:44,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +11: [2023-03-16 09:05:44,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:44,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +15: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+12: [2023-03-16 09:05:44,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:44,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_10-model_00-model_states.pt. +14: [2023-03-16 09:05:44,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 09:05:44,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:44,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 09:05:44,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +11: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 09:05:44,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +11: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +15: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +10: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +12: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +14: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:44,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt... +13: [2023-03-16 09:05:44,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 09:05:44,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:44,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:44,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:44,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:44,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 09:05:44,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:44,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 09:05:44,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:44,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 8: [2023-03-16 09:05:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 5: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +15: [2023-03-16 09:05:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 4: [2023-03-16 09:05:44,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 6: [2023-03-16 09:05:44,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:44,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 7: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 3: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 1: [2023-03-16 09:05:44,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +13: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:44,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +12: [2023-03-16 09:05:44,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:44,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 2: [2023-03-16 09:05:44,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 9: [2023-03-16 09:05:44,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:44,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:44,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. + 0: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +10: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_11-model_00-model_states.pt. +14: [2023-03-16 09:05:44,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:44,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:44,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:44,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:44,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:44,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:44,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:44,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:44,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:44,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:44,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:44,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:44,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:45,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 09:05:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +15: [2023-03-16 09:05:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 09:05:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+11: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +10: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +12: [2023-03-16 09:05:45,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +11: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:45,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +14: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... +13: [2023-03-16 09:05:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:45,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+12: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:45,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +12: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +10: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +13: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +15: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +11: [2023-03-16 09:05:45,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:45,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_12-model_00-model_states.pt. +14: [2023-03-16 09:05:45,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:45,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:45,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 09:05:45,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +15: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +11: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +14: [2023-03-16 09:05:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +12: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +10: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... +13: [2023-03-16 09:05:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:45,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 09:05:45,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +13: [2023-03-16 09:05:45,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:45,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +14: [2023-03-16 09:05:45,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +10: [2023-03-16 09:05:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 09:05:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +12: [2023-03-16 09:05:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +15: [2023-03-16 09:05:45,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_13-model_00-model_states.pt. +11: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+15: [2023-03-16 09:05:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+13: [2023-03-16 09:05:45,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:45,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:45,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:45,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:45,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:45,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 8: [2023-03-16 09:05:45,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:45,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 1: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +13: [2023-03-16 09:05:45,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+12: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:45,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +14: [2023-03-16 09:05:45,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +13: [2023-03-16 09:05:45,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+13: [2023-03-16 09:05:45,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +12: [2023-03-16 09:05:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:45,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 0: [2023-03-16 09:05:45,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+15: [2023-03-16 09:05:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+12: [2023-03-16 09:05:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +10: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 9: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +15: [2023-03-16 09:05:45,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 09:05:45,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:45,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +14: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+11: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:45,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:45,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... +11: [2023-03-16 09:05:45,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +10: [2023-03-16 09:05:45,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:45,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:45,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:45,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 09:05:45,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 4: [2023-03-16 09:05:45,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 3: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +12: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 2: [2023-03-16 09:05:45,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 6: [2023-03-16 09:05:45,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 7: [2023-03-16 09:05:45,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:45,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:45,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:45,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:45,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:45,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +15: [2023-03-16 09:05:45,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:45,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 09:05:45,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. +11: [2023-03-16 09:05:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_14-model_00-model_states.pt. + 5: [2023-03-16 09:05:45,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:45,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:45,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:45,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+13: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:46,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:46,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:46,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +13: [2023-03-16 09:05:46,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +13: [2023-03-16 09:05:46,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:46,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:46,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +10: [2023-03-16 09:05:46,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +10: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+12: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +15: [2023-03-16 09:05:46,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +12: [2023-03-16 09:05:46,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 09:05:46,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +14: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 09:05:46,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... +11: [2023-03-16 09:05:46,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:46,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:46,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +12: [2023-03-16 09:05:46,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+15: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +11: [2023-03-16 09:05:46,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +14: [2023-03-16 09:05:46,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_15-model_00-model_states.pt. +15: [2023-03-16 09:05:46,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 09:05:46,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 09:05:46,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:46,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +13: [2023-03-16 09:05:46,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +13: [2023-03-16 09:05:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:46,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +10: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+11: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +11: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... 
+12: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +12: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:46,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:46,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +15: [2023-03-16 09:05:46,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+14: [2023-03-16 09:05:46,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... +14: [2023-03-16 09:05:46,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:46,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 09:05:46,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 09:05:46,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +15: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:46,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 09:05:46,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +12: [2023-03-16 09:05:46,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +10: [2023-03-16 09:05:46,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 09:05:46,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +14: [2023-03-16 09:05:46,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_16-model_00-model_states.pt. +11: [2023-03-16 09:05:46,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 09:05:46,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 09:05:46,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 09:05:46,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:46,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:46,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 09:05:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 8: [2023-03-16 09:05:46,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 1: [2023-03-16 09:05:46,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 1: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:46,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +10: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +15: [2023-03-16 09:05:46,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +13: [2023-03-16 09:05:46,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:46,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:46,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:46,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +13: [2023-03-16 09:05:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 09:05:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +12: [2023-03-16 09:05:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +11: [2023-03-16 09:05:46,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:46,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +10: [2023-03-16 09:05:46,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +11: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 9: [2023-03-16 09:05:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:46,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 4: [2023-03-16 09:05:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 5: [2023-03-16 09:05:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 3: [2023-03-16 09:05:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+15: [2023-03-16 09:05:46,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 6: [2023-03-16 09:05:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 2: [2023-03-16 09:05:46,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:46,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 0: [2023-03-16 09:05:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. + 7: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... +14: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 09:05:46,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:46,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 09:05:46,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:46,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +15: [2023-03-16 09:05:46,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:46,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:46,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:46,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +12: [2023-03-16 09:05:46,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 09:05:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:46,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_17-model_00-model_states.pt. +14: [2023-03-16 09:05:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 09:05:46,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:46,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:47,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 09:05:47,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:47,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:47,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +15: [2023-03-16 09:05:47,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:47,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:47,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:47,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:47,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:47,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +15: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +14: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +13: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +10: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:47,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +11: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... +12: [2023-03-16 09:05:47,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +13: [2023-03-16 09:05:47,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 09:05:47,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 09:05:47,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +11: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:47,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:47,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +12: [2023-03-16 09:05:47,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +14: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. +10: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_18-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:47,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:47,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+10: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +13: [2023-03-16 09:05:47,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +12: [2023-03-16 09:05:47,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:47,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+13: [2023-03-16 09:05:47,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 09:05:47,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:47,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:47,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:47,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +14: [2023-03-16 09:05:47,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +10: [2023-03-16 09:05:47,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +13: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +15: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +15: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +10: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +11: [2023-03-16 09:05:47,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt... +14: [2023-03-16 09:05:47,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:47,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +12: [2023-03-16 09:05:47,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:47,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:47,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. +11: [2023-03-16 09:05:47,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_19-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+11: [2023-03-16 09:05:47,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:47,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +13: [2023-03-16 09:05:47,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:47,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +13: [2023-03-16 09:05:47,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+13: [2023-03-16 09:05:47,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:47,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:47,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:47,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 8: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 4: [2023-03-16 09:05:47,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:47,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:47,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +12: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+12: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +15: [2023-03-16 09:05:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+11: [2023-03-16 09:05:47,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:47,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... 
+11: [2023-03-16 09:05:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +11: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +10: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 09:05:47,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:47,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:47,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:47,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 2: [2023-03-16 09:05:47,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:47,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:47,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 09:05:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +10: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:47,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:47,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 9: [2023-03-16 09:05:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:47,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 5: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 09:05:47,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:47,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... +14: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:47,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +15: [2023-03-16 09:05:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +11: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:47,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:47,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +12: [2023-03-16 09:05:47,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:47,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:47,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:47,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:47,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 09:05:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:47,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 0: [2023-03-16 09:05:47,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:47,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 3: [2023-03-16 09:05:47,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:47,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:47,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 7: [2023-03-16 09:05:47,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 09:05:47,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 6: [2023-03-16 09:05:47,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. +14: [2023-03-16 09:05:47,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_20-model_00-model_states.pt. + 1: [2023-03-16 09:05:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:47,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:47,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:47,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:48,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+13: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:48,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:48,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:48,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +13: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:48,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:48,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +13: [2023-03-16 09:05:48,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+15: [2023-03-16 09:05:48,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+10: [2023-03-16 09:05:48,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:48,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +10: [2023-03-16 09:05:48,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:48,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:48,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:48,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+10: [2023-03-16 09:05:48,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +10: [2023-03-16 09:05:48,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +15: [2023-03-16 09:05:48,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +12: [2023-03-16 09:05:48,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:48,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +11: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +12: [2023-03-16 09:05:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +15: [2023-03-16 09:05:48,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 09:05:48,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... +14: [2023-03-16 09:05:48,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:48,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:48,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 09:05:48,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:48,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +11: [2023-03-16 09:05:48,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_21-model_00-model_states.pt. +14: [2023-03-16 09:05:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:48,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+13: [2023-03-16 09:05:48,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 09:05:48,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:48,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:48,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +10: [2023-03-16 09:05:48,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:48,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +13: [2023-03-16 09:05:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+15: [2023-03-16 09:05:48,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +13: [2023-03-16 09:05:48,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:48,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +12: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +14: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:48,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 09:05:48,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +10: [2023-03-16 09:05:48,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 09:05:48,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +15: [2023-03-16 09:05:48,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +12: [2023-03-16 09:05:48,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +15: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 09:05:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... +11: [2023-03-16 09:05:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +14: [2023-03-16 09:05:48,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:48,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. +11: [2023-03-16 09:05:48,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_22-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 09:05:48,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:48,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 09:05:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 09:05:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +14: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +12: [2023-03-16 09:05:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:48,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:48,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:48,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +10: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +15: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+15: [2023-03-16 09:05:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:48,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +10: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 9: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +13: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +13: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 09:05:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +14: [2023-03-16 09:05:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 09:05:48,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 5: [2023-03-16 09:05:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 09:05:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +12: [2023-03-16 09:05:48,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:48,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 6: [2023-03-16 09:05:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 4: [2023-03-16 09:05:48,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 1: [2023-03-16 09:05:48,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +15: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 09:05:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:48,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 0: [2023-03-16 09:05:48,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 2: [2023-03-16 09:05:48,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:48,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:48,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:48,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 3: [2023-03-16 09:05:48,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... +11: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:48,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:48,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. +11: [2023-03-16 09:05:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 09:05:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_23-model_00-model_states.pt. + 7: [2023-03-16 09:05:48,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 09:05:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:48,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:48,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:48,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:48,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:48,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+14: [2023-03-16 09:05:48,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:48,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:48,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 09:05:48,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:48,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:49,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:49,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:49,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +14: [2023-03-16 09:05:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +10: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:49,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +10: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +14: [2023-03-16 09:05:49,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +15: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:49,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+13: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +13: [2023-03-16 09:05:49,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:49,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +15: [2023-03-16 09:05:49,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:49,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:49,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +12: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +11: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +12: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt... +13: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:49,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 09:05:49,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:49,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. +11: [2023-03-16 09:05:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:49,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_24-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:49,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:49,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:49,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:49,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 09:05:49,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +10: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +15: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+14: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +13: [2023-03-16 09:05:49,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+14: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +14: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+11: [2023-03-16 09:05:49,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +12: [2023-03-16 09:05:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... +11: [2023-03-16 09:05:49,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 09:05:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 09:05:49,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +13: [2023-03-16 09:05:49,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+14: [2023-03-16 09:05:49,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +14: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +10: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +12: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +11: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_25-model_00-model_states.pt. +15: [2023-03-16 09:05:49,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:49,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:49,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:49,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:49,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:49,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 09:05:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 09:05:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +14: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +10: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +11: [2023-03-16 09:05:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +13: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +11: [2023-03-16 09:05:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +15: [2023-03-16 09:05:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:49,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:49,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:49,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 1: [2023-03-16 09:05:49,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:49,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 2: [2023-03-16 09:05:49,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:49,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 09:05:49,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 6: [2023-03-16 09:05:49,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 6: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 8: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 4: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... +12: [2023-03-16 09:05:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +13: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 09:05:49,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 9: [2023-03-16 09:05:49,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:49,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 7: [2023-03-16 09:05:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +10: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +14: [2023-03-16 09:05:49,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 09:05:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:49,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:49,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:49,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:49,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +15: [2023-03-16 09:05:49,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:49,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+15: [2023-03-16 09:05:49,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:49,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. +12: [2023-03-16 09:05:49,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 0: [2023-03-16 09:05:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:49,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:49,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 5: [2023-03-16 09:05:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_26-model_00-model_states.pt. + 3: [2023-03-16 09:05:49,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:49,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 09:05:50,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +13: [2023-03-16 09:05:50,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 09:05:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +11: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +12: [2023-03-16 09:05:50,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +13: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+15: [2023-03-16 09:05:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +11: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +10: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +14: [2023-03-16 09:05:50,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt... +15: [2023-03-16 09:05:50,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 09:05:50,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +12: [2023-03-16 09:05:50,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 09:05:50,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 09:05:50,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +14: [2023-03-16 09:05:50,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 09:05:50,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +15: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_27-model_00-model_states.pt. +10: [2023-03-16 09:05:50,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +14: [2023-03-16 09:05:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 09:05:50,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+13: [2023-03-16 09:05:50,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+12: [2023-03-16 09:05:50,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +14: [2023-03-16 09:05:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +13: [2023-03-16 09:05:50,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:50,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +13: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+13: [2023-03-16 09:05:50,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +12: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +12: [2023-03-16 09:05:50,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 09:05:50,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:50,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +11: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +10: [2023-03-16 09:05:50,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:50,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +15: [2023-03-16 09:05:50,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:50,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +10: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. +11: [2023-03-16 09:05:50,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt... +15: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:50,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:50,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:50,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_28-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+13: [2023-03-16 09:05:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:50,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 09:05:50,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+14: [2023-03-16 09:05:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:50,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 09:05:50,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +14: [2023-03-16 09:05:50,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:50,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:50,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +14: [2023-03-16 09:05:50,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +13: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +13: [2023-03-16 09:05:50,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:50,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +10: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +12: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+15: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +15: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +12: [2023-03-16 09:05:50,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt... +11: [2023-03-16 09:05:50,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +11: [2023-03-16 09:05:50,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 09:05:50,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+10: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:50,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +15: [2023-03-16 09:05:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 2: [2023-03-16 09:05:50,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. +10: [2023-03-16 09:05:50,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:50,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:50,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 8: [2023-03-16 09:05:50,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:50,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 0: [2023-03-16 09:05:50,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:50,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:50,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:50,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:50,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_29-model_00-model_states.pt. + 5: [2023-03-16 09:05:50,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:50,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:50,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:50,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:50,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 09:05:50,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:50,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:50,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:50,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 6: [2023-03-16 09:05:50,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:50,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:50,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 4: [2023-03-16 09:05:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +11: [2023-03-16 09:05:50,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:50,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:50,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:50,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 09:05:50,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:50,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:50,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:50,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 7: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:50,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:51,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +12: [2023-03-16 09:05:51,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 09:05:51,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 9: [2023-03-16 09:05:51,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 09:05:51,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +11: [2023-03-16 09:05:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 09:05:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 7: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 6: [2023-03-16 09:05:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 09:05:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 9: [2023-03-16 09:05:51,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 9: [2023-03-16 09:05:51,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+12: [2023-03-16 09:05:51,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:51,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 09:05:51,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +12: [2023-03-16 09:05:51,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +12: [2023-03-16 09:05:51,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:51,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:51,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 4: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... + 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+ 4: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +14: [2023-03-16 09:05:51,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 3: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 1: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 1: [2023-03-16 09:05:51,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 1: [2023-03-16 09:05:51,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
+ 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... + 6: [2023-03-16 09:05:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... + 7: [2023-03-16 09:05:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
+14: [2023-03-16 09:05:51,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 
+11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... +11: [2023-03-16 09:05:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 
+ 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... + 9: [2023-03-16 09:05:51,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +14: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 3: [2023-03-16 09:05:51,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:51,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 
+12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... +12: [2023-03-16 09:05:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 09:05:51,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:51,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +13: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:51,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+10: [2023-03-16 09:05:51,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+13: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 09:05:51,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 0: > overriding learning rate value to 0.0002 + 0: > overriding minimum learning rate value to 2e-05 + 0: > overriding warmup iterations value to 4292 + 0: > overriding total number of iterations value to 4291992 + 0: > overriding decay style value to cosine + 2: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +15: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +13: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... +10: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +13: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 8: [2023-03-16 09:05:51,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 8: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
+ 3: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 09:05:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 0: [2023-03-16 09:05:51,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+15: [2023-03-16 09:05:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +15: [2023-03-16 09:05:51,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +15: [2023-03-16 09:05:51,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 09:05:51,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +15: [2023-03-16 09:05:51,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+ 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... + 1: [2023-03-16 09:05:51,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+10: [2023-03-16 09:05:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 
+14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... +14: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+10: [2023-03-16 09:05:51,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. +10: [2023-03-16 09:05:51,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 09:05:51,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_30-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+ 5: [2023-03-16 09:05:51,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. 
+10: [2023-03-16 09:05:51,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 5: [2023-03-16 09:05:51,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 5: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 5: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 2: [2023-03-16 09:05:51,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+ 2: [2023-03-16 09:05:51,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 
+13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... +13: [2023-03-16 09:05:51,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... +10: [2023-03-16 09:05:51,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. +10: [2023-03-16 09:05:51,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt... 
+10: [2023-03-16 09:05:51,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/layer_32-model_00-model_states.pt. + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... + 0: [2023-03-16 09:05:51,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... +15: [2023-03-16 09:05:51,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 
+ 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... + 8: [2023-03-16 09:05:51,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+ 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... + 5: [2023-03-16 09:05:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... + 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
+ 2: [2023-03-16 09:05:51,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... +10: [2023-03-16 09:05:51,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... + 3: [2023-03-16 09:05:51,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 09:05:51,281] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 27 + 7: [2023-03-16 09:05:51,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,288] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 59 + 9: [2023-03-16 09:05:51,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,301] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 72 + 4: [2023-03-16 09:05:51,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,302] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 39 + 7: [2023-03-16 09:05:51,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,305] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 56 + 4: [2023-03-16 09:05:51,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+ 4: [2023-03-16 09:05:51,318] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 36 + 6: [2023-03-16 09:05:51,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,336] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 52 +11: [2023-03-16 09:05:51,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:51,341] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 89 + 9: [2023-03-16 09:05:51,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,342] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 116 + 9: [2023-03-16 09:05:51,342] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 75 + 4: [2023-03-16 09:05:51,348] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 36 +15: [2023-03-16 09:05:51,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:51,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 93 + 4: [2023-03-16 09:05:51,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 38 +15: [2023-03-16 09:05:51,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 120 + 9: [2023-03-16 09:05:51,353] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 72 + 6: [2023-03-16 09:05:51,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,355] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 39 + 6: [2023-03-16 09:05:51,355] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 51 + 6: [2023-03-16 09:05:51,356] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 52 + 3: [2023-03-16 09:05:51,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 27 +12: [2023-03-16 09:05:51,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:51,361] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 98 + 7: [2023-03-16 09:05:51,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 63 + 7: [2023-03-16 09:05:51,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 56 + 0: [2023-03-16 09:05:51,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,374] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 0 + 7: [2023-03-16 09:05:51,374] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 59 +12: [2023-03-16 09:05:51,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 98 + 9: [2023-03-16 09:05:51,384] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 75 +14: [2023-03-16 09:05:51,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 115 +14: [2023-03-16 09:05:51,391] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 116 + 6: [2023-03-16 09:05:51,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:51,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 48 + 5: [2023-03-16 09:05:51,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 29 + 3: [2023-03-16 09:05:51,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 26 + 3: [2023-03-16 09:05:51,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:51,399] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 41 +11: [2023-03-16 09:05:51,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,400] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 88 + 3: [2023-03-16 09:05:51,400] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 31 + 6: [2023-03-16 09:05:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,402] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 55 +11: [2023-03-16 09:05:51,402] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 90 + 4: [2023-03-16 09:05:51,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,403] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 37 +15: [2023-03-16 09:05:51,404] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 120 + 6: [2023-03-16 09:05:51,405] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 51 +12: [2023-03-16 09:05:51,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:51,407] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 100 + 9: [2023-03-16 09:05:51,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,408] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 73 +11: [2023-03-16 09:05:51,411] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 93 + 1: [2023-03-16 09:05:51,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,413] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 14 + 7: [2023-03-16 09:05:51,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,413] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 34 + 7: [2023-03-16 09:05:51,413] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 58 + 7: [2023-03-16 09:05:51,415] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 63 + 1: [2023-03-16 09:05:51,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 09:05:51,416] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 12 +11: [2023-03-16 09:05:51,419] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 89 + 0: [2023-03-16 09:05:51,420] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 0 + 0: could not find arguments in the checkpoint ... + 0: checkpoint version 3.0 + 4: [2023-03-16 09:05:51,422] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 38 + 5: [2023-03-16 09:05:51,426] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 41 + 3: [2023-03-16 09:05:51,427] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 29 + 6: [2023-03-16 09:05:51,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,432] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 49 +13: [2023-03-16 09:05:51,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,436] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 107 +11: [2023-03-16 09:05:51,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,438] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 92 +13: [2023-03-16 09:05:51,439] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 106 + 3: [2023-03-16 09:05:51,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,442] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 30 + 8: [2023-03-16 09:05:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,442] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 115 + 9: [2023-03-16 09:05:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,442] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 70 +14: [2023-03-16 09:05:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 09:05:51,443] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 76 +14: [2023-03-16 09:05:51,443] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 112 + 7: [2023-03-16 09:05:51,443] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 61 + 9: [2023-03-16 09:05:51,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,445] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 74 +11: [2023-03-16 09:05:51,447] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 90 + 0: [2023-03-16 09:05:51,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,448] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 2 + 7: [2023-03-16 09:05:51,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,450] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 57 +12: [2023-03-16 09:05:51,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:51,452] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 103 + 8: [2023-03-16 09:05:51,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,453] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 64 +11: [2023-03-16 09:05:51,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:51,454] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 94 + 6: [2023-03-16 09:05:51,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,455] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 100 + 4: [2023-03-16 09:05:51,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 54 + 4: [2023-03-16 09:05:51,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 32 +12: [2023-03-16 09:05:51,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:51,457] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 96 + 9: [2023-03-16 09:05:51,459] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 73 + 3: [2023-03-16 09:05:51,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,460] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 99 + 9: [2023-03-16 09:05:51,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,460] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 25 +15: [2023-03-16 09:05:51,460] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 122 + 9: [2023-03-16 09:05:51,460] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 77 + 8: [2023-03-16 09:05:51,460] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 70 + 1: [2023-03-16 09:05:51,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 09:05:51,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 11 + 0: [2023-03-16 09:05:51,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,469] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 6 + 4: [2023-03-16 09:05:51,469] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 33 + 1: [2023-03-16 09:05:51,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,472] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 8 +15: [2023-03-16 09:05:51,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,474] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 121 +13: [2023-03-16 09:05:51,475] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 107 +13: [2023-03-16 09:05:51,478] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 106 +11: [2023-03-16 09:05:51,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,480] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 95 + 4: [2023-03-16 09:05:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,482] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 12 + 4: [2023-03-16 09:05:51,482] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 35 + 6: [2023-03-16 09:05:51,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,484] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 14 +10: [2023-03-16 09:05:51,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,485] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 50 +10: [2023-03-16 09:05:51,485] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 80 +12: [2023-03-16 09:05:51,486] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 103 + 0: [2023-03-16 09:05:51,488] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 2 + 5: [2023-03-16 09:05:51,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 09:05:51,493] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 46 + 6: [2023-03-16 09:05:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. + 6: [2023-03-16 09:05:51,494] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 53 +12: [2023-03-16 09:05:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 101 + 8: [2023-03-16 09:05:51,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 66 +11: [2023-03-16 09:05:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,498] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 91 + 9: [2023-03-16 09:05:51,498] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 78 +13: [2023-03-16 09:05:51,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,499] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 109 + 5: [2023-03-16 09:05:51,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,499] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 32 + 5: [2023-03-16 09:05:51,500] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 43 +11: [2023-03-16 09:05:51,500] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 94 + 9: [2023-03-16 09:05:51,501] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 77 +15: [2023-03-16 09:05:51,501] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 122 + 7: [2023-03-16 09:05:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 09:05:51,499] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 31 + 3: [2023-03-16 09:05:51,500] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 26 + 7: [2023-03-16 09:05:51,502] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 62 + 9: [2023-03-16 09:05:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. + 3: [2023-03-16 09:05:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,503] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 79 + 3: [2023-03-16 09:05:51,503] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 28 +14: [2023-03-16 09:05:51,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,505] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 118 + 1: [2023-03-16 09:05:51,504] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 11 + 3: [2023-03-16 09:05:51,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 09:05:51,507] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 24 +10: [2023-03-16 09:05:51,508] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 80 +12: [2023-03-16 09:05:51,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,509] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 102 +14: [2023-03-16 09:05:51,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 113 + 7: [2023-03-16 09:05:51,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,514] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 60 +13: [2023-03-16 09:05:51,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 111 + 7: [2023-03-16 09:05:51,517] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 58 + 2: [2023-03-16 09:05:51,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 09:05:51,518] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 21 + 8: [2023-03-16 09:05:51,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,519] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 68 + 1: [2023-03-16 09:05:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,522] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 9 +10: [2023-03-16 09:05:51,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:51,525] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 85 +15: [2023-03-16 09:05:51,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,529] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 124 + 0: [2023-03-16 09:05:51,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:51,534] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 5 + 6: [2023-03-16 09:05:51,534] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 50 + 8: [2023-03-16 09:05:51,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. +12: [2023-03-16 09:05:51,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 101 + 1: [2023-03-16 09:05:51,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,535] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 67 + 1: [2023-03-16 09:05:51,535] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 15 + 5: [2023-03-16 09:05:51,536] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 46 + 1: [2023-03-16 09:05:51,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. + 1: [2023-03-16 09:05:51,537] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 13 + 2: [2023-03-16 09:05:51,538] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 21 +14: [2023-03-16 09:05:51,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 09:05:51,538] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 117 +10: [2023-03-16 09:05:51,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,541] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 61 +10: [2023-03-16 09:05:51,541] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 86 + 2: [2023-03-16 09:05:51,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. + 4: [2023-03-16 09:05:51,542] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 35 + 2: [2023-03-16 09:05:51,542] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 20 +15: [2023-03-16 09:05:51,543] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 121 + 8: [2023-03-16 09:05:51,545] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 64 +12: [2023-03-16 09:05:51,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 09:05:51,550] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 97 +14: [2023-03-16 09:05:51,550] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 114 + 4: [2023-03-16 09:05:51,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 33 + 7: [2023-03-16 09:05:51,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 62 + 8: [2023-03-16 09:05:51,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,551] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 69 +13: [2023-03-16 09:05:51,551] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 109 + 3: [2023-03-16 09:05:51,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 28 +13: [2023-03-16 09:05:51,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:51,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+13: [2023-03-16 09:05:51,555] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 105 + 6: [2023-03-16 09:05:51,555] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 54 + 5: [2023-03-16 09:05:51,555] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 44 + 6: [2023-03-16 09:05:51,555] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 49 + 3: [2023-03-16 09:05:51,554] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 24 + 2: [2023-03-16 09:05:51,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:51,559] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 23 + 0: [2023-03-16 09:05:51,560] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 6 + 8: [2023-03-16 09:05:51,560] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 66 +13: [2023-03-16 09:05:51,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,562] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 108 +11: [2023-03-16 09:05:51,566] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 92 + 1: [2023-03-16 09:05:51,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 09:05:51,567] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 10 + 9: [2023-03-16 09:05:51,567] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 78 + 5: [2023-03-16 09:05:51,569] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 43 + 4: [2023-03-16 09:05:51,570] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 37 +14: [2023-03-16 09:05:51,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,595] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 79 +12: [2023-03-16 09:05:51,614] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 99 +10: [2023-03-16 09:05:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. +13: [2023-03-16 09:05:51,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 09:05:51,571] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 119 + 4: [2023-03-16 09:05:51,595] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 34 +15: [2023-03-16 09:05:51,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:51,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. + 7: [2023-03-16 09:05:51,607] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 60 + 1: [2023-03-16 09:05:51,591] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 15 + 8: [2023-03-16 09:05:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 09:05:51,607] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 88 + 2: [2023-03-16 09:05:51,581] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 20 + 0: [2023-03-16 09:05:51,578] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 1 +10: [2023-03-16 09:05:51,578] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 87 +13: [2023-03-16 09:05:51,586] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 110 +14: [2023-03-16 09:05:51,581] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 112 +15: [2023-03-16 09:05:51,574] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 123 + 8: [2023-03-16 09:05:51,575] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 71 + 2: [2023-03-16 09:05:51,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. + 9: [2023-03-16 09:05:51,619] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 74 +10: [2023-03-16 09:05:51,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 
+13: [2023-03-16 09:05:51,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. +14: [2023-03-16 09:05:51,589] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 118 +15: [2023-03-16 09:05:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:51,604] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 42 + 1: [2023-03-16 09:05:51,620] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 9 + 8: [2023-03-16 09:05:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. +11: [2023-03-16 09:05:51,616] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 91 + 2: [2023-03-16 09:05:51,591] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 17 + 0: [2023-03-16 09:05:51,579] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 7 +10: [2023-03-16 09:05:51,589] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 82 +13: [2023-03-16 09:05:51,595] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 104 +15: [2023-03-16 09:05:51,579] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 125 + 5: [2023-03-16 09:05:51,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from 
checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. + 8: [2023-03-16 09:05:51,578] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 65 + 2: [2023-03-16 09:05:51,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. + 0: [2023-03-16 09:05:51,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:51,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,607] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 124 + 5: [2023-03-16 09:05:51,606] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 40 + 8: [2023-03-16 09:05:51,597] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 69 + 2: [2023-03-16 09:05:51,606] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 19 + 0: [2023-03-16 09:05:51,605] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 4 +10: [2023-03-16 09:05:51,618] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 84 + 2: [2023-03-16 09:05:51,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 09:05:51,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:51,612] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 16 + 0: [2023-03-16 09:05:51,614] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 3 +15: [2023-03-16 09:05:51,617] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 127 + 8: [2023-03-16 09:05:51,622] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 71 +15: [2023-03-16 09:05:51,622] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 123 +15: [2023-03-16 09:05:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. +15: [2023-03-16 09:05:51,627] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 126 +10: [2023-03-16 09:05:51,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 09:05:51,629] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 48 +10: [2023-03-16 09:05:51,630] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 83 + 1: [2023-03-16 09:05:51,631] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 13 +12: [2023-03-16 09:05:51,631] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 97 +12: [2023-03-16 09:05:51,632] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 96 + 5: [2023-03-16 09:05:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +10: [2023-03-16 09:05:51,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. + 5: [2023-03-16 09:05:51,633] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 45 +10: [2023-03-16 09:05:51,634] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 81 +10: [2023-03-16 09:05:51,634] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 82 + 5: [2023-03-16 09:05:51,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 09:05:51,640] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 47 + 7: [2023-03-16 09:05:51,646] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 57 + 6: [2023-03-16 09:05:51,647] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 53 + 2: [2023-03-16 09:05:51,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. + 2: [2023-03-16 09:05:51,650] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 18 +13: [2023-03-16 09:05:51,653] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 110 + 9: [2023-03-16 09:05:51,656] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 76 + 6: [2023-03-16 09:05:51,657] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 55 + 2: [2023-03-16 09:05:51,657] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 17 + 2: [2023-03-16 09:05:51,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b58b8400m/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 09:05:51,659] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 22 + 5: [2023-03-16 09:05:51,664] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 42 + 0: [2023-03-16 09:05:51,665] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 1 +15: [2023-03-16 09:05:51,667] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 125 + 1: [2023-03-16 09:05:51,669] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 10 +13: [2023-03-16 09:05:51,673] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 105 +13: [2023-03-16 09:05:51,675] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 108 +10: [2023-03-16 09:05:51,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 81 + 0: [2023-03-16 09:05:51,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 7 +13: [2023-03-16 09:05:51,685] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 104 + 3: [2023-03-16 09:05:51,688] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 25 + 2: [2023-03-16 09:05:51,694] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 18 +10: [2023-03-16 09:05:51,695] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 85 +11: [2023-03-16 09:05:51,698] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 95 +13: [2023-03-16 09:05:51,702] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 111 + 5: [2023-03-16 09:05:51,702] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 40 + 2: [2023-03-16 09:05:51,706] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 22 +10: [2023-03-16 09:05:51,706] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 87 +10: [2023-03-16 09:05:51,706] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 83 +12: [2023-03-16 09:05:51,713] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 102 +10: [2023-03-16 09:05:51,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 84 +15: [2023-03-16 09:05:51,721] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 126 + 1: [2023-03-16 09:05:51,721] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 8 + 5: [2023-03-16 09:05:51,746] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 44 + 3: [2023-03-16 09:05:51,750] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 30 + 8: [2023-03-16 09:05:51,763] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 67 + 5: [2023-03-16 09:05:51,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 47 + 0: [2023-03-16 09:05:51,790] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 5 + 2: [2023-03-16 09:05:51,792] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 23 +14: [2023-03-16 09:05:51,801] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 114 + 2: [2023-03-16 09:05:51,805] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints 
for rank 16
+ 0: [2023-03-16 09:05:51,813] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 4
+14: [2023-03-16 09:05:51,818] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 117
+10: [2023-03-16 09:05:51,824] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 86
+15: [2023-03-16 09:05:51,825] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 127
+ 8: [2023-03-16 09:05:51,831] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 68
+ 5: [2023-03-16 09:05:51,856] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 45
+ 8: [2023-03-16 09:05:51,885] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 65
+ 0: [2023-03-16 09:05:51,891] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 3
+14: [2023-03-16 09:05:51,928] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 113
+14: [2023-03-16 09:05:51,929] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 119
+ 2: [2023-03-16 09:05:51,986] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 19
+ 0: successfully loaded checkpoint from checkpoints_1b58b8400m at iteration 0
+15: time (ms) | load-checkpoint: 11238.31
+ 0: estimated model parameters: 1.517252608
+ 0: estimated model parameters without embeddings: 1.410035712
+ 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 09:05:52
+ 0: > building train, validation, and test datasets ...
+ 0: > datasets target sizes (minimum size):
+ 0: train: 4291992
+ 0: validation: 429209600
+ 0: test: 12800
+ 0: > building train, validation, and test datasets for GPT ...
+ 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... + 0: > finished creating indexed dataset in 0.008630 seconds + 0: number of documents: 208931 + 0: > dataset split: + 0: train: + 0: document indices in [0, 208931) total of 208931 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_4291992ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_4291992ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_4291992ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.091 seconds + 0: total number of samples: 4294830 + 0: total number of epochs: 88 + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... 
+ 0: > finished creating indexed dataset in 0.040888 seconds + 0: number of documents: 364608 + 0: > dataset split: + 0: validation: + 0: document indices in [0, 364608) total of 364608 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_429209600ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_429209600ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_429209600ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.089 seconds + 0: total number of samples: 429221309 + 0: total number of epochs: 5051 + 0: > finished creating GPT datasets ... + 0: [after dataloaders are built] datetime: 2023-03-16 09:06:07 + 0: done with setup ... + 0: training ... 
+15: time (ms) | model-and-optimizer-setup: 33408.58 | train/valid/test-data-iterators-setup: 14333.34 + 0: [after training is done] datetime: 2023-03-16 09:06:07 +15: ----------------------------------------------------------------------------------------------------------------- +15: validation loss at the end of training for val data | lm loss value: 3.479142E+00 | lm loss PPL: 3.243187E+01 | +15: ----------------------------------------------------------------------------------------------------------------- +END 3319359: Thu 16 Mar 2023 09:06:32 AM EET diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5a497ffe669058eebb1aa1155ed587bf10fb28d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b9f487d76b5409e63a4dc8afc52458bc10539078e81ed6657292bc2321b82bd +size 71125719 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28e21f33dfda7d97e302b6b19a45dc6ac236e207 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:350519e526144af5e243ce6e00b29d07ce29cfc770c296858ad5f62073877229 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..004ee367964a28c6dd2b6f72cc355c9c4905b796 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1dda8c6789576fe57bccbbf5a9fc948d9c8b7f18b550d9ecf81147e3863d18c +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a476519f3c6c79582953cb66dd9aa71a4ee1cf91 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8f641e35200eacbc09581fa77c20a2420e043f1731f8a5b5de4cb3491ec6141 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..581d114a05458ebf6ba37fc0d430702f8bf0fa8c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a48b25784e98e371e91d42d41f2fc7cdfece623489990c2ee5110fab63dafeca +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bd896e9106f1f71aaca2ab46ef424cfca4b8435 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f73523d178cdb34b8bc464386a25b01904df0b442dff721a99644f11895da136 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1a0179455b30547c45ec2a21e461b678aadba139 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:efb23d8a9202a6d8fa31c966f76fd46b7e2dac9bacd39ab6b3395b09ac5be4d0 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf21ed4df7298da52905cfa7c2d664ba1f8bbb6f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3b00329b66d66a4d85c84bb46637523feb6c4f2842bb72730fec82998789527 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe95da24a17575977119f9754ddcc355f4e47f28 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4451036592f919c85a175848787a5a80067c1c049019e7ec1c1bba8cf59f6e1 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9901dc2d187a8b390295c829c6f3e9ab85f75b05 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b619d365b5852024074f57735d1d24a2df40dd4e0b25bfc0a00eaf51418a26ba +size 71125677 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76a2715b7e6b151e5069bb03b2a144c589f47d8e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48916169222b6d01284a55fce69e69443cb802b5e5980e3cc04ceb05f06c528b +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d05107da791d6195c621792e73ee11e4b60c4852 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff861d1d6ab5f954e2b7c4b5271e6bc103f3a4680dc9e6c15d9e530fcd9e0d69 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2de3a6dc82527e7ce2bc696a983b49c9eead50b1 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:773241241356d1b62c77c1924484cc53c5367b68c23c01a87370fc0abc4a5c17 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f34b57d9b6d5a8e2ed271f2f92c0caf78c180b64 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a196d7bbd2aba70fae5678a068bea9c2f1be95856b6fd8af913a88f98e10cba2 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f02b21e68495eb3508ed1dfb0fed4ed08ac4476 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5bee194579b4e2d84be38306df5738db10ae5616494cbd47ccd1f3d22839d30 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a7e4fe7d6d9c16710dd4da0ec46b2ae4fa1c09b8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7851147c49c720eb398c588782e59e865bfa5aadafc1143c81bba1cba06d8563 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ed82414470b4c3a7ba3ec5e092c579a54445b965 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b5e70b49827f5c6643daf39a3b974c948387420deacb898b10f2761a8179e4b +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e327d7872ddc0efcbed116dcc6f08b151d9dc02e --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f68a37f615d9c1ba708d2725a483512bf5493faafe08762e552323c49031aca1 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1a7b4b2eadd7585c779281056ec92381e3dceedd --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed2ee43b5ab757e3b0d564b9b3326edd3dba6daf31007c4ce0979ac4840e1c79 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d5bce4bed3e59fe04e75f40a5ce297a82b25988 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35e4481e2967a3d850f4e8a0729cc6dd73b14f2652030eea88b3445529bf66ee +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b07d27ba2cfaa2c22a89bc700c5996b7b5c8a51 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86a3ffff540e7edc3a87a277213c1a07009ef96ee279c1a6ccb711f1dfa4767d +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..afc33ac16fe864fa78b0aa611ef3f533890f8f74 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d8c3c96151c7c8b3bb266a8501c76b190ed689888e472c3d318dfdc513854ca +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7ddb1e378a7a4d48e433141fd66b25f7caf7053 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6b4d36046661ef0a51ca87050ac6a9da6b05095ae3605e91afb5684affbe459 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1cd2374673c04a654c151a47d8986ff32d706b5a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:060090411549df36eb37a0860713e06e3f5869cc71a11784178f14df166665d8 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..25e464a10d61773422221e735b401100e9201c6f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fda4293344188c3f2d3049b819ce5fa0671e53531390b967e6820425ec199b1e +size 71125741 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a4a1c42ae2fbdb9f662c2d5be387c1b168139d19 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54ce864aa96538f6f0b2a69844d25f7f3c9eab456bf1683e5cec79aae16f50de +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a119a6b502dc1e1351090780c879baf58efc2f7f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85d011384714774ebefb3c630280001bb301c0bfd9dc9458f6a4f9a0c56f9f99 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..699d0aebea9dc7473ec6a4ec79a7d5cab955c2bc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bce657a455c64a429c363b4ed72e51d8666aeaa6bd1120aff90d186303f0707 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..316f6668a6e8ec20a3813098e749a489e4bb76da --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:3a704b394588b979bd12dd14e55c61d328ace34c18bac944c1ac4e077069c1de +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e03471d90af67d5753bc30f1b14e81349237d66 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13accd5903d79f283dc4e52445b33607c0dbe113afc27a08ca8b77ee37246897 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9efee465b0ff59f6091379176502ad9f374d38b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b72086f5c92405caf8eec7c513918c4e013c51de7389fee429eeab4e61bf96a +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..43652de4da9ba980b5d398512f5b8543f4678aa0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01546987e621a71239c16784f14ee5e6e08bd41310b2f3fe34f4aaeebee6fd21 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..be90712b6d12e0582111e763789bc3a238cade0b --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad2783a39d1e26aa2b12deee9ab4c2b9b58e23c7323c85d4f93c95bbc8edcc99 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f2f2423600a94713e0f7d9533918fca55e49539f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:44c6d2b010609f0d3b89c8dad4fbabb1df7c231c93e0e8fd20ae4bb870ad8c43 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f68e39a2dd76fc979339fab74d0b0307189e778b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21a651f9f91366b82d823f9876add63cc43c769731a8bf3e51338fccde0d28f6 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d54ae2781594cdffc745eff788f8307adddc4d8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c74f88d6c0adeffbfd0c1c99555daec982721696c724494791f4f870a2d5df4 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..3d0f244dfdf3cd4b84323ba3dcabcf2a028aafce --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f28ccea5396c1144478293070acb8653f5abadb6625a7e8130269f232f343d6 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..25a3b87eae15314e7ff5406a5292f41a513e8270 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97d31c61a83b999763ee69b600ad6837136397a7eea6279ced1202aa9ed0afb7 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aab3cfab62406b93ae0dba42fcc26bb1a1d661be --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d6404c1ebf208ae2d062a9c4cf7c11d196e8c4f2763ca520f4afc31c1ea87aa +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5700248ddd8ced8c0bd1d238f7e2516878b7c99 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aaee6bdb0ba4fd2755113bf4cc0eb6ac8b95021bb5498a316cdc4344019da8ab +size 71125869 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c635aa587925c5ae6d258e767bf7f8c9991d7f24 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5980d5d17f7ef764c29f3b0cea12cdc1abd83ff0e78a4de099c8ec108802abd8 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a30036876e6b4688f1dadcaf59eab20b53148a35 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5519a64746585cd641115cebd077558393e0c09e181ccd1a26942be1604958d +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2cfc85a60ef5dfbe7992e02d4e175b844419804b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc9d71c11613b40c0916f85b1055a0174db1d7ff2b34d29e7a3d2095cc4de71b +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8bc0da2d6fdb04d5b5efd9483275403dd06e79a4 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:8de47504c88c1af01f502bf23e1e3dfc0da9706a4dfac32162acbbeee8084785 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1e9ad8a6690a9c18e46cbf0671b31169d92ec100 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8da821e86adf865eb7af8f0750cb7d6547fe4df95b2484724ffc968ffab305f7 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cff872cd05f1b063e819f14456d131c24d429e67 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7a23fd67e0a4659180ca9b9e30ec30d90e005344346cf20f51ea7ac8a9defbb +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3abd4e356a9d4572e9a29b6d0af612ea4ca20d8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90fc17281e1fb19c1d869a31ec6decfea465f6e611dc177ae0727cc996e087b6 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8bbb8fdb457c00b515b8aa9aa742843b56f2185 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4439558f84ece571a3e35e6f0feca4b42418b6084490aa551627c6328939e28f +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab0d214e9b1a7841949a578f4f08c4fcfd005589 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f691150e951101a9c9d35b6a5c12cf116f343b59d43bb6094fa28134652cd43c +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ef89b0775786afa2bb5239abe4914f66736f3c82 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:31e60fec4459072961042bf0308938346b9fa3741a579ffad3fc9f7228e06aba +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b743c66ad88daa3755483cf178491bc10ac8f05 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea14cdaea0b510f40db5bae3eec72fd1e0e530e4a9eaa00ebf60685a2a32289b +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..a8529e78a5b2054604bf9d69acf4097002d29511 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0c1facf711d2803270cd36746c291b1dcee92e6dc1f5170fa17b738556f7282 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..558cc239675b8deb5f8b9a28dfffaf139b39d3cb --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75ba6c9d88b0eb717578e1838da994c3e75e24ad6f7af10e640c03c0e2a5dfa8 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5245535904abf31c7d9bae8080b7f923219fb4a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f135aa7416c77eb79d636d07170cac185d372f40e410f2c9daff03fc3f62f4a +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..33d2deaf440f3462d5bf54bdb90d6cf9b55fe150 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89c4ee51d65906f5ded434169e28ba57e0562e28fc24f31a7e7a2a2822e4751e +size 71125741 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5923c03ea31bfd0906d8ae2c8ebe987b33e1be2 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f08cd7b33fcba75535fddc36d1b2de677f73f4021b9a526912428b343cfb4c5f +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9904ec83cf07985b796d95dde313624a220fc04c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:564205f685ffdc08558c15bbf31946c8cca79c5296e1dbdc0359a47a5469e8b4 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6d6efabd6c01c560bfffdbedab773f8be59b82a7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56b5ef97535f129e009b3007add704ab0923c21b118ef38eba6df866bba4e090 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5bd4c7d35ab90cc6702d0ea833b005425d5114de --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:2ae0d8e2f5f4300e590ec1498097bfa68f6639b40bbb76502f23d0c7561a7794 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4d4b33deb2035cef78f50068b2acf3adf2a14a5b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3591770de4ceaadf71c8ee96b9662f4a2a046bc8a4e4e8c07379f15d72487ad0 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dd55c6725fcbd1de0a158f7d5bd0b9971faf074c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7506bdf529e2927ede1e6cc3ee044a275bfe0658400101c733fbdc446e300e90 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5683d4cecb03148374314e6fd9e635fac826f31b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2f9c7884defe1e5aa4c2c8a41d099fd82caed0f811a1949d621487463e9d1a5 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1eac56743e90b2eb74af49c257da794e9f7c7c6 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4bcc5ae280016f1de3ad0daf4cd0aa920d7036e7caeeb6a94a4acb7a13f8cf9c +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..700c698d7eda1d6563db6e83b5c1604fd26e0b11 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:77df447dfdbc60e46b3d97a0cce886afd47df69061549a68fe4f1fe081db8c9d +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c90fac0d32cf641068f35e0c3ad26755f24fda2 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:427bcf402020c7132254a8eeb10ffab3114ff4a37f54a74080f4b9dc4cbdd250 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..953bc14203fa46a7c044b4d988db5cce0fa2d337 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5267da2ff02b87009975c604a9bedf597cbc6226778a20ebc73ea858e05fd48 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..4adf06137c4a53263bc5f99754aacbfd39b81eb7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b1e27870907436a404547d6d388aff20c4511a60aeee8f57b4a046d71580eae +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2327cb74c20795a5336d61ccf2d31039a07fcc70 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e5665189425d04bd268e953fb2e29b98411c43bffaa89c1f73552ff0b43b9fc +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc801b487b4a13e3949d6816f75c8be702bf3eea --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7ffa8082fa7a333520f53026f63c8fdccf3900922d897774125944b07388a9d +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dda6385b73cc680c8a1f4f167e36087fe64732fc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9dce54b25082cd4d1beb4517be1d68bcfbf4ff168260089cbdc0d1160308102 +size 71125677 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d51723e23025fdb7496aca4e58638eb5f7f20ad --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19ca9ea1766a71fe0f7ce44bc7e949580708fd751be56fc9a8ef24fa428c76ce +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8963fa66e1680e0de1ad9434a5ed52f59fcac272 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a176a81f832ffd540333b802544cc8a5d5100de8416ac20eb468bf067d4ee1b2 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6b3574849e1292d176039f619994fbbbb6105fb --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ba5828f401356f7f97a17d9a09911e3d96140b54149d516d2b64de2da5d2731 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..829394bb179a05faeee2ada1020bd14fdd14dcd5 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:97f08b091c7ea4a37391a8d4b3d3663f8289c293d9077ed73ab1133803ff0713 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7234ce97a15539fd90f0f8fa10cf15e706ece960 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fd86aa95c1822dbf1895a8f374ea3f05ab63028a263562f2f06efea878971bd +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a252bf642f0aab807492580d2a5ec649a4b57298 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:773d6e7698b220cfdbefc2e979a036fa6cdf4f62c20b92014195f4c5f08953da +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84169527a81a71983ef9b980cb0057f049aa73c5 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8329aa2d0eba33d965f1d6bbbb6789393a049ff20253a2308f45087a051485f1 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bf2f538c8bc459f84681b04e56d92b1248e1e9a --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2fd5d753b6405622e9e8250b2acdd0de36857b50854ef2b586805bf0eb0a6c2 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5554468b4cde633117695913fa4a2629ef37f764 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4a12a447fbed08a8bdd1a44a296fa7e58a9459cb2a3e0a05099a5e8bc12cff5 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d34d8f44988f4ba836d56ca433a2f2ed30b7d375 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2baa82728511015d310f678616c1da7e3ea413539079e1eb3a24e3c286c33a7 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4b219bf39c76f55248d4558f4b8b847d7fd896b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05c290dc3baf92db5577c80e1a41acac455c578e4d5a2bf780f078b44c2d635 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..a7e56f796624ff09562dedd3883fad0c719708e8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f34b267a49d49c16b652f7625d11e7d767cc5921cb4d5c5c63db0418b62118e +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48c340ee48b2882de07b21b0c09a1a3a10c2af8c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e58c90e2d27049bc405b5ad82844c561739245ebde0411f6b28bd1f7d697e4c0 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8e9b6ce05fda075b44db2f3cd3347de9b573973e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:699aea0d4ce26106f2172ac2c893b3d58d16152f6b9c096ba31f23a44618e0c4 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d62a848dd46ed9fbf81fef312a88a77db525f2b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2280b7827bb76075e26f322abf056a100002f041425d7580782d558ec7b7dfd3 +size 71125805 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5fc2ff8c049928bd97752faf667e2c393e4aabcb --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c1a72b4a502d1f0295736854e30a8839ecd1dbc5a0f55658930d29861e33b91 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24ab049813636b4a9148092635e3adb613b20068 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b6ef20c65dc6d2b7de6443638a331465d577e64ea408c52b36f442d22bc989f +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b62e92f3a025da0435dbf7df31bfdf452c2d8c5 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba6d719ab9ffb2e4c7274d70aaf43a01bd9fd364563c2ad454090f5b04c278d3 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a6e8612622f46cf3d3acd1aea981f76929dff0a7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:392dd8b86484409871400265a0fc3dd40dca15f9b7d10d68ae7e6a1bf0e3a4cf +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..470cd77aef5ed0e23f858be5fed53074a8fb5c19 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c0b5be0eb6498a81d508447f38e6d80cc51eeb67d59cfe015badfd4650948f7 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9f49da16cc7dbf0a38e9161514af2a777f2628a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:28d5beb9016701fd36af58f211a1640d6993aa83909cd15b7640459a8cd3d34b +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce48cc34fc4290eede2aa94306d1b5861b24c4f8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:283e22c8a0b951c5e52376c165fb815204123bdaaf4d9fdf84175a4fa859dbca +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c196a0690084c4113987d4737810473b785b0a4c --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f300fe052a0157875adadf14b1bbe321337798550c06a8b53ce1c4b5034e818 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..895614dfaec2f8c40d65a86e2b73f64a1eeb1b0f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a1023d57ce53c8788c5db3e20d0706a1e3dd0b4b105926dddb93c79358d0013 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8dc8dddc5d3fdf3efe1e42dbd76541687d4d8e53 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe2a5dc2c082f4530768b28332107b7ebbad08c56844c40cc7b39eaf489ded0a +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..516ee721afda98bb07ffb001df9acf6b8d2ff1bf --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d0666197958f19db04557ec0c4d3bf173d645e445317a8763f1de9a95d0cc76 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..310c6b9efc639870275a79f0406d4067c7268ffa --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:52257a3859a6d099ec6274ac1996d9475dc22befe5a70d5aece41d8acd1487a6 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbaad320f9b203f835b081cda2c05fe9cb2d5173 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb1243f151cfed78b91b1a1be0e3a68a6bff1f411bb4b7e038aa2600c46043ad +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..951c76c9ae7cf29aa3c9c06403714a3f0dcc7125 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c61b353da2cba0398647224005fc2eccdd66463c992eace03db72cda977e351 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9efc6376a160113ddd82a27c640b6e4c7a45d585 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ae8ee33af7631e9ceb000e417327ba4da71caddf8894bcbb55ed2f7c2f3b5de +size 71125730 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7dc6d117cfbeb9835baec3ead0570d8d580c9712 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72db7e2910dc41e8e18d2dc38671611975c96f2bf6f64eae627db942ef2667a2 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d12a6ddcfe74f3c1e94c3a1f4a6f1fe5fe76ba9 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63bb73dc225fcd2d6cb87d884bc36c4b2cb8938bb15929f80618141de344e7e5 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e38b3e0efc5c2a91421e3d3615ba33a8ee2d25be --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:950cbf7bea810d89979f7d8cc0d9ea21983dcdb2bac569f2a9f72da621b441dd +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..52b67c935420be66d0360a980414194816608e4d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:3038413c11ab07f3c93b000d4b7c8ff470862606eac57c4418c0bd81d0bbf9bc +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d214be76a5053a5de8010d9499ad8df63e8ec3b0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b922b2d83f722fd743c8fbca83690850dd469cd9c19ebf766151e93c1814c3a7 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..17ddbb3263e54b8e07a8cb818dcaab5f8b3c142e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1588b8fc7fa4f2aaf5e4504edf8083ceb89666e28f46f161a11651dabd4b9b1b +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f00fb7734357e16461ce610241a3f3ab1de95a3 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b986050dc7fdfe881d1a8421d691ea58b5afe998951080a9deda826b0430d1c +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6a55b0cb5668d45a726ead06d5276f7a2424405 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f54b16bc39ba9538b5d52cbf888dff208bf3d69dd34d6def697645cebcda9830 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2702215cc6c691e736c4ce081147bd702245105b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:595d168c7477579be17ed849063e4bc679210bccfb909465b78efa57e3e79120 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..546500a1870791104847ffdfa2c39869397ce404 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1f71a12fff9a9a1f565199a37d94a94935bf7fa1a4dc3a1f03c0ea6b4eec33c +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c033003764507c0ffa98971ca0f29107203a3e32 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ade4a3a948aa01957f7ba7c086aecb09dc879e9560ba1dd4280ebe8ecaafa3ed +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file 
mode 100644 index 0000000000000000000000000000000000000000..62072a34eb4ff2a53289ef39b39b7b489189385a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9d06d180fecdf0d95634716995fe0cb3a50e78e0d980b69f884fa3f1466daa4 +size 71125719 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2a031012dc1a0b8d68eaed0b923cc6029999bb9 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d846fa5d7ec301640e0dd47afa6a3dc147889d6af1764268503cd7fb27eadc3c +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..346c6be9ff1e3ae71a292f0f6d80d67ff27cfa59 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1f20b596483bc912a6a7f6db0e5e257a1e70cb0d2338d0becc4066d811248c8 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9a9a0798afe65bbfc5d66dd7142c7851e02af8c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcef7e3cc4e22f9afbb22849a502f48485afd3b54bdbe0e571173945e770e7b6 +size 71125741 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..36d2e3cf6b033ad8e99b539c7031300bae297576 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:affdd4c6c148344be17d743371d4e6fbd1dceb5d79e18e7367845890f17de766 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88c81ff8fb46fd68076a55efea0b3d8d0e1d8f84 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9dd30dce030561e7077cf054785c8eb040fb4cf7ecbdb726ea88a9c084572566 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..10944cc906dc7c5238675e5556fe700dcb44987e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c336cd5d6579723ddc40c896fbe943936e5dc0dd90a6e733276794ac504f7f3 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1f8b2b92fbd8b995743f64df9d23e9b8c30d65a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:601da6dddfadb03d0db9b18e67379d85633d320bd32c83ed426d6c26320116e3 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..00555d0e12bccf78de36c7c838fe1ec778e1f63d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43f0baeafb3acf7d1ffd661e6dafe6e12f4d29002e58f9fda0ffb7172aaeadc7 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d68ff2e482a26d27dd01e8f749bfad83b8e56b71 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed696996baaaf6db28f2922e97782ec264011feea59549f3db5dea6bc1553d86 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c4922167eaddd7aa5ef29880ddbe06d6d881f291 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:799d5f73607e4cd7a8c4f07a9c3d74020adee92d0a7f9193c60f7ba8e03f9a64 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4d147feaa7406c4c7de05a7f35e5d3301a3a5d88 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fdb87cdc827b00f20c3d5aee625b51abff6974d501668c37b25633246ace4b11 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d74672ea4857bdbcbedaf51a240d9146bd25674e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:474461bbcd6bb65dcfee3fda91d65a702ef9c8e87a2d49fbc9ee770f39fa94fd +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c8ffd9c026b3cfc4446664b07c9873d1f963518 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ac46bb5101015dac6761854fa00e5878e02582d55e9c17009652620bd53c677f +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1ce3cb5f9258ccdae6f286b1116590360649978 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f9f0da3c004d2156b95adbac26c142fa1657bec8f0508ff2c8fa4fde2069a10 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..1bd512be408a0835834e7623de2d0cb6d370228f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1098904c2fc0ae501a1b55d2fadca6639b1200aac44f4e289658d76e986d6b70 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfa341cef5aec12851e8e1802c4d0d96f1892f79 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a833e22cd346b41b7ccb7084cf3c8892f57e84ba0a986c9cc93ada04b378d5f +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b85a7eb20f18fff83a4d1161116c8699ffe3353d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:716a99775fa3d5a64b2ac8d8c80e946410e05d4c2570c2e2ce86dfc7f9f21f1f +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e0d3b8eacae3811b352380a6142180625e77d0f0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e376aa38785a5c1544fde7808c6e390e1ed12b93298d387ffbdecb648cf4c5c +size 71125741 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ee129f87fff5c34f95e4c0062bf43b3b0e6072d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf1a0d282e32d8b09be4e12c3284479693ec9a01e74d38d0ca336f84391f3f7a +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4178d5f430d6e6d0d47fd063104d16caefa314b1 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:583ee07e71633dab4163d8286806e980748c037443cd019aa3cee13b15a11a49 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ade9e836448aaa03db9c0de73e6f78eee42880b0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1913a27a0e2bb6d22ca943dd0a5eedd4d33813ceaf3d64ef7ea942432336ba19 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d521240777b3ef901834a8619fa6be694c949efe --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:4aea90cf565182173ce54d61d30b45fa995f5ed953cba534bfe1606805fc66c3 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8fab3f262e2f5ce0879d364a207a490c3055d34c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97517485054f7e0fad41619f416ce864770197ec66e44d186d69ff2b4da4d956 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3a997a4fbdea574cf21eafaa46f592b8bc715ca8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2fdfaf0a22d50b697be1d4d8824a971f1f9b1d21a753ad1854be62b61f149ab6 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..90a01e1f00476cf7a693996bcb0e0efd697e9c08 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e90171d7440b495ee5f3a9905de9321962c7cb8d1f03f402a745cd79856cb3ac +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..645152271e982d5cda0321391165191caeb4cb26 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:431777241654a74c450bfeabeef493d2679b05435c69de4cc49e0fc1db6b36b2 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0c266f07687724d385611c8daba8ac8f10307cb --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9020a124b58498c7b0236ceaf1022ed6659c220834d8afda43bf44427f0c8344 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ec54fe9e56ad3ef0ec3d914948068da30db94d8 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc37efa4186b139559e61c9b90ca246f5ae46b239333d25f4b59a96f979a991b +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff94de6dd2b6ff8efab1421ff8b58128303447cf --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e2d8b339d0621c6da38470442da4a2155f586d1db4c91668252b4e985450cd9 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..7d090647f89cbf04a8ceb3cf8a2adb222840b293 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5b3ba89f79a81b81acd8e35d17ff9efaad82587cb7bcd630b2ba6f1a4959422 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e16dfb767800b2820baa11b161e31ea41a9f231 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33d99d5ee14a9c60f44b4e7c9387af2819092e092ff6c1c2cc50508b1967cb36 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..320d5ded4885219ab0b28af8cd1a4394cf8e8ae7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d0038f1d48f8f6949e3de5bc5e6edea1808bcc259f3ef7b60d426380b1e4563 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0caba247909a28a4b336c7b4083edca521b5c00 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4388e90ff0762aab760fe66caae6df4589f732adfaa9833e51dcb6497184ee3b +size 71125730 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..38afa2d80963e3a10e50d281a7f68418a4318f86 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d35c29237746cc18839632dd6d33d98745bfa5e990cd71af587958ac8c8e560 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d2941c8b96fdf10e8a188cb7aca1acf513cf97af --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91bd383fa44efc7093e214bd7bfb04960bde791d36a686090f1be69fd0cd10e4 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c51a4ecb15cc3b6b9f48dda13929fb08af7c11f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5a1392cf55b6ab9713ca7d262af2404e1800d3efb1955e72eea7b7c2c9f1310 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cb2b988ddd42dcea74c339c4c1188a25d145f51 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:0af04e405ebc884b576eaa83f95f94cb8ed020be41b3f5a94ce219ac2a0ae580 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b7433943399024cc2bd8f8f513b1a3db1734115 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3fde36a939810e5dcc36c333f2a5f457e929c8c8bf6512cf02340d387357b7ce +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d3b853172768f3c9dfc6e65e679b7545294403af --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8a19aa501e1c09a537d04cc1c245b6fb908aa03011aaa1254faab8575e3c5ce +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9f0f80062d05a42b10cc452832a7975d25b6f4fc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d206dbaf475673c2205e8a77d15c3be8d3e231e11f36f482687196b54192c0b +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f1a3f06e06e09736fd46135f212426247842f79c --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b42dc122d0a8d6a124c99b5b07cbc46f5a96e3abcad4c3b16da550549cc51fe2 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..91d535ca0b5f18bc4a1610bec66de802bbef4983 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50ae7ec2c928b16c0d44f64282cbfa90b7a7fd019f94931bf218b96c38307916 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b79bbb6c60e1132ab968adb4480489a72440a59f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f43f9a6ef5663fae02df16daf61290f53315039425604fdd46205308b26d1ea +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..58d1ededab8c8657e2358f0d19dcb9a1732e4bb2 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:233897ec98d47ea0e287fb725f189e08bb4c8e3f540cd979fab3ebc814badb9a +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..14e4ebb13d06e08245db6c549f7c5597eed11448 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4dff6ae0ac5dd5ec2842ebf63eb3784744145e1a92af040493115c1538b372c8 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..92a0b53c0efb2a9ee565ccc407698c2ea27e7df7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:414a86d41cb7c7e4b0f736f5d6eb18aa56154cbbd726c28e2b761f302365c66f +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83dee9dc4a5d8b174248b2903dae70c40d5df488 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6028f8dca8a179146eea8171dd81a7b802034c410b948cf635b82b6f14ec3bc3 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d023106310f5677be30a1dd5513ba8699120286a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a498cdc113d72148a3ff84625f5fc2ce6b6d191592d7e32233b560cb285dd7a +size 71125677 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c4a97624cf0beb226e629f34aadb6c3bcf615b6a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20f2e3666255f2fad2f24a0d1994dca5cc2f4e1f0f598701ef33836611062f8f +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f04a456cd2ea61065bd5ab99f71896701678c008 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4e9009353b804f874bd09db045e28bda451c4d88c901fbe306808f2f7849a18 +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0051e224df57be8368db195765b088d1c66d8ae --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c0c74112148079fcca56045bcfc60aae61e0defb1671501ba44c11910a96bbe +size 71125805 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..217c1c179e902ef3d778b06594d0f2250ef38ab4 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ca883ed46315e8e63e02ff66711605965f14b579230cf3d4e6023de106d5c7e8 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..21b714ab7bebe1effb568c7de8539d64d62ab0ba --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73d6726d951842b492a4f868c390fe646950e4b8b0b607e4c2f2a1e1422420de +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4e86e4bcbb99d4189c4f9636abbf88a9deee5c75 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:243d63afd2e4d2390f9fcb5757f93ef5741db661e5d4b500b3895f74ff40f5d3 +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..04a5babb461a5e13f44afa63601b977ba41eb43f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40e78e349894bd6f4be6ca056489140d865d4a5582e8f21e51a0ddbc52ec1494 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..641d50b0be69aa5460049f9ee5edeb7940e3a189 --- 
/dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9ec5574a6917f9bfe39b3b587d9e80eed7a82aeb38850ea0dd080ed1963a2be +size 71125869 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15971e88183154b1e2a67fd1d4b6e8f4572c4739 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:64a37145390672fd30799f19356ce5106f7f5fe2444a7a627613bb563c2fedf5 +size 71125741 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3a86b51c3024cde90c8320fed1eb618d97c74a38 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:388132504a86656ac0ff716567df1e2540605a88efdc9a1e95b23750c170853c +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a525a140e8233b87589e7010c9e0ddee53d4924 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:603bd13b0e04f5a839335f4f628d073ae96a8692afcf98161ae859c4e808e5d4 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..7b617e7a996117c29c0c1e323081daafd7f2ebc9 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:008b3a2df418a5c8fbab27b49d899b49269a18e2c68a19187cf8364871abd4b6 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..88fdfe2c04a8fe8457ae90c4fcf17668e991f7c0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcaa85c5f4e0c4873fb66b40623216b8a6f086e0a8d8f31a5b5edca625ddafe3 +size 71125677 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb451d18bd68b599a8334449514294e9d0038a6b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9ec26c24cebe5461eecf6ebc20207ba15e041266765fba7942d5511018a4076 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..373aa7005733491fe73be5d5b8a6e1d3154d1d7c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36fb5c446fbf42ffe305014a02cf497ecbc3a45791f553d60198ed584c5fb667 +size 71125730 diff --git 
a/1b58b8400m/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdbc1e1836e3b64521d56213dd09b21e326be125 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:004de645d6c63b490b41c372d387814328c2ebc995e2a116f40636dfb573e222 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0a16cff2c129e92d5e40228fb32f9c03a671c82 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe697a2ef945f2452a22c51e6ce5a6c2af8cadf817a90a5e5c629e47b1ef68ff +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bcee9da5d6b23463d262b054a21c1f0eeac3b0e4 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6640c02afb907742727bd856974960f05fdb25116df560d35d41d4dd0f539790 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96a4b3abf543dd4f2cfa2291861ae0b538b077da --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:b232ecbcfd06d780b9519dc2b5d78e2f9d66dea039b8ace08acacad74669768f +size 71125655 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..991334e3b141d962b978541702cc55edd23a1a18 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d7ce399df7fa6c8c6cbbbf26583af2dece2207bdefaeef32e3a16c15b06686d +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..948eb882f301529b93f9ee45f3f88b52d7bc683a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:968af8e38e3fc26f35a46ace6357f915185878460a6d4b28c7d74a1e5bb7ab85 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..150380d1b54be6882f343291a3e83e7accd7ab21 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c36face4252186ace093370f870059813ca714a3b92edbdddc040050b27e5d9 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1934ce5bd86ef30626bcb4d71630a0e7636b36ad --- /dev/null +++ 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:891fd9be1678804dfc74ad624d8c4dd299ff8b27c942693126e1d7fc8d6e2044 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cdff6672265307b2d455745df90b7055be995a5c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1fb2adcdf539596f36e25ce154ba9ef7abea679d501537a8af4b5b26ef4870ed +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8e13cd9051de5140b928af4e22b4d1b1da943e1c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0220832f11541155c7e2ec3887db7f97acea0dd195c63ff07980c41cb7560d8d +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1f0790348a6f14e535e80f46856d8f3a4e52b4f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:81e9eb7bc29d5b3f29edc8aa351e2324a078c8820f31f7a828bca6969b6fa9b2 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9ebad864e0e72739b009e13272b664a79f0b7ce6 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7394479b6b36b44a764e07fca47272d25eca75d7687bf417ba0cf066f93a73fb +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ca76b86e6aa7e0ad60742b0c319d19653d9df14 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dc55985ec171b52e517a957ae0f808e54793594ed555170b1b10b9958cf7b7f +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4621368340332540be3a5218a2f2be8d534542cc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:71598f4b325eeb0602235f85d2e4902742dd3f50cb59005d8f20c36549da78e8 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a30f7c40fb4980c262797aefa461dc66efccc02 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94b4a08708d36d4ed5fc3fe56d46e42e52df89d95d83efccdf94a6bbab4cdb7f +size 71125719 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb4549aa0b0f98a96785a5734bcce23ba57e2302 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:309fce9828c5e39527e99de5dc31b4c0a23c8ed71a633302b31f003db291d75c +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad18f28082be8938b4c82ec8bf9c11f80e564eaa --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:723de7bb661419d6cbbc8177a5aec9b0bb1d4c38bc211546e2f2d5504e6044ad +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9f80f707c1e13fb0a8cc0159fbc853204cce62d9 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37a0643a76d1882a893702f612395052324ce4180603a1155e0674f6fbb931b7 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3fec5bda4a0651b349972dee8aa97615f9d2d43 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ec4c79e278c9ac711addb9d5a0ca0470d5184e380432c1e40ba0edfc1e45378 
+size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07f4f475943419fb5ffac8392a76d71fc9eef576 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2aa2f282dba5a973c862455d9dc8726a28f85839927ffa1ca08f0fa3073ca913 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4af3df56e7e5bc7e0833e70fb7b562c68356e279 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d54ea12ab90eb46e439c0a3047e2fd86d708c5c0ade05f5115bac82defa3aca +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d61b9eed640dbdb7f8128c2d347faf7853ed5e0 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b9d9fbe489a37cad4525c12e7fdb4a365d4901823e1b5a5aa064919dfc07c54 +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1bff3771a67d9c65f56d57399951b9965dabaa7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:abb8a4b94f226cb00b5a5673f4a4c28a16eabe655dccabd52c3bb35e64fdcc8a +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7839f629a996e96736d99504563313c52b224fd7 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcd0aa9ef107beb8d440b44099a83a71c7435a6e53669f96bba7f6893bf3ddf3 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57e41e2ac314bdaa6686d536ca85cd3f54515440 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd70d83de8fcc72ea26163eae1cfee72d4334e18563f9213741d1eec83f2d9c7 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ce8adcb439bf3c1418565230edbbae869098f5c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:afef42f789654c0628627a15f41f6916af00a2776dda42705d065bf6d0f1f3ae +size 71125783 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e8f4ab1581d5d088b1105582469fbd149d7a9c44 --- /dev/null +++ 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1dae8dc6642ab84dc1ea5456127e8091873c2b7e6e58af192a07550607cffe4a +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..523f420fa9d65e1a1d8cd6e1c7bf6d2ba856aa2c --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f2ef3b53f590ea852d73a3ccddb4b5b429cc00d8871a416c1b212e1907d7ae2 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8b669024f095c6bc72fed34d243eaa13bfc3baab --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a09f85967f6472eac617ac24f919928a78060e58b49203b6db44a0ee308778de +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc54baf067f80a4e80f4fd87583954f42e7d3645 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc183bd54675c0cc884e5c433acc5a25724494a0b069cfa450898698bedc586c +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1ddfd40a58ad00f2b6c4ba6eeda98d8c13f2e32d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b33191090c1f16227703b18c0747bebac0fb46e3b751639b25b070d7e2f39f3d +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..935107b614a0602074894a7f4924b2fea5fabafc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0a3ba212db9c481bf18545e4a6618975fdf8fe8e951591ddff3d83995845083 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..83bafaa3e373769aa701b3a43b972c3379370d75 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aad1bcd11910eb5faa53b2e7fe400fe91210bdfd86091b877e71eddd082b4225 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..79117d50eb34ed028474f942787e7dbe18b08eed --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:53a103de3a36f22fca7b0bd8ab90193189833f76ebeae42e13af0c4a81de6df2 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f02586a4d1b5a34dd04e980cdf5d25002dab451 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82d35135b9a572c05b76193959b41b380f38c31796c993fd75416cd512c7f634 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5afd6547719a52870a999d4d7dc45c9a900de2f6 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d43da8d0d5ca5834f14ceee27234662a3a1e44fca5d8d7ee83f279a62df23eb0 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..59e4c5746e0f94ffc134ed1ad77745ebe4a1e681 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ad0913499d45664f85f1f13581aa4be7c7cb08f7a767fd80619a804dc1f6ff7 +size 71125783 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d5f77e089e4556a2617c11f220811564198c347 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56fbdeb1dd85390cf6859829f6828975ffb6cb8c5c0afe592f3652bee7019736 +size 
71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..670d159019eb9db77b9aea822f5a581d9b9a03d3 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bbc0bc8120dbd61fb2556d6ba53f27190814f1f28df04ed20a83e4f89377ae1 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1db3bedef6d39ebeb8dda172aacc5fa6d634259 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5340528deaacad3cd17d9b7b6cc046d9d8a13819d5f365051fb301b32ed630cd +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8925f86765aaa93188b45a0d0e09eaa4c27d2b28 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85f2660c4249c8e15efa2c0ca9e616043c301d69c3b3bdcb77085210390f89b0 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53c0f28281aaf11d08ec335bdbae5ce334d7675d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a83d10b5efff6df445fe7ffa8d7a1957d17a00ca2a9818198525053dff2a0b90 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f899dc9452de44e879c51a7ad9d6fb50dfa210e --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55e64f8167cf53688987da5a1fa3c6015fd38acff5dfd582d49051169ca40f8e +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c5f1f570df28b88fa0c58e808900c3213e1de5f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a21bd29c1e2c4c2c935798fa5a30a9e8e94281da06e8aab443b36e1dd097fa45 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4a4e9d19f270846430def736b1f024c7ba87558 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00329d78d559b7dd26a1bd62979fc9d3c9decf2b698848df2671f920f4140bca +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9706d6974d1e11a28d83bbb64e76607c2ec1f685 --- /dev/null +++ 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec3e06b4dfdd4aff5bc5fa84da00a063e968322bb1c4bc66bc0afbc8bdc8b192 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..618d96d0a30bd3fbf6def58184c5cd23b0e01c2f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:45d309670c76cdd6abe7943dcbc7bde74f7ce3f46d5352eb7b363942029b9459 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..119a98600b0c3d3cca0b094a66e747d2d1059747 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50d1222c880bd2757e5eaf53ae3c0caa0c4e9018cce41a9f6685551399ea215a +size 71125655 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e27aa495f7c0a0f1e1f9c4e936d31dc34f58c1ed --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e5372a8f82c7f35a5d78e231941bf04b69e3c6925992be8034298ad4c2d98a3 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..4ded4c1d85ab39ab26c4552282c22b36bab5b027 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eab41f779e93277a425aaa888d477004896b24be17a05214fcb5eae8136ac62b +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..14013d5a4e74ab7b3cc95a725f72a79ea79a632a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84d0463410e67ae73a966c9ad93a9374d3694cdbff97b88e5062f8bebc7364a0 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67de28370192242a2ac550ddce930aff886c6541 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef596923ab7bcc935ae0f4ce7b96ea90d85d202afb3e241f687c4556e5b5d99d +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b227e623fb667bee1f484fd9f496269c883f1bef --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b1a5866992c9fddc5664e4a505d9dd08528e575fcdfe6bfe501bb84133591b0b +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa276b3e607f1dedabb66e68ccd60d9e2b3e9738 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8472b242ddec7a9ad385418c1385e6c4905f7d18db900f801fd38ce9dd8298c2 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..028321e9cfd81ac6ed3292f6fdd2cbbb5adb1b6d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07cbca631c3bbca0447e2eef3010141cdd72c518aac31cebfa50c4067d7d95cb +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13ca51753240bb72f66eb028e18e0238310eb7ab --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:450e3dc1b8e0dc839569b174bb321873c68ed9600cb5c42f35c78d2c00608b03 +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dcafaede2d3d169b7bebdae4913cf1a513fc9284 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dfad706164e437131237bbd19eef8798c4c0e89a2c02755e6b683cf111bb4c04 
+size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..28a6bef07df10d3626858af230526cbd74386e64 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2454161c7ac8777ac59511ca011ccde98ddd1fcb20159fc70c9dcc91862a498b +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0502c2ea9314e16b7f3fb899270e2cb73591d13a --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce44793cb5674d310ccc8d09eb3373778b311bde45259c29da1a475ce77e7010 +size 71125655 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..751c976f3a0fdc3b0113a34588745b4368ed6cea --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e31e56fbf3036a31471cf1c1bc7af06acbe0160ebbd0e28065bff2e682403ec +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57ce220c60e2199a94410f158e50dcc7a301b74b --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:997839797448698b5c3c7933574fcfee152ed59dbe568dfa1061f010abdda404 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3054cc20393644d3a84e84a240ab48a98047cabb --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcec2602fc03d2caba66c9313cbf128cb0c2d19ba74ca66dadc9716058db75bc +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c72d388ed8a516e24757120016da28a05d837378 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92c3f630dfda1c4524a3b53bf65d3724e594022e076f1e349ba3dee8dfd8987a +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..922790f3a6765e3be5c819bf7f1c0ce403cdd1e5 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73acb67b0ad8c37b1505d1add888f4fc1c529d1c1c8b52cc77346b7bc92f2e2e +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6cd8413bc8d9c067fd890d5f69366ac0ea8190af --- /dev/null +++ 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6cff9fe9bd14be48bb3c3282fbf0ee9859ce825da1c66122f40b6334d33d3d4 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..386c2987d241451ee9fd98a36b341b2007fc7d3f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dce9c442a43ce13af6978cecd9135e41e3da0db737c6704a6e6b091217b6cf62 +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..21c57121314ca4d6d565f738c75adef15ba955e2 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad023360b28b5eee559f4f815789f64c5c7323881ee56a28e37edc052c70903b +size 71125794 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0ca890543e01fdd170db386cf2fba6b2eac8162f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b90c5846866369aa8556d78924baa20465abf65a3c61676f29c433c78989cdc +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..18d3d7f570c5b60e5e515bb02f8abc45735ce95f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a73ec37ec44682755ebe9d8a59698398dabdfdc376845dd9b504e8682fe4350 +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76f383c17c96de165a3f8c47ef33c1483fa0d82f --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8205f67f5f73d59ac1bb5de78501f22045cd0290e6feddcb5ea78ec2e81598d8 +size 71125719 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7519507aa76422f355cfb0a402b9b7fa78405f17 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:431342eddba4bed0406a49a31e068c8f136603a2be8e9786fbb7d85335952def +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1b9c60ddda17182d5f630e1f2232a12250f6a30 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89e5eed1ed9b79ade70cce44748d27bf08af25afcae41d5036acd6e93f64085d +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 
b/1b58b8400m/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b46052ba3af9398a8b3ce17beff4f908cc0472db --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ccdcd1cb638ae471f7e3717e8f50d2d7dac95c242b8347cc6855ce0ae133265 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a31578a9b793018840f7b263b852d5086d31057d --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2baf45f005583e1188f6c4870a9af9fd124791705a6470af63df2b3c036fb56f +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e6ec18bba6d6668115f384ba192984ff8584974 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e4f3388f530bab87ca10ca60c245dec0f8eed5d30649eebec83c8edc08d7bd6 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba1708a006d814f6d354d7f93f03f710cb6dd9f9 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d2c7da8b821d150166de59fbe0f7626ec42e9936b5c9566efdcdded2bb09cfa 
+size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e1ed3adf1a983736c6d7b06a0b65a802f575a721 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b317da84cfbb23826ae0a91bd04b9f824a8a7cec7e7a68cd76a5233837bb108 +size 71125730 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..426fa1a725b83cb09eef3aef5cd43b2e717a5086 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ba6948d350deecc42be0eebfdbef9c5d1b8fb8ad1a70cc1383b5651a768a10e +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e64826ec95372b8627252667c25fe8c1c3ca3fe3 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6f1bf4c133b61441598faec4b15e2ca61cb7780feec1ce6bb36204d731d00ab +size 71125666 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa36e81fa5360acde9f1cc93302027ef970af7dc --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:4bd3b6f31909177add4003229192da1654b269300da10ad96de99e7833678a2d +size 71125858 diff --git a/1b58b8400m/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/1b58b8400m/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c805ebfbe8339bf9066813f328fa02d12fa00cf1 --- /dev/null +++ b/1b58b8400m/global_step16765/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15de27c80ade439e256740ff9de0204e13d9413360571a11cf113e849c72617e +size 71125783 diff --git a/1b58b8400m/global_step16765/layer_01-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..385789f166ce1d037ecead484f06488c1ad5cf63 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25b9644fe8fce9c999f53e7854fca53e3147d84f8ab47d9873bc18d276d77880 +size 214435075 diff --git a/1b58b8400m/global_step16765/layer_03-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..277b97441732fc67ec31b79fdf2ce09e086c6504 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8db671aa16ecfbf52f6c6a24065b90186511b7a6f3eb5f3ac28cbc65edd36a3 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_04-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15de6e8dda933b5f365e3497aa93c20924b4dbdd --- /dev/null +++ b/1b58b8400m/global_step16765/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:330399db0507cb70555fe69b059660701800d1266582efd96cd90853c29227d0 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_05-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9e2b0333c19bf16f383d7b69914216bff02925c --- /dev/null +++ b/1b58b8400m/global_step16765/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a646f3723e869ba5f1fe0402e99d7d52694e31b2be10381e147f178c82a5c7f +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_06-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d500ac8f9473c31436dc5e77f06bdfe11a07532f --- /dev/null +++ b/1b58b8400m/global_step16765/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c99088c51a6683321bf64e1d17dd80538dcfafe07069469aecded25ba5105ba +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_07-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d60b067a25272b7efd72953534fd382a2ce21a42 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40d5664abbb09b24dcc3a9a0e6478bd974a2656133cc12f4058aff8a7c43f024 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_08-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c7b783fa24ce748d175c4a23b941de04eae5ffa --- /dev/null +++ b/1b58b8400m/global_step16765/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:aef4c0fc2ce3d1f16be3c58cdab1f3db5ca401ae0203bb458a33da02e99da76a +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_09-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d11920af7ce670434196cf3d515121514ab4c05d --- /dev/null +++ b/1b58b8400m/global_step16765/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3be600fec443482e2200e77d7d24048f5026fd54edef3f8349ca6e5e5aa02cdc +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_10-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..717ac39b576dda0228da924c6e6177104d5c78ce --- /dev/null +++ b/1b58b8400m/global_step16765/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c016eef748bb88b1f4d54edf72ba712288059bce3031520280e456c22375f04 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_11-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4503e75f8723e763a7a8c09512c08838cd1d9fbd --- /dev/null +++ b/1b58b8400m/global_step16765/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4f502eb31dcf410bcb7068b6ab7ea9145a2be73815a06df589bacd497ff6ebc +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_12-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e21b3c4fccbad4e81b2911742aeb3ae35975465e --- /dev/null +++ b/1b58b8400m/global_step16765/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:3d06fc2f59fac17bb49fcf5a8878b9d942993470570be40438cfb13debbd24c0 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_13-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d305d3152a477b951636f76fbcd546079484ad6f --- /dev/null +++ b/1b58b8400m/global_step16765/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ac775fa19e82de1b88491776f7b0d3129d622eb77a3962b098dbf93fd7d97625 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_14-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a0043bd6a074a4cac2eace313be112daadcb973 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a76a1c3defb959ff610a3f346d7a9710e145d63aa60aa14a4569182755eb3749 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_15-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..629c6e118eb2bb514afd79cca29e2c8987c5e4e9 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2053380dc5b786d8633c81a3a4eca718f9fb9b5e3866d55c66f10cd38edb9968 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_16-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2477d2ff900301aa2996551ac18803d3b5d74c61 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:2730ed219d8685027a6e6fb4d15b3afc17ab7ca09bd4b1160537a03fc07ba162 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_17-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c4e63698e6d9c6bd2b372498774bc39d4c66301 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:971ba163967aef59919efd5c7363a36403ed292b09f10301833f4391fb19585c +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_18-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15f124bfed4b9c15b57c1726d6a44eccd992cc50 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5850bb9e1f935b0e51042da3ceedb3d6a72dbff1e83eda67ac1ef5e36ec2d673 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_19-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..053b4a78a10a2148d7577803d3726adc88123621 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26a29e702046433a626fe66a03f95c404b9de01ac55c5a672eab69c60bfce197 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_20-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc750d6a65636bb1c91f2d77559e728db58adbbc --- /dev/null +++ b/1b58b8400m/global_step16765/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:bcd0f111641c2b038e7e0ca7b7054703ed9c51d3f6b062f6775ce61229d86107 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_21-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ece3c9fda9e6021181f995a39df2911883ac7ef8 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ebd98eb0b0148902812e227916121d6b783c49b0eebd2fd755ab75054f2ded8 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_22-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..844851f42c5bd6bcaa9181adce5dade82304c066 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c56743098a0e4a45e429f81ac303a8298fbaf64a06eaec07dab408c676610558 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_23-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f553d8c8b19b49e2e287c2b212ffe1c2ea03b492 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba8a648e8d4c1d757d18351046b92c545d080f046c17f7729ad950f10dacef4e +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_24-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_24-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb3637d05e105b35077f758f5e06ffab7e40de26 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:5a368ed2eff5e1ad12d9fa592f9ccb673f6ddd3dfed5a38ed6aab197b5751b2e +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_25-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..103f1fc919ac7b60ff743932c82693d428659953 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6ca74abd1b90df5de2e86ba301f8738835d330a9f23444edd95b7093c9321c8 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_26-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_26-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74d701770d82fc31d0a916c8b7d43032308be2b8 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4ab36e1e89e143b62b70c7525f6a7bd8b210f17cf1633f96f2f3dece3288c40 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_27-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..170998e828ccc0ecba7b1ccde6740dc009ffd986 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc0d3b20b5e44735e29f4ba8bcb43b9461985ac94e34992a5f6bd96ba03456c8 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_28-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_28-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..497cdc5035b13acc9db6dfe14f55f596fa8afd9b --- /dev/null +++ b/1b58b8400m/global_step16765/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:c0a5d3b27208da114f0528dd85a1bd28416d2fdf50ce229c3ad90740599a415c +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_29-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_29-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f0067ae394af706785072ce54a48892ea8b5601 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_29-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69858d168e369a321db7f6960fd43c2caf0e3fff24264eb776462ccef890d121 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_30-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_30-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3945acc3a948e944b66d9aeb565a7c40e14d8124 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cda9f23f3f9b06c6675b3f5eba141d60d5614531f14be86401d117e77ddb1996 +size 100720899 diff --git a/1b58b8400m/global_step16765/layer_32-model_00-model_states.pt b/1b58b8400m/global_step16765/layer_32-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85c27aba178635d87ceb36865e5ef5c9d7af8c90 --- /dev/null +++ b/1b58b8400m/global_step16765/layer_32-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fba36f1d5d4e30480a38d6101d410d60464180f81566ad63d52443359bf61405 +size 9411 diff --git a/1b58b8400m/global_step16765/mp_rank_00_model_states.pt b/1b58b8400m/global_step16765/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..10623e83189369796e69ec33341e09d8045b2673 --- /dev/null +++ b/1b58b8400m/global_step16765/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3cd174f5c3fd4e0baad969881780babb81d4efdbc1a6458b1b724a5d2e480ce2 +size 45363 diff 
--git a/1b58b8400m/sbatch_1b58b8400m.sh b/1b58b8400m/sbatch_1b58b8400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..297a081bb45edf5798ae4743989bed9d34a91058 --- /dev/null +++ b/1b58b8400m/sbatch_1b58b8400m.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=32 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b58b8400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) 
+GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1593M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 8790000000 +# -> Samples: 4291992 +TRAIN_SAMPLES=4_291_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 4_292 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000,
"wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b58b8400m/sbatch_1b58b8400mval.sh b/1b58b8400m/sbatch_1b58b8400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..031b544cd980962c43e53b1e4e7dd30d305e38e8 --- /dev/null +++ b/1b58b8400m/sbatch_1b58b8400mval.sh @@ -0,0 +1,168 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=16 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=1b58b8400mval +VARIANT_CKPT=1b58b8400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT 
+CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_8B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=1 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_1593M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 8790000000 +# -> Samples: 4291992 +TRAIN_SAMPLES=4_291_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 4_292 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH 
\ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/1b58b8400m/tensorboard_1b58b8400m/events.out.tfevents.1678910070.nid006513.20228.0 b/1b58b8400m/tensorboard_1b58b8400m/events.out.tfevents.1678910070.nid006513.20228.0 new file mode 100644 index 0000000000000000000000000000000000000000..b7e6da22865ce4ab6e7da54f067e3121f9dfcc04 --- /dev/null +++ b/1b58b8400m/tensorboard_1b58b8400m/events.out.tfevents.1678910070.nid006513.20228.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82b8a5580b5984be40128570b37c75d37ede05c2c20fe5bd2031448cca13adfe +size 29861027 diff --git
a/1b58b8400m/tensorboard_1b58b8400mval/events.out.tfevents.1678950290.nid005735.38933.0 b/1b58b8400m/tensorboard_1b58b8400mval/events.out.tfevents.1678950290.nid005735.38933.0 new file mode 100644 index 0000000000000000000000000000000000000000..efcc0010a22142b2bf01a84ebea2a062ffc87e07 --- /dev/null +++ b/1b58b8400m/tensorboard_1b58b8400mval/events.out.tfevents.1678950290.nid005735.38933.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21922b6f9fb0b498d2e11d782b5bc448657828729b47d715c9d9eea781c6ab68 +size 980 diff --git a/220m3b9100mdedup/3327359.err b/220m3b9100mdedup/3327359.err new file mode 100644 index 0000000000000000000000000000000000000000..6c582d085551015a03c2b3a1de00a5d11ef5b6bc --- /dev/null +++ b/220m3b9100mdedup/3327359.err @@ -0,0 +1,1121 @@ +4: 2023-03-17 00:51:44.817418: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:51:44.817426: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:51:44.817425: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:51:44.817439: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:51:44.817436: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817625: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817628: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817621: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: 2023-03-17 00:51:44.817448: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-17 00:51:44.817454: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-17 00:51:44.817441: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817638: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817643: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817650: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 00:51:44.817642: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-17 00:51:44.817651: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817930: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817941: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817953: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-17 00:51:44.817948: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817970: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-17 00:51:44.817982: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 00:51:44.824816: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824825: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824829: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-17 00:51:44.824831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824832: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-17 00:51:44.824831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832365: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832372: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 00:51:44.832362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832358: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832358: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-17 00:51:44.832385: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-17 00:51:44.832379: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849284: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849294: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849284: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849297: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-17 00:51:44.849293: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849304: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849307: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-17 00:51:44.849296: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923932: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 00:51:44.923947: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923949: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923932: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-17 00:51:44.923939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-17 00:51:44.923940: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924291: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924299: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924300: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-17 00:51:44.924304: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924295: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924292: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924298: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-17 00:51:44.924294: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-17 00:51:46.474671: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474683: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.474684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:46.475035: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475042: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475041: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475049: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475050: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-17 00:51:46.475055: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475065: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-17 00:51:46.475066: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545362: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545371: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:46.545744: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545745: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545748: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545751: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545754: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-17 00:51:46.545755: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545756: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-17 00:51:46.545760: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:46.547898: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547902: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547908: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547907: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547912: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-17 00:51:46.547914: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547914: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:46.547917: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.573815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573823: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573823: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.573820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:46.574004: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574009: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574011: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574011: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574014: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-17 00:51:46.574015: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574017: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-17 00:51:46.574020: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.586942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.586957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:46.587364: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587369: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587371: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587374: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587376: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-17 00:51:46.587377: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587382: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-17 00:51:46.587384: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592429: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:46.592948: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592954: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592954: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592954: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592958: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-17 00:51:46.592958: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592961: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-17 00:51:46.592964: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616060: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:46.616413: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616412: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616418: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616418: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616420: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-17 00:51:46.616421: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616422: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-17 00:51:46.616426: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.619840: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619855: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.619852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:46.620368: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620370: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620372: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620376: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620375: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-17 00:51:46.620377: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620379: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-17 00:51:46.620383: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-17 00:51:51.358587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.358615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.358620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358732: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358622: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358734: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358622: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358624: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-17 00:51:51.358601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.358741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-17 00:51:51.358629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.358947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 00:51:51.358742: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-17 00:51:51.358747: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:51:51.358944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:51:51.358957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.358986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-17 00:51:51.358952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358890: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 00:51:51.359013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.358953: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359000: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.358957: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.358897: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 00:51:51.359022: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.358960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359001: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.359019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.358962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.359024: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 00:51:51.359028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 00:51:51.359027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.359015: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-17 00:51:51.359029: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.359031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-17 00:51:51.359270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359286: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359283: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359292: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359292: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.359297: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360337: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360337: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-17 00:51:51.360334: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360342: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360345: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360347: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360348: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360350: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-17 00:51:51.360369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-17 00:51:51.360383: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:51:51.360669: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360675: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360675: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 00:51:51.360678: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360691: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:51:51.360693: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-17 00:51:51.360881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 00:51:51.360695: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:51:51.360696: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-17 00:51:51.360698: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:51:51.360858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-17 00:51:51.360700: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-17 00:51:51.360718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:51:51.360920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-17 00:51:51.360739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.360885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 00:51:51.360860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.360922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360891: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 00:51:51.360861: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.360925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 00:51:51.360863: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.360926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360892: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.360895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.360898: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:51:51.360863: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:51:51.360930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.360865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:51:51.360926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-17 00:51:51.360875: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.360934: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360935: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360875: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360877: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360879: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360940: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:51:51.360895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 
00:51:51.360880: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360879: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-17 00:51:51.360946: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.360904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:51:51.360905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-17 00:51:51.360906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-17 00:51:51.360972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360906: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:51:51.360908: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-17 00:51:51.360985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-17 00:51:51.360910: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-17 00:51:51.360913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-17 00:51:51.360926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-17 00:51:51.361179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-17 00:51:51.360921: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-17 00:51:51.360921: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361178: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361196: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:51:51.361196: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:51:51.361199: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:51:51.361201: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-17 00:51:51.361204: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-17 00:51:51.361207: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361232: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:51:51.361390: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361245: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361242: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-17 00:51:51.361391: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-17 00:51:51.361257: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361392: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361404: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361408: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361409: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361413: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361412: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361415: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-17 00:51:51.361432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-17 00:51:51.361445: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-17 00:51:51.361203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361208: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361218: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361220: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-17 00:51:51.361246: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-17 00:51:51.361258: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. 
+0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +2: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: +6: +6: +6: +6: +6: +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... 
+1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +2: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils... +4: +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/220m3b9100mdedup/3327359.out b/220m3b9100mdedup/3327359.out new file mode 100644 index 0000000000000000000000000000000000000000..bf315112b6ca7b4666f228ec536eed2a12b55ac0 --- /dev/null +++ b/220m3b9100mdedup/3327359.out @@ -0,0 +1,6435 @@ +Model parameters: d_model 896 ffw_size 3584 kv_size 64 n_heads 14 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 896 --num-attention-heads 14 --kv-channels 64 --ffn-hidden-size 3584 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-220m3b9100mdedupval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-only true --eval-iters 100 --tensorboard-dir tensorboard_220m3b9100mdedupval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_220m3b9100mdedup --load checkpoints_220m3b9100mdedup --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3327359.json --zero-stage 0 +START 3327359: Fri 17 Mar 2023 12:51:21 
AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 45.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 36.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 39.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 46.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 45.0c 
95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 45.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 46.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 45.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 
================================================================================ +3: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 47.0c 77.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 49.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 36.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 39.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +3: Launching on nid005112 (3/8), master nid005109 port 9999, GPUs 8, CUDA: True +4: Launching on nid005113 (4/8), master nid005109 port 9999, GPUs 8, CUDA: True +0: Launching on nid005109 (0/8), master nid005109 port 9999, GPUs 8, CUDA: True +6: Launching on nid005115 (6/8), master nid005109 port 9999, GPUs 8, CUDA: True +1: Launching on nid005110 (1/8), master nid005109 port 9999, GPUs 8, CUDA: True +5: Launching on nid005114 (5/8), master nid005109 port 9999, GPUs 8, CUDA: True +7: Launching on nid005116 (7/8), master nid005109 port 9999, GPUs 8, CUDA: True +2: Launching on nid005111 (2/8), master nid005109 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 
1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. 
False +0: deepspeed_config ................................ ds_configs/3327359.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3584 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 896 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... 
False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-220m3b9100mdedupval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_220m3b9100mdedup +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... 
True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 
0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_220m3b9100mdedup +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_220m3b9100mdedupval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... 
None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 
0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +7: > setting tensorboard ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-17 00:52:10,161] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. 
+0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.100 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ scaled_masked_softmax_hip.o scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o 
scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. 
+0: >>> done with compiling and loading fused kernels. Compilation time: 26.314 seconds +0: time to initialize megatron (seconds): 55.415 +0: [after megatron is initialized] datetime: 2023-03-17 00:52:39 +0: building GPT model ... +0: [2023-03-17 00:52:39,552] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-17 00:52:39,553] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-17 00:52:39,553] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.65 GB, percent = 6.7% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, 
data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-17 00:52:41,537] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe 
+0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-17 00:52:41,870] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-17 00:52:41,870] [INFO] [utils.py:828:see_memory_usage] MA 0.42 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-17 00:52:41,871] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-17 00:52:41,872] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-17 00:52:51,666] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-17 00:52:51,666] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-17 00:52:51,666] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-17 00:52:51,673] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-17 00:52:51,673] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-17 00:52:51,790] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-17 00:52:51,790] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-17 00:52:51,791] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.35 GB, percent = 6.8% +0: ninja: no work to do. 
+0: Time to load utils op: 0.14244890213012695 seconds +0: Time to load utils op: 0.1024620532989502 seconds +0: Time to load utils op: 0.20453763008117676 seconds +0: Time to load utils op: 0.2045900821685791 seconds +0: Time to load utils op: 0.2045726776123047 seconds +0: Time to load utils op: 0.20480632781982422 seconds +0: Time to load utils op: 0.20447254180908203 seconds +0: Time to load utils op: 0.20458173751831055 seconds +3: Time to load utils op: 0.21021580696105957 seconds +3: Time to load utils op: 0.21010351181030273 seconds +3: Time to load utils op: 0.21089434623718262 seconds +3: Time to load utils op: 0.21140646934509277 seconds +3: Time to load utils op: 0.21001601219177246 seconds +3: Time to load utils op: 0.210404634475708 seconds +3: Time to load utils op: 0.21082043647766113 seconds +3: Time to load utils op: 0.2102982997894287 seconds +1: Time to load utils op: 0.2106492519378662 seconds +1: Time to load utils op: 0.21065044403076172 seconds +1: Time to load utils op: 0.21066713333129883 seconds +1: Time to load utils op: 0.21067476272583008 seconds +1: Time to load utils op: 0.21068334579467773 seconds +1: Time to load utils op: 0.21069073677062988 secondsTime to load utils op: 0.21068787574768066 secondsTime to load utils op: 0.21068310737609863 seconds +1: +1: +2: Time to load utils op: 0.21069931983947754 secondsTime to load utils op: 0.21068978309631348 seconds +2: +2: Time to load utils op: 0.21069765090942383 seconds +2: Time to load utils op: 0.21070599555969238 secondsTime to load utils op: 0.21071290969848633 seconds +2: +2: Time to load utils op: 0.21072077751159668 secondsTime to load utils op: 0.2107231616973877 seconds +2: Time to load utils op: 0.2107222080230713 seconds +2: +5: Time to load utils op: 0.21097826957702637 secondsTime to load utils op: 0.21075749397277832 seconds +5: +5: Time to load utils op: 0.2109968662261963 seconds +5: Time to load utils op: 0.21079301834106445 seconds +5: Time to load utils op: 
0.21126556396484375 seconds +5: Time to load utils op: 0.21081805229187012 seconds +5: Time to load utils op: 0.2112138271331787 seconds +5: Time to load utils op: 0.21066641807556152 seconds +4: Time to load utils op: 0.21023035049438477 seconds +4: Time to load utils op: 0.21022677421569824 seconds +4: Time to load utils op: 0.2102370262145996 seconds +4: Time to load utils op: 0.21024417877197266 seconds +4: Time to load utils op: 0.21024179458618164 secondsTime to load utils op: 0.21024227142333984 seconds +4: +4: Time to load utils op: 0.21025419235229492 seconds +4: Time to load utils op: 0.21025562286376953 seconds +6: Time to load utils op: 0.2111368179321289 secondsTime to load utils op: 0.2111341953277588 seconds +6: +6: Time to load utils op: 0.21115779876708984 seconds +6: Time to load utils op: 0.21117472648620605 seconds +6: Time to load utils op: 0.2111804485321045 secondsTime to load utils op: 0.21118640899658203 seconds +6: +6: Time to load utils op: 0.21118712425231934 seconds +6: Time to load utils op: 0.21119308471679688 seconds +7: Time to load utils op: 0.21166300773620605 secondsTime to load utils op: 0.21166419982910156 seconds +7: +7: Time to load utils op: 0.21167826652526855 seconds +7: Time to load utils op: 0.21167373657226562 seconds +7: Time to load utils op: 0.21168947219848633 secondsTime to load utils op: 0.21168136596679688 seconds +7: +7: Time to load utils op: 0.21169090270996094 seconds +7: Time to load utils op: 0.21169757843017578 seconds +0: Time to load utils op: 0.0005419254302978516 seconds +0: Time to load utils op: 0.00057220458984375 seconds +0: Time to load utils op: 0.0004322528839111328 seconds +0: Time to load utils op: 0.0004909038543701172 seconds +0: Time to load utils op: 0.0005581378936767578 seconds +0: Time to load utils op: 0.0005602836608886719 seconds +0: Time to load utils op: 0.0005092620849609375 seconds +3: Time to load utils op: 0.0008885860443115234 seconds +3: Time to load utils op: 
0.0009737014770507812 seconds +3: Time to load utils op: 0.0008008480072021484 seconds +3: Time to load utils op: 0.0011658668518066406 secondsTime to load utils op: 0.0011487007141113281 seconds +3: +3: Time to load utils op: 0.0011048316955566406 secondsTime to load utils op: 0.0011587142944335938 seconds +3: +3: Time to load utils op: 0.0011887550354003906 seconds +5: Time to load utils op: 0.0009205341339111328 seconds +5: Time to load utils op: 0.0008952617645263672 seconds +5: Time to load utils op: 0.0010769367218017578 seconds +5: Time to load utils op: 0.0011546611785888672 seconds +5: Time to load utils op: 0.0011777877807617188 secondsTime to load utils op: 0.001119375228881836 seconds +5: +5: Time to load utils op: 0.0011546611785888672 seconds +5: Time to load utils op: 0.0011603832244873047 seconds +2: Time to load utils op: 0.0010368824005126953 seconds +2: Time to load utils op: 0.0011935234069824219 seconds +4: Time to load utils op: 0.0010335445404052734 seconds +4: Time to load utils op: 0.0010044574737548828 seconds +2: Time to load utils op: 0.0013682842254638672 seconds +2: Time to load utils op: 0.0013709068298339844 secondsTime to load utils op: 0.0013463497161865234 seconds +2: +2: Time to load utils op: 0.0013737678527832031 secondsTime to load utils op: 0.00141143798828125 seconds +2: +4: Time to load utils op: 0.0012788772583007812 seconds +2: Time to load utils op: 0.0014183521270751953 seconds +4: Time to load utils op: 0.001298666000366211 seconds +4: Time to load utils op: 0.00138092041015625 seconds +4: Time to load utils op: 0.0013208389282226562 seconds +4: Time to load utils op: 0.0013728141784667969 seconds +4: Time to load utils op: 0.0014276504516601562 seconds +7: Time to load utils op: 0.0006687641143798828 seconds +7: Time to load utils op: 0.0010290145874023438 seconds +7: Time to load utils op: 0.0010709762573242188 seconds +7: Time to load utils op: 0.0011267662048339844 seconds +7: Time to load utils op: 
0.0013034343719482422 seconds +7: Time to load utils op: 0.0012469291687011719 seconds +7: Time to load utils op: 0.0013267993927001953 seconds +7: Time to load utils op: 0.00038552284240722656 seconds +1: Time to load utils op: 0.000659942626953125 seconds +1: Time to load utils op: 0.0008764266967773438 seconds +1: Time to load utils op: 0.0011692047119140625 seconds +6: Time to load utils op: 0.0011432170867919922 seconds +1: Time to load utils op: 0.001178741455078125 seconds +1: Time to load utils op: 0.0011668205261230469 seconds +1: Time to load utils op: 0.0013232231140136719 seconds +1: Time to load utils op: 0.0013134479522705078 seconds +1: Time to load utils op: 0.0013129711151123047 seconds +6: Time to load utils op: 0.0012197494506835938 seconds +6: Time to load utils op: 0.0015921592712402344 seconds +6: Time to load utils op: 0.0015230178833007812 seconds +6: Time to load utils op: 0.0015861988067626953 seconds +6: Time to load utils op: 0.0012981891632080078 seconds +6: Time to load utils op: 0.0015575885772705078 seconds +6: Time to load utils op: 0.0016281604766845703 seconds +0: [2023-03-17 00:52:52,005] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-17 00:52:52,005] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.41 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-17 00:52:52,005] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.48 GB, percent = 6.9% +0: [2023-03-17 00:52:52,118] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-17 00:52:52,119] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-17 00:52:52,119] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,219] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-17 00:52:52,220] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB 
+0: [2023-03-17 00:52:52,220] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,322] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-17 00:52:52,323] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-17 00:52:52,323] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,423] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-17 00:52:52,423] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-17 00:52:52,423] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,525] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-17 00:52:52,525] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-17 00:52:52,526] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,625] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-17 00:52:52,625] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-17 00:52:52,625] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,730] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-17 00:52:52,730] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-17 00:52:52,730] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,830] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-17 00:52:52,831] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: 
[2023-03-17 00:52:52,831] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.5 GB, percent = 6.9% +0: [2023-03-17 00:52:52,831] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-17 00:52:52,831] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-17 00:52:52,831] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-17 00:52:52,832] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-17 00:52:52,832] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-17 00:52:52,832] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-17 00:52:52,832] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-17 00:52:52,832] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-17 00:52:52,832] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-17 00:52:52,833] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-17 00:52:52,834] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-17 00:52:52,834] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004189014434814453 seconds +0: [2023-03-17 00:52:52,835] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-17 00:52:52,845] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=220527104 (220.527M) TOTAL_PARAMS=220527104 (220.527M) UNIQUE_PARAMS=220527104 (220.527M) +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +5: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +1: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +0: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +7: [2023-03-17 00:52:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 00:52:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +6: [2023-03-17 00:52:52,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:52,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 00:52:52,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... 
+3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt... +3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. 
+3: [2023-03-17 00:52:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt. +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+3: [2023-03-17 00:52:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +1: [2023-03-17 00:52:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +3: [2023-03-17 00:52:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +6: [2023-03-17 00:52:53,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +4: [2023-03-17 00:52:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +2: [2023-03-17 00:52:53,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +7: [2023-03-17 00:52:53,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:52:53,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt... +5: [2023-03-17 00:52:53,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +5: [2023-03-17 00:52:53,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +6: [2023-03-17 00:52:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +2: [2023-03-17 00:52:53,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +4: [2023-03-17 00:52:53,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +1: [2023-03-17 00:52:53,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +3: [2023-03-17 00:52:53,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +7: [2023-03-17 00:52:53,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:52:53,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +4: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +3: [2023-03-17 00:52:53,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +6: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +1: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +5: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +5: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +2: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +4: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +6: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +3: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +1: [2023-03-17 00:52:53,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:52:53,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt... +7: [2023-03-17 00:52:53,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt. +7: [2023-03-17 00:52:53,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +1: [2023-03-17 00:52:53,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +2: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +2: [2023-03-17 00:52:53,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +4: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt... +3: [2023-03-17 00:52:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +3: [2023-03-17 00:52:53,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +6: [2023-03-17 00:52:53,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +1: [2023-03-17 00:52:53,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +4: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +5: [2023-03-17 00:52:53,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt. +7: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +5: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +2: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +7: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +3: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +1: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt... +4: [2023-03-17 00:52:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +2: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +6: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +3: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +4: [2023-03-17 00:52:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +5: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +7: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt. +1: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +7: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +4: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +1: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +6: [2023-03-17 00:52:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +5: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +2: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt... +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +5: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +6: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +2: [2023-03-17 00:52:53,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +4: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +1: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +7: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:52:53,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +5: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +1: [2023-03-17 00:52:53,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +7: [2023-03-17 00:52:53,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +5: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +2: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +6: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:52:53,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt... +4: [2023-03-17 00:52:53,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +7: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +2: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +4: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +1: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +3: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +6: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:52:53,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +2: [2023-03-17 00:52:53,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +6: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +7: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +4: [2023-03-17 00:52:53,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +5: [2023-03-17 00:52:53,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt... +3: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +5: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +3: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +7: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:52:53,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +6: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +2: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +4: [2023-03-17 00:52:53,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:52:53,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt. +1: [2023-03-17 00:52:53,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +6: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +2: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +3: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +1: [2023-03-17 00:52:53,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +5: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +4: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +6: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +2: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +5: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +1: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +3: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +4: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+2: [2023-03-17 00:52:53,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:52:53,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt... +7: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt. +7: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+1: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +5: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +7: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... 
+7: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +4: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:53,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +1: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+6: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:52:53,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +2: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +1: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +6: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+4: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +4: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+5: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +6: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +7: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+7: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +2: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:53,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +5: [2023-03-17 00:52:53,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:52:54,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt... +3: [2023-03-17 00:52:53,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. 
+3: [2023-03-17 00:52:53,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt. +3: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +7: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +2: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +1: [2023-03-17 00:52:54,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt... +3: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +6: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +4: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +7: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +1: [2023-03-17 00:52:54,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:52:54,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +2: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +7: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +3: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +1: [2023-03-17 00:52:54,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +4: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt... +5: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +6: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +2: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +1: [2023-03-17 00:52:54,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +4: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +3: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +7: [2023-03-17 00:52:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:52:54,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +1: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +4: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +7: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +3: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:52:54,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt... +5: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +2: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +5: [2023-03-17 00:52:54,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +1: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +4: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +7: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +6: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +3: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +5: [2023-03-17 00:52:54,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +5: [2023-03-17 00:52:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +1: [2023-03-17 00:52:54,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +7: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt... +3: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +3: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +7: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +4: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +1: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +6: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +2: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:52:54,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +1: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +5: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +4: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +6: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +7: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt... +2: [2023-03-17 00:52:54,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +2: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +5: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +6: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +4: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +1: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +3: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +7: [2023-03-17 00:52:54,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:52:54,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +1: [2023-03-17 00:52:54,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +1: [2023-03-17 00:52:54,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +7: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +6: [2023-03-17 00:52:54,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +5: [2023-03-17 00:52:54,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +5: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +3: [2023-03-17 00:52:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt... +2: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +7: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +4: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +6: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +2: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+7: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +3: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:52:54,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +5: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +3: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +6: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:52:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt... +1: [2023-03-17 00:52:54,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +3: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +6: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +4: [2023-03-17 00:52:54,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +5: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +2: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +7: [2023-03-17 00:52:54,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:52:54,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt. +1: [2023-03-17 00:52:54,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+5: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +5: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +7: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +1: [2023-03-17 00:52:54,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +3: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt... +2: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +4: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 00:52:54,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+2: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +2: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +7: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +5: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +3: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+1: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+0: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +6: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+2: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. 
+3: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:54,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:52:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt. +1: [2023-03-17 00:52:54,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 00:52:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:54,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+1: [2023-03-17 00:52:55,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+1: [2023-03-17 00:52:55,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +3: [2023-03-17 00:52:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+7: [2023-03-17 00:52:55,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +7: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+5: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +1: [2023-03-17 00:52:55,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:55,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +6: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +5: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt... 
+6: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +2: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+1: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +1: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +7: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +6: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+5: [2023-03-17 00:52:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +4: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +3: [2023-03-17 00:52:55,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+7: [2023-03-17 00:52:55,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:52:55,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+6: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+3: [2023-03-17 00:52:55,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +6: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +2: [2023-03-17 00:52:55,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +3: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +1: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+4: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +7: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt... +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +4: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+4: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +5: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +7: [2023-03-17 00:52:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +2: [2023-03-17 00:52:55,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+7: [2023-03-17 00:52:55,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+1: [2023-03-17 00:52:55,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +3: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +7: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +5: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +4: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +2: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +6: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +7: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +5: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +1: [2023-03-17 00:52:55,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +1: [2023-03-17 00:52:55,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +1: [2023-03-17 00:52:55,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +1: [2023-03-17 00:52:55,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:52:55,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... 
+0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. 
+0: [2023-03-17 00:52:55,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:52:55,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:52:55,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt. +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:52:55,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:52:55,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:52:55,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:52:55,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:52:55,303] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-17 00:52:55,305] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +5: [2023-03-17 00:52:55,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,314] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +5: [2023-03-17 00:52:55,316] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +4: [2023-03-17 00:52:55,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,317] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-17 00:52:55,319] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +1: [2023-03-17 00:52:55,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:52:55,325] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +2: [2023-03-17 00:52:55,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 00:52:55,327] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +1: [2023-03-17 00:52:55,327] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +2: [2023-03-17 00:52:55,329] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +0: [2023-03-17 00:52:55,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:52:55,330] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +0: [2023-03-17 00:52:55,332] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +3: [2023-03-17 00:52:55,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +6: [2023-03-17 00:52:55,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:52:55,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +3: [2023-03-17 00:52:55,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +6: [2023-03-17 00:52:55,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +7: [2023-03-17 00:52:55,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:52:55,354] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +5: [2023-03-17 00:52:55,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,354] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +7: [2023-03-17 00:52:55,356] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +5: [2023-03-17 00:52:55,356] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +4: [2023-03-17 00:52:55,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,359] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-17 00:52:55,362] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +0: [2023-03-17 00:52:55,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:52:55,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-17 00:52:55,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +2: [2023-03-17 00:52:55,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 00:52:55,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +2: [2023-03-17 00:52:55,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +1: [2023-03-17 00:52:55,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:52:55,378] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +2: [2023-03-17 00:52:55,379] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +2: [2023-03-17 00:52:55,379] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +1: [2023-03-17 00:52:55,380] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +6: [2023-03-17 00:52:55,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:52:55,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +4: [2023-03-17 00:52:55,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +1: [2023-03-17 00:52:55,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:52:55,394] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +4: [2023-03-17 00:52:55,394] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +6: [2023-03-17 00:52:55,395] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +6: [2023-03-17 00:52:55,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:52:55,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +1: [2023-03-17 00:52:55,396] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +6: [2023-03-17 00:52:55,397] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +5: [2023-03-17 00:52:55,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 00:52:55,403] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +4: [2023-03-17 00:52:55,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,404] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +7: [2023-03-17 00:52:55,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,404] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +5: [2023-03-17 00:52:55,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,404] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-03-17 00:52:55,405] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +0: [2023-03-17 00:52:55,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 00:52:55,406] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +4: [2023-03-17 00:52:55,406] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +7: [2023-03-17 00:52:55,406] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +5: [2023-03-17 00:52:55,407] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +3: [2023-03-17 00:52:55,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,407] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +0: [2023-03-17 00:52:55,408] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +7: [2023-03-17 00:52:55,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,408] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +3: [2023-03-17 00:52:55,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:52:55,409] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-17 00:52:55,409] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +7: [2023-03-17 00:52:55,410] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +3: [2023-03-17 00:52:55,411] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +0: [2023-03-17 00:52:55,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:52:55,417] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +0: [2023-03-17 00:52:55,419] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +2: [2023-03-17 00:52:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,421] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +2: [2023-03-17 00:52:55,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,421] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +6: [2023-03-17 00:52:55,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:52:55,423] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +2: [2023-03-17 00:52:55,423] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +2: [2023-03-17 00:52:55,424] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +6: [2023-03-17 00:52:55,425] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +5: [2023-03-17 00:52:55,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,428] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +1: [2023-03-17 00:52:55,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:52:55,429] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +6: [2023-03-17 00:52:55,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:52:55,429] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +5: [2023-03-17 00:52:55,430] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +1: [2023-03-17 00:52:55,431] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +6: [2023-03-17 00:52:55,431] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +4: [2023-03-17 00:52:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,436] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +4: [2023-03-17 00:52:55,438] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +1: [2023-03-17 00:52:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:52:55,441] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +4: [2023-03-17 00:52:55,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,442] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +1: [2023-03-17 00:52:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 00:52:55,443] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +0: [2023-03-17 00:52:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:52:55,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +4: [2023-03-17 00:52:55,444] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +1: [2023-03-17 00:52:55,445] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +0: [2023-03-17 00:52:55,445] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +1: [2023-03-17 00:52:55,446] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +3: [2023-03-17 00:52:55,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,448] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +5: [2023-03-17 00:52:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,449] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +0: [2023-03-17 00:52:55,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 00:52:55,450] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +3: [2023-03-17 00:52:55,450] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +5: [2023-03-17 00:52:55,451] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +0: [2023-03-17 00:52:55,451] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +6: [2023-03-17 00:52:55,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:52:55,452] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +2: [2023-03-17 00:52:55,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,453] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +7: [2023-03-17 00:52:55,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,453] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-17 00:52:55,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:52:55,453] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +6: [2023-03-17 00:52:55,454] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +2: [2023-03-17 00:52:55,455] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +7: [2023-03-17 00:52:55,455] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +7: [2023-03-17 00:52:55,456] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +3: [2023-03-17 00:52:55,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,458] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +5: [2023-03-17 00:52:55,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:52:55,458] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +7: [2023-03-17 00:52:55,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,459] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +3: [2023-03-17 00:52:55,460] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +0: [2023-03-17 00:52:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 00:52:55,460] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +5: [2023-03-17 00:52:55,461] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +7: [2023-03-17 00:52:55,461] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +0: [2023-03-17 00:52:55,462] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +2: [2023-03-17 00:52:55,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,466] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-17 00:52:55,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:52:55,467] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +5: [2023-03-17 00:52:55,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 00:52:55,467] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +2: [2023-03-17 00:52:55,467] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +2: [2023-03-17 00:52:55,469] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +5: [2023-03-17 00:52:55,469] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +4: [2023-03-17 00:52:55,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,469] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +1: [2023-03-17 00:52:55,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,471] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +1: [2023-03-17 00:52:55,471] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +1: [2023-03-17 00:52:55,473] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +1: [2023-03-17 00:52:55,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 00:52:55,474] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-17 00:52:55,476] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +3: [2023-03-17 00:52:55,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,479] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +6: [2023-03-17 00:52:55,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:52:55,480] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-17 00:52:55,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:52:55,481] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +6: [2023-03-17 00:52:55,481] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-17 00:52:55,482] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +6: [2023-03-17 00:52:55,483] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +0: [2023-03-17 00:52:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 00:52:55,489] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-17 00:52:55,491] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +7: [2023-03-17 00:52:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,501] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-17 00:52:55,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:52:55,502] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +7: [2023-03-17 00:52:55,503] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +7: [2023-03-17 00:52:55,504] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +4: [2023-03-17 00:52:55,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:52:55,583] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-17 00:52:55,585] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +3: [2023-03-17 00:52:55,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:52:55,687] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-17 00:52:55,689] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +0: successfully loaded checkpoint from checkpoints_220m3b9100mdedup at iteration 0 +7: time (ms) | load-checkpoint: 2848.90 +0: estimated model parameters: 0.220527104 +0: estimated model parameters without embeddings: 0.173619712 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 00:52:56 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.008213 seconds +0: number of documents: 208931 +0: > dataset split: +0: train: +0: document indices in [0, 208931) total of 208931 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.069 seconds +0: total number of samples: 48805 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.041876 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.012 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-17 00:53:09 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 16736.43 | train/valid/test-data-iterators-setup: 13254.38 +0: [after training is done] datetime: 2023-03-17 00:53:09 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.863982E+00 | lm loss PPL: 4.765473E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3327359: Fri 17 Mar 2023 12:53:32 AM EET diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..41509d5e19be85c5092286ca4d856a0a2bb02ff3 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:5014acbb6a549337e0cc5dd9746da5ad6eecf5f9fdab9c9a3c5b4a63694bed8e +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4705046bc2ae8ddf47729d475f464830b61fec78 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ef1b41e08636f358d4ac8dfb74ae6a7ad10f5d8b93bee5a54afaef52bcbfe2e +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f5456d24ec0460003fc1b9077e76c73cda785998 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49c178fbc3167b0377eb843d48ec4de799a5cdacf2df79f49978b9d8c67f6b13 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e06946662a177062c9ca77e8aedecefed9d61892 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b75188eeb6274d4da22b4a962cae76727ba4ca1ea59ae549f85ca9d145558142 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9be8711e106eea8b024bd14941c6e69074ceb601 --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7f5d9358cf94cd5c7ddbc3800b27fd2db9c4c4619fccb526c23e2129dfc866c +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..08ecf76457b7213866ed3f0d7727af90949bf657 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd00f2a6cc3695605faf8c9232d5daaf460fe2400e66557328657b19f1f4ee60 +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da50e87445ecd9a11cc1be4c4ab36edf8bc285ce --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79773aa1b31a759c272a548f4b0ad1c5d020bf616f3f9cd757a393eb8ac89f2b +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6826a9989fd0e8e854c598bc2a785b91e0183769 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f87e14ff66d60b98c6a6ffcd8cad6417254a56ed448b39ac357a2db533f73bdc +size 41353442 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc8c60159ab1e8ecd31d12de83f6c8de2000abaa --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ecc6b9e68acc0383e9eb0a7a397b674db688d93dfde9bd372618446eed52d3aa +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2f23e31b01a8b874c429d2006c811ef423e279fa --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56a652d681b4749370bedacb9b5b31cdea59d490e55ead2cf2f0a99b309ed3d1 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..62346b2e0f8845f550484c2ec8798261262e1f2a --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:853f21135d0296c7574d3352147ba8b8b61bf3f85d92857f9f80266303210c64 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f4b8189d7409abed4b087e262249dba04bb37830 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:b29f66af0709d0d4d8b17bb50597057ac23c10db74a400e160727610237de048 +size 41353559 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fffc4ffc66f0162ddbac1bd5f37c035f047ecacf --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a41a76abc13edb2debb1b435f206da2e299402f01cc33ddfe7ae18852e9de402 +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b3bc8a914b0d7bb9d1b587d09f900337b106009 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c789a5af57967db83a3bcefd09716a33e9a9a38402b473d640ae1072a8ada8f +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7909a6d2ccbce2ad860ddc8c15a16df9ece73809 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ba869540063831d031e840edd53fda8dbe77c8fe676f9666fe20b6ba3f06495 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8904018cbf1cebf7a4950a5239d83ea919f8c0bc --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51768c1087285409c653ae7b3fa4b708371b9a2e4dbd4334a0278815f18f4387 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03c388595840624c99a45eec705c8d7f336fcb17 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7cc098b57348d8f341eab95f5c6d506eba44307f7f991b6549750d6e5556e392 +size 41353698 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc22ba50dac0d2c89997f4252f0cac942f7ad098 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f911d77d3b58f44b0360dd7d341eab5c4d0e4c03001e43ba0889f49713f3cda +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..711962cbecf60d22c7ee991c1a74f20729e9b1f2 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1ebb19875d6ded00a6529c499061e04f133e641f4a2b2c776ce48c255c7025e8 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfe7d59d336225d8914b09e7af1d5dc4cd43a63e --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d610ceea0411652392c854a9e300296230421b2b89ee386e47766531fea03069 +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..02a962ecb272ef346c478d9ce1edccf8a2b34076 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:829753c45993c1d10ccc58b8b0e4665a2369758ee9ba6ec888622973dece1929 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45823eeb4226e28e6496620cdad8bf043090caf8 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:920a6c09a20e8ff5bbade3f95b8f8360570859e1caf0b2190519348081f581af +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce6aa2cb74e2ddbb379a171003be9636691015c0 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:c282e87d74952f8a9dbc8e6be31e780944193ecad87484ff57dd1ed36386672b +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e23ffe5829b587d046371ed0dda38283b0743ab --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:102ad91e8d997fae46b3de4ee50574f47797e44457a6aac01929b311a238b6a9 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5efbe50c1ae7f860d4b0680ed4a7b08664055a5e --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14d9b66a4a91bd3c69b1585140597de13c33fd60d0c3dd7aa75648f08cdf47e3 +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a58a62f43fb3433eedca29e2386e433bd3d3be2 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c20ea613fa9f8924748acc5b0d964061b65a5fc12908cf4bc5c5581f67d6da2 +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b4a91418725409b58aadf1593719b794b85a7e0 --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22547b961495fd7a2955c60827e407fab7776f7b6b4f658a35f33f99539bdcbf +size 41353698 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..19f7d0f1c026e7c89586b729c678e5b046f97aaf --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:816e6474b230f1fd36bb3b6f4836d2b65514e44f9d2c327eb383949e9d140a2b +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16b34b0d66edf0b9f1dbdb0b58dbf05a335d824f --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f4b9a274935f4e0af784ced2e68f8788c5429b8a4e9da955feb2499198dc41a +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc42fd40bb7e3d3dbc4e1fe7360be3177c7fcb47 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6709e0666e0a800fd39ef70c91f1f91ca080144dfc99327f3be68817c53baa87 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4e37b24a20924f87544272fd213ffdbf67848f65 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:374b796e5dec8f2745bdd63fcf6885582f754bd1d4492b2599d2dfba98a3a0fc +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfb1da6c371d64755f035b9f9decc58434b9d39d --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f86b2fc330a4051358cf2b4f8b674132fd53463507c4a54c20e2534d5e5a3801 +size 41353698 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f6b0f2fe47129dbc7dccd18a1b3933ad19ed512 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0520738464c2c743e84cbc5ddff81ecf75741c096bb3c5072354a15396800342 +size 41353442 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..952f7f01489d641502f7c45979cfa76bfce2f05a --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:984f8cf4ffc922dfa352268e8feae18898a7961328bed245b8a2c9493a943075 +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80b4bd0e6e48e61acdf5e9eab6e4efd584d9edec --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a93d2a96b2ca22720e48fc05140be1ee5df9b853d9f68e7bbd0ee36b3060d64e +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d4472868b51fc09298c280654fd9491ef88fdac --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:762139af8c713aab73cd70a7838917752968795c16df08d20a582e2024d1e46b +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f79a4951a440edefc1e04df7e7cb456747d8c847 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7dd701891a28698bfb194473021c54888e3d283ca02e60b5508005d4389323f9 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..877bcde6e476d3e4bc547638e61817983cb982a4 --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b56c4d3d35e2af58598c31b106b1776f4a27ddedd90a8c20443c23550596ac8 +size 41353698 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc269fea2c78504ffeeb0cecee75237e0183b72e --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41143b3fde148ec659bb74d91d1f385189dd5f2b26b16a09442449050d89663a +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6976e3661be5000590e8a3ef3e4302c6e0d57465 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d99c95a7280300f4b206852c76ea5b7927124a4905f8b2fcaf0a851f4dcd8b6 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2964841718d5a0f64b49e56b511f1f594a6f29aa --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:08de23da3e61550f18761bbf44a6e43e00dcac9d660af7540393932b021e68f2 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..26c06234ac35f4feecae59402cb0e257b8e0e5d3 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd22ba457d546d753e83d168a34d500d56d2dceb6878ba2e7d2d4d30fff74bd7 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7598c71f9d2a55ab9f90780076bb718c1e8954ce --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d511d9943811391a9f260a29b86ec7ea760277db731cd0133c740a797e34ce2e +size 41353378 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb2e66c41fd609481ce77b1ebcb10d421fc4dd38 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:377437a542311db34b48c9cf5f0ae9d90b266ec91038b8b4104074aa34aa77e3 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e34f8f4471c42c5204baeb649df188cd95d32083 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:9af63fa50451dfc754352706608a7b4e5dc039f3e2aed91ffdfae7f822f766b2 +size 41353559 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f1b977cc54676482b46bee9e1b20ac492c4bba5c --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:804c95adc1c8884c4690b6c0d8c142dde6e2fbcfa58358627d305a78e598194a +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c158cdd7aa0c6f8592d513d3a47f6d240d4903c --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98550eee7f14c46a5384419dac1111bb84199829f4d56fe397399108567621d1 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2562077e28dc170d6084d889b86f342fa95b8574 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49fcd4002a66d12be587a6504d9d574147188d66360650c9d2e4ee8872c283a3 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..51877f7ddaf0bc44f04a7bd2287be500e13383ec --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9b0eeea47ae7e3f29150b806522ddd24b559b3b621ad356b1f31484703cd375 +size 41353442 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d187eeae3b1ba75e864af0054cd2b6fdbec2a349 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97ea30b54c5c246cdc79feb113ad303e9ac3194b93950db4f0701a254b8e9501 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..60387903e55f9cb39263f3b7b52c32de69a35135 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0adaf67c591de07b3c6f00d9ee16551f38d3f546ff35c9a03b672f7cf264ae6f +size 41353378 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df6b465bb58e8a7ccd7f94d28aa55e518cd26d63 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04cbe967b61f1459ad6bdeca439ff81e35876e6118c6bd9a7afad8d1dbfae936 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb723683cd73e47680b9389aea8438ef51be5f44 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0872b25ffe2f82b29c2a59cdc9bb3eb1ddd0c47c463e3bf247988d7ae0e6f28d +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6db51d643de6ee33f64230063d22b45a2bb31df9 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d99cbee03153a934143965bd0a6fe584b42678c642b23721e4767e0bf82e6cbf +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..501f50baba093338e9a875007fd051d2696b73c8 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:758c9645202600a1fead450da036265ebe89d877f830ed5f106a058a692b81f7 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fab9ce3077e19848bec86e4228d00167bc64987d --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:99b4aa36bceaebc608838c9d65a4710b13ecbef7f135edb6b11dd7cf44ee9204 +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0496fc905d7f9348b0f77d33af7870d318b38733 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a42c8ae4d64669860cd5d33884d4f30dc78f8ca79d9daa2a56f2968417c1813 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8648d8036c7eaf5921c1487ecccfe823fdd6a8c7 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61f7495694bd97c895a5d38d8e0ede868665150235711a2848ebbda1c78cc034 +size 41353634 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84bab470ccbbf509a532e24771f04a0135f60a25 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47d3d3614f0720c8bc78924313ecf58d4fcf9e509976804ffef738913ead00c3 +size 41353570 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..39da71dc9a1168209b17cec5af33cbaef3327d43 --- 
/dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:004266136601786b3588613395c1a3bcaa9112aadc1204d261d7bec7494f19ce +size 41353506 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98b3a566ae00536744594221c3017b07cbe56ad7 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0069a3e27a0ce12c83a0ce5586b93228113ec484426b99a79b8b00c31cd6561 +size 41353431 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..38a8d51fcd8eee35f74e83465c394b1286921bc4 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f6d40d91cb1d60e7e610aee93e40817563ee10ffe030736f5b662942367bfc0 +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d387ab52b815e5f9865e471abfa34e01f68ce8e3 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abd8bed53cedd0fb1f473d3f6370dc323d4775f564e2e39dd4c6c89100348672 +size 41353559 diff --git a/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 
b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c905bf77f59bafc6034134ef494b2ae07bf7441a --- /dev/null +++ b/220m3b9100mdedup/global_step7508/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e1d1509c3b154b3680a48ede4635493d9dd8374b656419ffac174be319614d0 +size 41353495 diff --git a/220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b01294e1c113f18fe310a4ddaf2a303241cf75b4 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02595fc129ccc64c9efb8e10dc9b1a843c666a3b03825b4db6c9875d68427935 +size 93816067 diff --git a/220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da3d42179219376f8109c86be1dc124b575573c1 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf3e22b895d660275d4ba8e46b13a62e591cc9822d68c60066f56abdb1c44e56 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d34c03868eef4615dcb04ebd70b1a2b6be3c606 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b401a347b9eef4ea916327e79f7e6aa2184faafc44dcb466a19600728814ec2 +size 19295235 diff --git 
a/220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96fcf9b67b6f895339943f129c5278c8bd3d78cb --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41e6be6b29f1db63903831ab53b90e548f600b3f6e160cbfcb5bb74f37d20673 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3a770d987d15c84c8b603786f4a2af8220db10d --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5de686eb23ccd5bff4a13b3b25aaa9221dc4a5757dd337c7ab6f2fcdbda4a320 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cea66604b682418e817c941fc19a142206147471 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d89273c2284f9c81332203a15af182fdd463827be5c3afed6a83add2d9226fa2 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07e4f566046f9bcf1d4a8d611e1c3d3439b57a46 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b0605cc1db68d071f49fefc2ed42e2e1958244f31b38582f083c6a4f351488f +size 19295235 diff --git 
a/220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..86ac0dd60b2e5a438c2240fcb342ce8b26e9f7e6 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:950d15d40a7af40f1865dbdee1d6133efa31f1c298dfbaed2d253756fbae6011 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3af8658686ce9f0f232acd34decca30430348a4 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da73093691ffefb0a4b37adaf0d51d5c0bfd77993d245492681ec2c8786a0edd +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4219c2b57654c0072b4eca395bc11da7c54d16a7 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8940e5daf7cd30704523ac6e5636d6d70cd2724972a21e0fc3b24bf4034b68ef +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..740bb5362c62cb7e81c7ec04deab31ef21748392 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ba3542030177f1787ef3faee08b3e2de2235a026ff7d9a0b16a90c02251d55f +size 19295235 diff --git 
a/220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3519226c8ae088d9df091df13d087f70f3c24d3 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b151d97e243b99f7e03634b7f389ee346504fad0ac5fa8f8bcf45d77c99dbe5 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..79f674553073cea1eab84c04ba10f2250417bd40 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d680f059b4159542bddf809587671bb2a7de36b0178f4ddb37bc43a643b09b9 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0aa9cf6523aaad0d5773bce8538e28cfbe288117 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f68f1e88e31e6c37a7bf0edd5bbc3b5c3482ef50c3d014605d702207dcf5dbdd +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..03e1f0464c8737c59d2723688a4e5931a465a535 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:febe3f434271e3f51f98ac6cb547bf38e7fcbc0b654331bad12504707faffbdf +size 19295235 diff --git 
a/220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f6cd49a164327a47cab2c8db5ab9e06685a7204b --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bcfd131b953de2a1e1ba4f362baff242e84a04f9d4459dc93b4811ef6020a3b8 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3913972f516b290df2731a134de6053b9b890c49 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfdb866ac8815c05b78b52d2d842a2ea1f3220848adc9fce2d929d02b70b58cb +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a98bc33454c1dd25fccf55a35fbc34b7253d7e10 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbf3425b14273e52930bc9f0b352831d3b31b87a3c08eaaf4b9759043a620525 +size 19295235 diff --git a/220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c8ed60ba57601c261168d5713f8da8f491edcd5 --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc9721174092f026c9c54227819afe393a9b1d9a6e05f71f3749ab9e2af95211 +size 19295235 diff --git 
a/220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt b/220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6dde9fccf9414f47dd4ef77504b587ad831c524b --- /dev/null +++ b/220m3b9100mdedup/global_step7508/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:118b27dd5e803e03be805031facfca2912e0634feec382e00186693ec3d07296 +size 4803 diff --git a/220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt b/220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3df2387ee9d12b727a1901ecbb67355fbaa1cb0b --- /dev/null +++ b/220m3b9100mdedup/global_step7508/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41fe0646725e5b0edb40663c6aff208969d1a59fb5f27461eb6567e29387f7b2 +size 37747 diff --git a/220m3b9100mdedup/sbatch_220m3b9100mdedup.sh b/220m3b9100mdedup/sbatch_220m3b9100mdedup.sh new file mode 100644 index 0000000000000000000000000000000000000000..415b2221d0293463d8b14fb3b9ae18b97032b64d --- /dev/null +++ b/220m3b9100mdedup/sbatch_220m3b9100mdedup.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=220m3b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# 
Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100mdedup.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 3936562000 +# -> Samples: 1_922_149 +TRAIN_SAMPLES=1_922_149 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 19_221 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + 
--eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/220m3b9100mdedup/sbatch_220m3b9100mdedupval.sh b/220m3b9100mdedup/sbatch_220m3b9100mdedupval.sh new file mode 100644 index 0000000000000000000000000000000000000000..3053af5e738b72d07f4cbe5f84ab9d952f8ed4f1 --- /dev/null +++ b/220m3b9100mdedup/sbatch_220m3b9100mdedupval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=220m3b9100mdedupval +VARIANT_CKPT=220m3b9100mdedup + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + 
mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_7B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 7510000000 +# -> Samples: 3_666_992 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + 
--micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-only true \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/220m3b9100mdedup/tensorboard_220m3b9100mdedup/events.out.tfevents.1679002446.nid006541.57653.0 b/220m3b9100mdedup/tensorboard_220m3b9100mdedup/events.out.tfevents.1679002446.nid006541.57653.0 new file mode 100644 index 0000000000000000000000000000000000000000..48bd1ba41d3852488a90b5ed342126ea3b355290 --- /dev/null +++ 
b/220m3b9100mdedup/tensorboard_220m3b9100mdedup/events.out.tfevents.1679002446.nid006541.57653.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d88432eb1695f3f617a1daa853484898ef4cda99d9234055c9c069d3a69e9892 +size 13358814 diff --git a/220m3b9100mdedup/tensorboard_220m3b9100mdedupval/events.out.tfevents.1679007130.nid005116.105519.0 b/220m3b9100mdedup/tensorboard_220m3b9100mdedupval/events.out.tfevents.1679007130.nid005116.105519.0 new file mode 100644 index 0000000000000000000000000000000000000000..449474062893a618acff12c93ff250008b2c9b60 --- /dev/null +++ b/220m3b9100mdedup/tensorboard_220m3b9100mdedupval/events.out.tfevents.1679007130.nid005116.105519.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13d02a561786fd3baba2cd5adb363127d388454606512a2cbd0b742f4b63bbc5 +size 980 diff --git a/220m7b5400m/3319352.err b/220m7b5400m/3319352.err new file mode 100644 index 0000000000000000000000000000000000000000..9a86da1ceac5d08a80b909f55a859de11bd67a54 --- /dev/null +++ b/220m7b5400m/3319352.err @@ -0,0 +1,1099 @@ +2: 2023-03-16 09:01:53.489471: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490092: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490095: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: 2023-03-16 09:01:53.489587: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489620: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489623: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-16 09:01:53.489899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490116: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489635: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489640: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.489938: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490126: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490146: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489667: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 09:01:53.489682: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.489971: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 09:01:53.490153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 09:01:53.490151: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.489981: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.489995: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.490005: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.490007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 09:01:53.490398: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490440: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 09:01:53.490101: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490491: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 09:01:53.490508: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490524: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490537: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 09:01:53.490543: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490576: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 09:01:53.490575: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490623: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490661: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490665: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490681: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 09:01:53.490653: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 09:01:53.490664: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491234: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491254: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-16 09:01:53.491244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 09:01:53.491252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491239: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491267: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491274: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491289: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 09:01:53.491291: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491281: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-16 09:01:53.491274: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491287: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491287: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 09:01:53.491283: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 09:01:53.491292: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491564: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491576: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491584: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491573: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 09:01:53.491595: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491602: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491606: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 09:01:53.491619: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 09:02:08.598644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.598681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.598703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599176: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.598757: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.598764: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.598774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599209: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.598794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.598800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.599702: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.599723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 09:02:08.599261: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.599754: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.599763: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.599778: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 09:02:08.599279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 09:02:08.599784: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 09:02:08.599786: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:08.599789: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 09:02:08.599295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599267: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 09:02:08.599525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599894: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.599557: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 09:02:08.599929: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599954: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.599591: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 09:02:08.599968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599975: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.599578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 09:02:08.599984: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:08.599991: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.599646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 09:02:08.599999: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.599654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:08.600154: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600173: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600183: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600197: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600203: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 09:02:08.600089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 09:02:08.600217: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600222: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 09:02:08.600224: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600115: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600164: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 09:02:08.599958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600171: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600619: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600651: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600654: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600667: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:08.600671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600684: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600693: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 09:02:08.600701: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.599988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600037: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600050: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600103: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:08.600724: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600738: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600779: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600806: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600797: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 09:02:08.600819: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600824: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 09:02:08.600827: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.600382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 09:02:08.600514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.600551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 09:02:08.601230: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 09:02:08.601266: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600711: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 09:02:08.601290: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:08.600662: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.601322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.601340: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 09:02:08.600694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:08.601347: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:08.601365: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601364: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601396: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 09:02:08.601387: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601421: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601452: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601454: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601465: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601482: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 09:02:08.601484: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 09:02:08.601616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601662: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601709: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601709: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601744: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.601713: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:08.602265: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602286: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602299: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602307: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602311: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 09:02:08.602313: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602325: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 09:02:08.602329: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 09:02:40.357280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357312: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357328: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357371: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.357398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359302: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 09:02:40.359294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359292: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359301: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 09:02:40.359319: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359319: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359322: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359324: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359325: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359327: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 09:02:40.359329: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.366004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366064: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366098: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366100: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.366126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.366432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.366457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 09:02:40.366661: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.366471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.366487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367094: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-16 09:02:40.366500: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366547: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367115: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.366505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366876: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367128: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.366902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367133: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.366910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366732: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.366629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 09:02:40.367151: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.366520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366914: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366745: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366918: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.366582: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 09:02:40.366940: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.367348: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.366816: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 09:02:40.366934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368067: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368067: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368067: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368074: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368096: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368100: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 09:02:40.368110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 09:02:40.368114: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-16 09:02:40.369425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369435: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369436: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 09:02:40.369442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369447: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369453: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369452: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369452: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369454: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 09:02:40.369455: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-16 09:02:40.369856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369872: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-16 09:02:40.369866: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369877: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-16 09:02:40.369873: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.369991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 09:02:40.369869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369870: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369890: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.369889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.369996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 2023-03-16 09:02:40.369891: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 09:02:40.369893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-16 09:02:40.369894: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.369998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 09:02:40.369932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.369996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.369994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370007: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370004: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 09:02:40.370177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370012: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370014: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 09:02:40.370177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370017: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370018: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 09:02:40.370020: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 09:02:40.370023: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 09:02:40.370026: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370184: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 09:02:40.370179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370190: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 09:02:40.370179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 09:02:40.370182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 09:02:40.370203: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370205: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370208: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370209: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 09:02:40.370210: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-16 09:02:40.370182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.370183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.370184: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.370185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.370186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 09:02:40.370197: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370197: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370199: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370203: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370216: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 09:02:40.370218: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 09:02:40.370289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370290: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370317: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370318: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370319: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370320: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370320: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370322: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370362: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 09:02:40.370378: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 09:02:40.370384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Loading extension module scaled_masked_softmax_cuda... +0: Loading extension module fused_mix_prec_layer_norm_cuda... 
+5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +1: Building extension module utils... +1: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +6: +1: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +5: Building extension module utils... +5: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +5: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +1: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... 
+4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils...Loading extension module utils... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +3: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils... 
+3: +3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +3: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: +0: Loading extension module utils... +0: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +7: +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: Loading extension module utils...Loading extension module utils... +7: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +2: +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils...Loading extension module utils... +5: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... 
+6: +6: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +4: +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/220m7b5400m/3319352.out b/220m7b5400m/3319352.out new file mode 100644 index 0000000000000000000000000000000000000000..eb731cbbfb8e2e7d10d6e095a1ba99e759d4edab --- /dev/null +++ b/220m7b5400m/3319352.out @@ -0,0 +1,6400 @@ +Model parameters: d_model 896 ffw_size 3584 kv_size 64 n_heads 14 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 896 --num-attention-heads 14 --kv-channels 64 --ffn-hidden-size 3584 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-220m7b5400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-only true --eval-iters 100 --tensorboard-dir tensorboard_220m7b5400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard 
--log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_220m7b5400m --load checkpoints_220m7b5400m --train-weighted-split-paths-path train400m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3319352.json --zero-stage 0 +START 3319352: Thu 16 Mar 2023 09:00:48 AM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 41.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 43.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 42.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== 
+3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 43.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 41.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 47.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 42.0c 88.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 43.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 36.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 48.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 
================================================================================ +1: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 38.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +2: Launching on nid006586 (2/8), master nid006584 port 9999, GPUs 8, CUDA: True +6: Launching on nid006590 (6/8), master nid006584 port 9999, GPUs 8, CUDA: True +3: Launching on nid006587 (3/8), master nid006584 port 9999, GPUs 8, CUDA: True +0: Launching on nid006584 (0/8), master nid006584 port 9999, GPUs 8, CUDA: True +5: Launching on nid006589 (5/8), master nid006584 port 9999, GPUs 8, CUDA: True +4: Launching on nid006588 (4/8), master nid006584 port 9999, GPUs 8, CUDA: True +7: Launching on nid006591 (7/8), master nid006584 port 9999, GPUs 8, CUDA: True +1: Launching on nid006585 (1/8), master nid006584 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... 
False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. 
single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3319352.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3584 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 896 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ 
None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-220m7b5400mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_220m7b5400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. 
None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... 
False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_220m7b5400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. 
tensorboard_220m7b5400mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ 
None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 09:03:56,717] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... 
+0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.096 seconds +0: > compiling and loading fused kernels ... +0: >>> done with compiling and loading fused kernels. Compilation time: 27.834 seconds +0: time to initialize megatron (seconds): -4.512 +0: [after megatron is initialized] datetime: 2023-03-16 09:04:27 +0: building GPT model ... +0: [2023-03-16 09:04:27,618] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 09:04:27,619] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 09:04:27,619] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.91 GB, percent = 6.1% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 
16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, 
ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 09:04:29,620] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 09:04:29,977] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 09:04:29,978] [INFO] [utils.py:828:see_memory_usage] MA 0.42 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 09:04:29,978] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.92 GB, percent = 6.1% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-16 09:04:29,980] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 09:04:43,188] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 09:04:43,188] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 09:04:43,188] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 09:04:43,194] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 09:04:43,194] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 09:04:43,313] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 09:04:43,314] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 09:04:43,314] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.62 GB, percent = 6.3% +1: ninja: no work to do. +1: Time to load utils op: 0.3071098327636719 seconds +5: ninja: no work to do. 
+5: Time to load utils op: 0.19433808326721191 seconds +0: Time to load utils op: 0.20867300033569336 seconds +0: Time to load utils op: 0.20849275588989258 seconds +0: Time to load utils op: 0.20893549919128418 secondsTime to load utils op: 0.20866918563842773 secondsTime to load utils op: 0.20865678787231445 seconds +0: +0: +0: Time to load utils op: 0.2086954116821289 seconds +0: Time to load utils op: 0.2084941864013672 seconds +1: Time to load utils op: 0.20375847816467285 seconds +1: Time to load utils op: 0.2033531665802002 seconds +1: Time to load utils op: 0.20311594009399414 seconds +1: Time to load utils op: 0.20340776443481445 seconds +1: Time to load utils op: 0.20307493209838867 seconds +1: Time to load utils op: 0.20284628868103027 seconds +1: Time to load utils op: 0.20331978797912598 seconds +5: Time to load utils op: 0.20536375045776367 seconds +5: Time to load utils op: 0.2054884433746338 seconds +5: Time to load utils op: 0.20558619499206543 secondsTime to load utils op: 0.20558643341064453 seconds +5: +5: Time to load utils op: 0.20595359802246094 seconds +5: Time to load utils op: 0.20601677894592285 seconds +5: Time to load utils op: 0.20600390434265137 seconds +7: Time to load utils op: 0.20816421508789062 seconds +7: Time to load utils op: 0.20815706253051758 seconds +7: Time to load utils op: 0.2081758975982666 seconds +7: Time to load utils op: 0.20821690559387207 seconds +7: Time to load utils op: 0.20820116996765137 secondsTime to load utils op: 0.20862126350402832 secondsTime to load utils op: 0.20824933052062988 seconds +7: +7: +2: Time to load utils op: 0.21291565895080566 seconds +2: Time to load utils op: 0.21292686462402344 seconds +2: Time to load utils op: 0.21294927597045898 seconds +2: Time to load utils op: 0.212982177734375 seconds +2: Time to load utils op: 0.2129969596862793 secondsTime to load utils op: 0.21297526359558105 secondsTime to load utils op: 0.21300530433654785 seconds +2: +2: Time to load utils op: 
0.21300554275512695 seconds +2: +4: Time to load utils op: 0.21247482299804688 seconds +4: Time to load utils op: 0.21245503425598145 seconds +4: Time to load utils op: 0.21250295639038086 seconds +4: Time to load utils op: 0.21253252029418945 secondsTime to load utils op: 0.21253347396850586 seconds +4: +4: Time to load utils op: 0.21254444122314453 seconds +4: Time to load utils op: 0.21253108978271484 secondsTime to load utils op: 0.21254849433898926 seconds +4: +6: Time to load utils op: 0.21042990684509277 secondsTime to load utils op: 0.21043062210083008 secondsTime to load utils op: 0.210432767868042 seconds +6: +6: +6: Time to load utils op: 0.21045207977294922 secondsTime to load utils op: 0.21046233177185059 seconds +6: +6: Time to load utils op: 0.2104630470275879 seconds +6: Time to load utils op: 0.21047711372375488 seconds +6: Time to load utils op: 0.21045947074890137 seconds +3: Time to load utils op: 0.21187448501586914 seconds +3: Time to load utils op: 0.21215415000915527 seconds +3: Time to load utils op: 0.21190214157104492 seconds +3: Time to load utils op: 0.21218371391296387 seconds +3: Time to load utils op: 0.2088029384613037 seconds +3: Time to load utils op: 0.21219873428344727 seconds +3: Time to load utils op: 0.21219778060913086 secondsTime to load utils op: 0.21221137046813965 seconds +3: +7: Time to load utils op: 0.5044615268707275 seconds +0: Time to load utils op: 0.4043433666229248 seconds +1: Time to load utils op: 0.0005488395690917969 seconds +1: Time to load utils op: 0.0005395412445068359 seconds +1: Time to load utils op: 0.0006256103515625 seconds +1: Time to load utils op: 0.0004811286926269531 seconds +1: Time to load utils op: 0.00045752525329589844 seconds +1: Time to load utils op: 0.0004673004150390625 seconds +1: Time to load utils op: 0.0004451274871826172 seconds +1: Time to load utils op: 0.0005104541778564453 seconds +3: Time to load utils op: 0.0004706382751464844 secondsTime to load utils op: 
0.0003542900085449219 seconds +3: +3: Time to load utils op: 0.00035762786865234375 seconds +3: Time to load utils op: 0.0004761219024658203 seconds +3: Time to load utils op: 0.00041484832763671875 secondsTime to load utils op: 0.00039458274841308594 secondsTime to load utils op: 0.0004279613494873047 seconds +3: +3: +0: Time to load utils op: 0.0004329681396484375 seconds +3: Time to load utils op: 0.0004169940948486328 seconds +7: Time to load utils op: 0.00047206878662109375 seconds +0: Time to load utils op: 0.0004725456237792969 secondsTime to load utils op: 0.00047516822814941406 secondsTime to load utils op: 0.00048041343688964844 secondsTime to load utils op: 0.0004792213439941406 seconds +0: +0: +0: +0: Time to load utils op: 0.0004875659942626953 secondsTime to load utils op: 0.000476837158203125 seconds +0: +7: Time to load utils op: 0.0005025863647460938 seconds +7: Time to load utils op: 0.0005278587341308594 seconds +7: Time to load utils op: 0.00045609474182128906 seconds +7: Time to load utils op: 0.00046825408935546875 secondsTime to load utils op: 0.00048828125 seconds +7: +7: Time to load utils op: 0.0006608963012695312 seconds +2: Time to load utils op: 0.0008492469787597656 seconds +2: Time to load utils op: 0.0008761882781982422 seconds +5: Time to load utils op: 0.0008141994476318359 seconds +5: Time to load utils op: 0.0009665489196777344 seconds +5: Time to load utils op: 0.0008471012115478516 seconds +5: Time to load utils op: 0.0007891654968261719 seconds +5: Time to load utils op: 0.0010819435119628906 seconds +5: Time to load utils op: 0.0011050701141357422 seconds +2: Time to load utils op: 0.001299142837524414 seconds +5: Time to load utils op: 0.0010921955108642578 seconds +2: Time to load utils op: 0.0012602806091308594 seconds +5: Time to load utils op: 0.001007080078125 seconds +2: Time to load utils op: 0.00124359130859375 seconds +2: Time to load utils op: 0.0012180805206298828 seconds +2: Time to load utils op: 
0.001249074935913086 seconds +2: Time to load utils op: 0.001294851303100586 seconds +6: Time to load utils op: 0.001024007797241211 seconds +7: Time to load utils op: 0.00036406517028808594 seconds +6: Time to load utils op: 0.0012106895446777344 seconds +6: Time to load utils op: 0.0013713836669921875 seconds +6: Time to load utils op: 0.0012950897216796875 seconds +6: Time to load utils op: 0.001241922378540039 seconds +6: Time to load utils op: 0.0012912750244140625 secondsTime to load utils op: 0.0012516975402832031 seconds +6: +6: Time to load utils op: 0.001379251480102539 seconds +4: Time to load utils op: 0.0009162425994873047 seconds +4: Time to load utils op: 0.000850677490234375 seconds +4: Time to load utils op: 0.0009992122650146484 seconds +4: Time to load utils op: 0.0010080337524414062 seconds +4: Time to load utils op: 0.0010063648223876953 seconds +4: Time to load utils op: 0.0007655620574951172 seconds +4: Time to load utils op: 0.0007340908050537109 seconds +4: Time to load utils op: 0.0011243820190429688 seconds +0: [2023-03-16 09:04:43,844] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 09:04:43,845] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.41 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 09:04:43,845] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:43,963] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 09:04:43,964] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-16 09:04:43,964] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,069] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 09:04:44,069] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-16 09:04:44,070] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,175] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 09:04:44,176] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,176] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,280] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 09:04:44,281] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,281] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,387] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 09:04:44,388] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,388] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,492] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 09:04:44,492] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,492] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,601] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 09:04:44,601] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,602] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,706] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 09:04:44,707] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 09:04:44,707] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.77 GB, percent = 6.3% +0: [2023-03-16 09:04:44,707] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 09:04:44,707] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 09:04:44,707] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 09:04:44,707] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 09:04:44,708] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 09:04:44,709] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 09:04:44,710] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 09:04:44,710] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.000423431396484375 seconds +0: [2023-03-16 09:04:44,710] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 09:04:44,721] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=220527104 (220.527M) TOTAL_PARAMS=220527104 (220.527M) UNIQUE_PARAMS=220527104 (220.527M) +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+2: [2023-03-16 09:04:44,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+2: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+7: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... 
+3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+1: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+4: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt... +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+2: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +1: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +0: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. 
+5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +7: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +2: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/mp_rank_00_model_states.pt. +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:44,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 09:04:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 09:04:44,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +4: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +3: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +5: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +7: [2023-03-16 09:04:44,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:44,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +0: [2023-03-16 09:04:44,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:44,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:44,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +1: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +2: [2023-03-16 09:04:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt... +6: [2023-03-16 09:04:45,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +5: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +3: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +0: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +6: [2023-03-16 09:04:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +7: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +4: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +1: [2023-03-16 09:04:45,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_01-model_00-model_states.pt. +2: [2023-03-16 09:04:45,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +5: [2023-03-16 09:04:45,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +5: [2023-03-16 09:04:45,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +7: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +6: [2023-03-16 09:04:45,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +2: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +4: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt... +3: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +4: [2023-03-16 09:04:45,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +6: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +2: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +1: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_03-model_00-model_states.pt. +0: [2023-03-16 09:04:45,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +5: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +7: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +4: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +0: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +3: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +2: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +1: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt... +6: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +2: [2023-03-16 09:04:45,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +5: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +4: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +7: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +0: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +6: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +3: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_04-model_00-model_states.pt. +1: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +5: [2023-03-16 09:04:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +5: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +3: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +1: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +6: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +0: [2023-03-16 09:04:45,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt... +7: [2023-03-16 09:04:45,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +7: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +4: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +3: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +2: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +6: [2023-03-16 09:04:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +1: [2023-03-16 09:04:45,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_05-model_00-model_states.pt. +0: [2023-03-16 09:04:45,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +6: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +2: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +4: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +3: [2023-03-16 09:04:45,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +1: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +0: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt... +5: [2023-03-16 09:04:45,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +5: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +7: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +4: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +3: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +6: [2023-03-16 09:04:45,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +2: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +0: [2023-03-16 09:04:45,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_06-model_00-model_states.pt. +1: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +5: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +7: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +6: [2023-03-16 09:04:45,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +4: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +3: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt... +1: [2023-03-16 09:04:45,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +5: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +1: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +3: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +2: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +7: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +4: [2023-03-16 09:04:45,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +6: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. +0: [2023-03-16 09:04:45,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_07-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +6: [2023-03-16 09:04:45,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +7: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +3: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +5: [2023-03-16 09:04:45,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... +0: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +5: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +4: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +1: [2023-03-16 09:04:45,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +3: [2023-03-16 09:04:45,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +6: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +7: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +2: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_08-model_00-model_states.pt. +0: [2023-03-16 09:04:45,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +5: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +3: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +4: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +2: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +7: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +6: [2023-03-16 09:04:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt... +1: [2023-03-16 09:04:45,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +1: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +4: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +2: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +3: [2023-03-16 09:04:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +5: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +7: [2023-03-16 09:04:45,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +0: [2023-03-16 09:04:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_09-model_00-model_states.pt. +6: [2023-03-16 09:04:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +1: [2023-03-16 09:04:45,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +7: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +2: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +4: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +3: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... 
+6: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +0: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt... +6: [2023-03-16 09:04:45,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 09:04:45,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +3: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +4: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +7: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +2: [2023-03-16 09:04:45,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +5: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:45,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+2: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:45,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +1: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +0: [2023-03-16 09:04:45,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_10-model_00-model_states.pt. +6: [2023-03-16 09:04:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:45,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 09:04:45,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:45,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +1: [2023-03-16 09:04:46,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +6: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +3: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +7: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +2: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +5: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +0: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt... +4: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +1: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +4: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +5: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +6: [2023-03-16 09:04:46,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +3: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +2: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +7: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_11-model_00-model_states.pt. +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +1: [2023-03-16 09:04:46,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +7: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +1: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +6: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +3: [2023-03-16 09:04:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +4: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +2: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt... +5: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +7: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +4: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +2: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +6: [2023-03-16 09:04:46,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +3: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +5: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_12-model_00-model_states.pt. +0: [2023-03-16 09:04:46,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +1: [2023-03-16 09:04:46,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +0: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +3: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +6: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +4: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt... +5: [2023-03-16 09:04:46,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +6: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +5: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +2: [2023-03-16 09:04:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +1: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +0: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +3: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +4: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_13-model_00-model_states.pt. +7: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +1: [2023-03-16 09:04:46,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +6: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +3: [2023-03-16 09:04:46,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +0: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +5: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +1: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt... +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +2: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +5: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +4: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +0: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +7: [2023-03-16 09:04:46,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_14-model_00-model_states.pt. +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +2: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +4: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +3: [2023-03-16 09:04:46,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +1: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +7: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +0: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +6: [2023-03-16 09:04:46,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt... +5: [2023-03-16 09:04:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +5: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +3: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +4: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +2: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +6: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +1: [2023-03-16 09:04:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_15-model_00-model_states.pt. +0: [2023-03-16 09:04:46,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +5: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +5: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +7: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +6: [2023-03-16 09:04:46,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +4: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +0: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +3: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt... +1: [2023-03-16 09:04:46,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +4: [2023-03-16 09:04:46,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +2: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +3: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +1: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +7: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_16-model_00-model_states.pt. +0: [2023-03-16 09:04:46,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +6: [2023-03-16 09:04:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +2: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +4: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +1: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +6: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +7: [2023-03-16 09:04:46,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +0: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt... +5: [2023-03-16 09:04:46,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +5: [2023-03-16 09:04:46,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +3: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +2: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +1: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +7: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +4: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 09:04:46,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_17-model_00-model_states.pt. +0: [2023-03-16 09:04:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +3: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +2: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +0: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt... +6: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +3: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +4: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +7: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +5: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +2: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +1: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +0: [2023-03-16 09:04:46,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_18-model_00-model_states.pt. +6: [2023-03-16 09:04:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 09:04:46,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+5: [2023-03-16 09:04:46,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +2: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +4: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +3: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+6: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +5: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +0: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +7: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +6: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +6: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +5: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+7: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +7: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +4: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +3: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +2: [2023-03-16 09:04:46,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+3: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +0: [2023-03-16 09:04:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 09:04:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:46,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt... +1: [2023-03-16 09:04:46,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_19-model_00-model_states.pt. +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+5: [2023-03-16 09:04:47,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+4: [2023-03-16 09:04:47,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +4: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +2: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +1: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+6: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... 
+6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +6: [2023-03-16 09:04:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +0: [2023-03-16 09:04:47,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt... +5: [2023-03-16 09:04:47,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+2: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +5: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +4: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+4: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+7: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+4: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+2: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +1: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. 
+6: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +6: [2023-03-16 09:04:47,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_20-model_00-model_states.pt. +0: [2023-03-16 09:04:47,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 09:04:47,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+6: [2023-03-16 09:04:47,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+2: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +0: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... 
+5: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt... +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +6: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +5: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +2: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +7: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +1: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +3: [2023-03-16 09:04:47,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +4: [2023-03-16 09:04:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. 
+4: [2023-03-16 09:04:47,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/layer_22-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
+3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +3: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +7: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-16 09:04:47,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +1: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-16 09:04:47,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +2: [2023-03-16 09:04:47,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 09:04:47,204] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +2: [2023-03-16 09:04:47,206] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +7: [2023-03-16 09:04:47,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,214] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +7: [2023-03-16 09:04:47,216] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +5: [2023-03-16 09:04:47,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,218] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +5: [2023-03-16 09:04:47,220] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-16 09:04:47,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 09:04:47,223] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +4: [2023-03-16 09:04:47,224] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +6: [2023-03-16 09:04:47,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,226] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +6: [2023-03-16 09:04:47,226] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +4: [2023-03-16 09:04:47,226] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +3: [2023-03-16 09:04:47,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,228] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-16 09:04:47,230] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +6: [2023-03-16 09:04:47,228] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +0: [2023-03-16 09:04:47,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:47,246] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +0: [2023-03-16 09:04:47,248] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +2: [2023-03-16 09:04:47,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:47,261] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +1: [2023-03-16 09:04:47,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:47,263] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +1: [2023-03-16 09:04:47,263] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +1: [2023-03-16 09:04:47,265] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +5: [2023-03-16 09:04:47,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,271] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +3: [2023-03-16 09:04:47,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 09:04:47,273] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +5: [2023-03-16 09:04:47,273] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +3: [2023-03-16 09:04:47,274] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +6: [2023-03-16 09:04:47,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:47,280] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +6: [2023-03-16 09:04:47,282] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +7: [2023-03-16 09:04:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,286] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +7: [2023-03-16 09:04:47,288] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +6: [2023-03-16 09:04:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 09:04:47,287] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +1: [2023-03-16 09:04:47,287] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +4: [2023-03-16 09:04:47,286] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +4: [2023-03-16 09:04:47,288] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +1: [2023-03-16 09:04:47,289] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +6: [2023-03-16 09:04:47,289] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +7: [2023-03-16 09:04:47,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,291] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +5: [2023-03-16 09:04:47,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,292] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +4: [2023-03-16 09:04:47,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:47,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 09:04:47,292] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +0: [2023-03-16 09:04:47,293] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +7: [2023-03-16 09:04:47,293] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +2: [2023-03-16 09:04:47,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:47,294] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +5: [2023-03-16 09:04:47,293] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +4: [2023-03-16 09:04:47,294] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +0: [2023-03-16 09:04:47,295] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +2: [2023-03-16 09:04:47,295] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +2: [2023-03-16 09:04:47,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:47,301] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +2: [2023-03-16 09:04:47,303] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +0: [2023-03-16 09:04:47,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:47,304] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +0: [2023-03-16 09:04:47,306] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +1: [2023-03-16 09:04:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,307] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +1: [2023-03-16 09:04:47,309] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +3: [2023-03-16 09:04:47,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,316] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +5: [2023-03-16 09:04:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,317] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +3: [2023-03-16 09:04:47,318] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +5: [2023-03-16 09:04:47,319] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +4: [2023-03-16 09:04:47,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 09:04:47,319] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +3: [2023-03-16 09:04:47,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,319] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +6: [2023-03-16 09:04:47,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,321] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +6: [2023-03-16 09:04:47,321] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +3: [2023-03-16 09:04:47,321] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +6: [2023-03-16 09:04:47,323] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +1: [2023-03-16 09:04:47,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:47,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 09:04:47,324] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +6: [2023-03-16 09:04:47,326] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +1: [2023-03-16 09:04:47,324] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +1: [2023-03-16 09:04:47,327] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +7: [2023-03-16 09:04:47,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,333] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +7: [2023-03-16 09:04:47,335] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +6: [2023-03-16 09:04:47,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:47,333] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-16 09:04:47,335] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +1: [2023-03-16 09:04:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +4: [2023-03-16 09:04:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 09:04:47,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +1: [2023-03-16 09:04:47,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +2: [2023-03-16 09:04:47,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +2: [2023-03-16 09:04:47,340] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +0: [2023-03-16 09:04:47,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:47,342] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +2: [2023-03-16 09:04:47,342] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +2: [2023-03-16 09:04:47,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-16 09:04:47,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +0: [2023-03-16 09:04:47,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:47,344] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +0: [2023-03-16 09:04:47,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +2: [2023-03-16 09:04:47,345] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +7: [2023-03-16 09:04:47,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,345] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +5: [2023-03-16 09:04:47,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,346] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +0: [2023-03-16 09:04:47,346] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +7: [2023-03-16 09:04:47,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,347] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-16 09:04:47,347] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +2: [2023-03-16 09:04:47,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 09:04:47,348] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +2: [2023-03-16 09:04:47,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +1: [2023-03-16 09:04:47,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +7: [2023-03-16 09:04:47,349] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +4: [2023-03-16 09:04:47,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +5: [2023-03-16 09:04:47,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +0: [2023-03-16 09:04:47,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 09:04:47,350] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +0: [2023-03-16 09:04:47,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +1: [2023-03-16 09:04:47,350] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +4: [2023-03-16 09:04:47,351] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-03-16 09:04:47,351] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +0: [2023-03-16 09:04:47,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +3: [2023-03-16 09:04:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,353] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +3: [2023-03-16 09:04:47,355] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +5: [2023-03-16 09:04:47,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,358] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +5: [2023-03-16 09:04:47,360] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +3: [2023-03-16 09:04:47,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 09:04:47,363] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +3: [2023-03-16 09:04:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +1: [2023-03-16 09:04:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +1: [2023-03-16 09:04:47,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +3: [2023-03-16 09:04:47,366] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +1: [2023-03-16 09:04:47,367] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +3: [2023-03-16 09:04:47,367] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +4: [2023-03-16 09:04:47,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +6: [2023-03-16 09:04:47,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 09:04:47,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +6: [2023-03-16 09:04:47,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-16 09:04:47,374] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +6: [2023-03-16 09:04:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-16 09:04:47,378] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +4: [2023-03-16 09:04:47,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-16 09:04:47,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +6: [2023-03-16 09:04:47,380] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +7: [2023-03-16 09:04:47,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +4: [2023-03-16 09:04:47,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +7: [2023-03-16 09:04:47,384] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +0: [2023-03-16 09:04:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 09:04:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 09:04:47,389] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +0: [2023-03-16 09:04:47,389] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +5: [2023-03-16 09:04:47,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-16 09:04:47,391] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +0: [2023-03-16 09:04:47,391] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +0: [2023-03-16 09:04:47,391] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +7: [2023-03-16 09:04:47,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-16 09:04:47,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +5: [2023-03-16 09:04:47,393] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +7: [2023-03-16 09:04:47,394] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +2: [2023-03-16 09:04:47,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 09:04:47,397] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +2: [2023-03-16 09:04:47,399] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +3: [2023-03-16 09:04:47,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_220m7b5400m/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-16 09:04:47,404] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-16 09:04:47,406] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +0: successfully loaded checkpoint from checkpoints_220m7b5400m at iteration 0 +7: time (ms) | load-checkpoint: 2690.43 +0: estimated model parameters: 0.220527104 +0: estimated model parameters without embeddings: 0.173619712 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 09:04:47 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.008175 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.086 seconds +0: total number of samples: 195101 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.037356 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.082 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 09:05:01 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 20128.10 | train/valid/test-data-iterators-setup: 13220.62 +0: [after training is done] datetime: 2023-03-16 09:05:01 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.374362E+00 | lm loss PPL: 2.920563E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3319352: Thu 16 Mar 2023 09:05:26 AM EET diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3815702a4b0729a3abc9b9bdc54ccb3f98bc719f --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ee0388bb577c4efc151c69b1e952f103267feb5c39ebd9b9c78dcf4585e2e47 +size 41353495 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eb4a73f8483c2f5d2033c22f771ea4a86aa16f7d --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d041bbeeddd0cd7524a4a6f1e7e361648020519f4c835fbe1967fe88234c892 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7e58d9e7e0aad51971bbd4f52590d11751bedff --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:491ab921f7b9ba704d91b22470c640f302a3ff5f152694c170899c116662cb64 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..82f76d7fcd25ea5feb1181cbef19345c91f9200e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:733583c10f9ef7c6d9bdf5fc84cbc78198ad7e18c81daec0c94e2681f9762186 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96e640020dd5b4c260a538e11b2bbb024e6f2d0d --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffec4d52155cfac72b503b0f93fc0ae1858cc3934199676f078f1a3f91657bf5 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6fbf177127346c87c7ca89a201fe9426668cba7 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b82eb649de7c1c067deae826656a27067b675831d4390d5c332c7f1935f795fb +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..3719f4e5724a6c32e176cfa07f62225ce7af1949 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:544906edf0f131b0269502cdfa2787ad8be8ac9c901a8b9f4212e9eecd2ded73 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..044e6c091d18fbe0014c3749f0144a8eabf126d8 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3c61a7b8d4ce7de924c1543c8b72e486b72a3d79b32eab096821b85f9ba6d59 +size 41353442 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20eddb389b2e87bc7344c3330049590d745e5012 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c52a2d79427afc35ae06785e16a4da51acc3a44c8c2a21e15f2bb571977fd184 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b7100121668a515208c231b875459015a356b100 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee5cb2bd39167868683b55c927bf2d806e7e7d6bc5a83569499d411cb1b63213 +size 41353570 diff --git 
a/220m7b5400m/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1d7d3f892eeaa17195c256ab4bc3953e0859fec8 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c951f8d822470348ce2074e2f5ed45735fb1f7cfee69af83ab409a44dcb885f7 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e9472fe2dffd1abe70ba186696cb59f5e053250 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb8cd3a43abf90fea40595e91a1959ab3800342164b20c227026c0f2bcae48a8 +size 41353559 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4865ca7bce9a1c82209bf9b44288f0b29131bb23 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b05ac04e930b41eee487c3d2f493ae5e6ef97142ca3fc24edc70795b8f93bf0 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0dfa42bb105cdadb554fa31573c390faf4620855 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:348db6b3fab99af47b68bfc6b6cf1ce95db1a51b91f34f51032346a2838d97e6 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53ca1d8a1a19fa5b6aaf0c4febe96e8f512dbe4e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5294d2ee91e98febffa039e0fe2c73a9559e28ef407c57b878e3c6cce78be817 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..749f908660637bd72d1513789979b50457102293 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e677beb39a58f1e79f9185a032934c5a76d1dc926ddacd5f62cba06a04b4fd4d +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9499cc1237b37e2496bfe41806158b0a530efed6 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:acf36db710317cbb65db0bc1d97afb37c80255e90ca40203b3b27bd67bcdf7fb +size 41353698 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a45e03d0eeba4c598ab240686a7fc9ae4e933798 --- 
/dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bf17cce70268682d1bb8043aa18c24d5305d4d3e40f024d7c7a416323214992 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b09adc4c9db6a49744da3e38efb5c1de882b1235 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59fd7e79dc0d7d0204ab809eec58c3206997b61e85e74caaddabcbd3aa7e5c13 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c12b16c649582a8b982d5d420e2fdcc7c3cc65e0 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ef948f2710b6995ebe8827f494071511a6e07f2ff48056b50ea42ef026121b9 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..09cd31fad5f9d43d9dac58203fb738905bc7465e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8c46cdffb030add5e467e877825cd8f6905160b306ee9556b434fbc0f9fb0cc +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..33c636e9970284f55217ab408e9e174d5e9601a6 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:849af860de392b9156d9d78a879a9d9dee9ce44971a152b597c747b842c97763 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d5e9c508188a970ab2ca89bd36afa33daaf02adf --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d8bd396fd37d5835991ce64b960809eff51297dadd3b7f066279b5a400041d8 +size 41353495 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f032333a56e90224caf5bad8b4d62f501fdfabd4 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2cb0366dd2a1247e0773a66a18d6cc8cefd2db5f9fe5e59d835c235339749c5 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ef87d70af416a07ce67062638ed4836c8c64ad56 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8cf947dfa07596a302ba36a046c8beeeaedac280bf314be60131f29db06fb57 +size 41353506 diff --git 
a/220m7b5400m/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15b661b927435be780fb4c31bfde37a83f83e55b --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b07fea943c84dba4428196d81949243b70e2fdd2eb716dbfa788bf593531de45 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..69b4d9b4d995838caffb28c67748766f5476f3dc --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f867018e943ff6ce263975d949cdb24f054833deaffb768095d5efd9940524b +size 41353698 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..665cf55a3dbb48c231fba410a7980653b367b3f7 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d68a75e5f411779ddc2a4247fda4e8e66da553f847320d26080f3db9588bd86c +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c7e93e7f8c2d39cdab2766eea764efce50b7322 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a0bbe6eab860a3fbe59dab6bffea2754e3876e522cc4fa198060fd4882f293fa +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a9196986cec3d4da7f33a05ee1fbd6f7575b13f --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af46815ed30af3127e04fb72186bfbe0bd6d224ced39d74b7dabcfd7b690a175 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..48b4e68772c76759a675f65fecdfaa08c8dc4b18 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d194df96723017637368fb408dc7d06c94755107855346fe6ea0aaa2cc2bacd8 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..74e03c89b478d5ffb4fb3a9015af07235629c91e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6495a5580887e39290cfb124d4935b6e52881eb474d0b30dcc8017eac8e724f +size 41353698 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..57ba40f782dc4d5bef3146600e40f3803cda76b1 --- 
/dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b21f20a7d69e23fa8d159d0f44d435377dacbc9d7ab2cc86a39037e25d43afb4 +size 41353442 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa495de645eafaf54c6aae5af3b67226c37a0257 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f98d56aff7f3a3fa7f5a3a942584fd3f60d04f816ccc3b02fce01faf5a40565 +size 41353495 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2329d6a37956f4f2ea50b9889ba702fda554ff74 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b69a5b9c05499f6f9be645d2dab20b94b3ecb81d625490d99ebadde34a1358a +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e11dabf0606c3646741847c1f1a4657606fd3628 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3445fe12299fd56eb0018b9f9c321e798efcf9ebfc8466ff64e52b1214bdf365 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..e700440320303ad553c7d1af4332f1e3a829f10e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af036264c39c94f4cb0277d26f0e1194ef1ae658412f7360797b6884a2c9b8f5 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13e329d90611e181a57497085bde2d915f6c3b28 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09c2d16f20c8d83d9cf843fc8544be6ee086fd0a25f1c6fb3cb91c37012bc279 +size 41353698 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..75610b84b54934e7745880b3df0e0e922cb3f5d5 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b4ab4ac12dc4d870dcb75ec1791a708bf446ab5b0d16c434c29c71f3340caf4 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..149bf1c4c852030dce2ab37baaef4ca9d4e39e86 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0249beb36159855dd122488a816aba051b1e12be4f1dea7205ab9b188f384c89 +size 41353634 diff --git 
a/220m7b5400m/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e849ddf105601671b31ef2a522e543595b3e7739 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:29d9f47c8805a2e02cafe2f974bd779f5e060077a7cc545d7271cc084d144fd8 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30e21db69c124ff678b05d07a7b436c1cf0c967c --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ea3902c1738160919d7de05c081f32f305199007b05f8398b82ef24106f6efd +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..232599965f409613f6e74b65fc9027b927012c98 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e90d641223595dc97ad0d62311db53d5161264a6de577220dd073753daeb0cde +size 41353378 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf8337747c4a13e7425149c2cc4083be3a766517 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:617a0b4055aee852d084d42a69c5ec383841e9ee6eb63266ce959217f6dbb79d +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..967ecf3325b5d0f1432c8eb8bb088e3c37a4a520 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc667555df2e47de3408838c07d0a84014ff517b94de185581550fee64223b84 +size 41353559 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..004cc683b71d7af634f39cf6b1d5cff2819e2bc7 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9c92c0e96113afcfdeac9b8889eed9dceb4fd7b4eaf9c7cfce92e03201ece918 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98d52d66a23d98944e2e315578af0b894df3bcfc --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2400b37bfb87c431c1200c84f80091ebd3ca9f0121919b9f8e2efdf4f27ab9e8 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98a106819d1d4779d5b387b98db13632a5abce99 --- 
/dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d6851458145f6ce14ac231cdd3b555257c0da39f5ecc16783fe4868f687b68c +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..166c565b938c0a6bacc82f799b0a09a30af19e2d --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db4325328279fccc1896d1c8376618be523d34ff5cc13ffa791d0d031d3b1266 +size 41353442 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d9147f6d8e3ad57ff26f9a6b50cd4057638d270 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84ccf52999090b188dfa4df9edb75ad7c0ed25f659888d0d380f2e5eacc76308 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7887e87d31d3bb83c3cb4191b9c7832ad61bffda --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1865349e9103366f4c530415351ea753be6c58845e929047ec3dbf759a5a4dea +size 41353378 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..e59d1ec3faa73cb7bc0ce81758c98f565f474f56 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c88cda48d93ea84b32fe46be45e76196e3c9f591d242dbea871e04ccd109069 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..841d2108ff017dc51934313065ab70e60302bef6 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c86612b49dd6983be1dd5e315bebd51d243b17aec4f67e1790d7834c6025aaf9 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4351a7da52c37327121af1a6cc2505926e897da7 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32aa113f77a5c28953f8c753d310d2b2f3fc8ebf634244c5ea9511dc1df4e426 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3726d50482bb40225d297c6eb365aca1ea1ca670 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d948c60adb6caab2f8b07457c590784b1105976f54ac92b9f2a4493b55cae4f +size 41353570 diff --git 
a/220m7b5400m/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d26bf1663cebaa468e1351af26a3201a4349b42 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cdc1df7be1a8f9cdd0747a888a95d5a00f5cea707ed3666d35e5a078a005dbbd +size 41353495 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3bdf20abe1ac56bbbc6ad45e28fcd9e5223933e --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d88a799baaa3ab78c8d2e5b9b2d4d1d0710d4c52b74074b49451dbc4f0bdb49 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e94c2856dbbf0aaebc72b900846f8a9bf482a9c --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3fda1e5c297c124b7106e98f9f188b9e63a7a0e26025f1151c0017f1551e022 +size 41353634 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f945969af77e829523dc400450b08dc9358a293b --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:63702022918ca4a010f3ad2e2ba7ec7f6e41cebb87d37bf72c80821ceb1c55d1 +size 41353570 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b112c5332341f0459d325a99cdea6ca1c580b343 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ed8b9a7048df810f98460edb554e799f11054f0d8c236f0023a305863931802 +size 41353506 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a654b7bb7b1d446b8e991ad35a992d2583759264 --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae7dad113088079b6427f803f02d4da0e6d72c6714eaf8588f355de33070c41a +size 41353431 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c16cf5a6909b25ad723e4aa318cb316f4764723b --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8c8c4f464edb02b9cb94c813251793856be6644a1e3b4a993dd66fcdaff53ab +size 41353495 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9178ef908aad335b7feb50c1c0efb1f031f55f46 --- /dev/null +++ 
b/220m7b5400m/global_step14324/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:124acbb5a7bcac48f8d9b72661387c8e1f52083d720bc829b68216b67c04a538 +size 41353559 diff --git a/220m7b5400m/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/220m7b5400m/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a66c81175c664f40add06d4050f5bff977d0428d --- /dev/null +++ b/220m7b5400m/global_step14324/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ab89b0992ef5b50683eb41ffbc4c6b8dda79d6d4b82204efa3b3099a2cc83f8 +size 41353495 diff --git a/220m7b5400m/global_step14324/layer_01-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..de686f6a4144314fbe912e68930f36834d520328 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36799102fd340d38552efbe858fd41e46f78b807b69da0d10cfb3fe098f9ecce +size 93816067 diff --git a/220m7b5400m/global_step14324/layer_03-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2541c4f3aca5acf6e05993040c188e26ec5dc76e --- /dev/null +++ b/220m7b5400m/global_step14324/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3d29103fc16c5b2534f77e0af00bd49ff9b8a631c9c3dec7631656ae5ee4578 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_04-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6857c67f51a0cfaee3e22378381380c18cbad419 --- /dev/null +++ 
b/220m7b5400m/global_step14324/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72a7bde193abef6b5ee7a75528c086fdeff0d1d4822ca857ad2d1526fa09cc81 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_05-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b36fa0566a62fb8509a7a36cad4f63d870a9bb69 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84a5763a3b13744109c966c26ff5411bad267c3a62aaa2bc6bbdb5902e4adada +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_06-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6271bd4d7166d40a8ec5baa7bc79a0ecdd46c603 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:183de92bdd420334f772fbbcc821c184f07b2e57f1788275aafc35c9f17bc042 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_07-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4f928cebdc88a5f35a2c2cb4681fdd2f25da6b5 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5865e6e68e5385b3f3605a004a8c837a02ebb9bc4f2e9c7990449382d05d97c9 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_08-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..354178842ef49c97acee427bcb19c814ffa591ee --- /dev/null +++ 
b/220m7b5400m/global_step14324/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cef647c602ce60ca6d5ec93c821c52cddb0430c6042d0d58bb7fab1b8113dbf +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_09-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..373a9f53341b81e275cdaa57cf3a3deac2396efa --- /dev/null +++ b/220m7b5400m/global_step14324/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:207bbcf570d3905160a2f996294059a2fb43a397933b2caf567156b87a58bf0c +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_10-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a800c5becceca748cfd79822ad633238ee2b6185 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9501021e3626674fd7b4b6d638698db0a1800ae9dbe6ace8662d82c157bf7c17 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_11-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a341aa27154c56525f874ef1cadbb6fdbe70685b --- /dev/null +++ b/220m7b5400m/global_step14324/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6154fa851d6d555dcadc7a186e39b1df54cafe07a8ed66bf9852709f061fe0ba +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_12-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d5f9c852199c0a75fe8b6346fb21f26dfe45b61 --- /dev/null +++ 
b/220m7b5400m/global_step14324/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3084d972de166536ed8c25844e6929988010f366819c30d78bc11f3a58e80384 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_13-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c61caaa94ea8854102ceee239494494327fdc54 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:078ed479ef9883b76a1c8c9c2b55a2af07676dda395e06adad4b6e1d2722395a +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_14-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d11712af6ba6925b56213b1add86479500622ee --- /dev/null +++ b/220m7b5400m/global_step14324/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c32a9e26d5982c080e79cab27642cbde467e6b00c3a97ec365b21ed878be74c0 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_15-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa3dc2a52e4bbc7c568dc81c037802f0ae909bf3 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f0e43d4ec514e0d2771d3b5ff5fa0aa8eec58693e9bf198323b4fb385208068 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_16-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a68aceb26d7d558910fdd1c39ee61c48993aaabe --- /dev/null +++ 
b/220m7b5400m/global_step14324/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e34ecdaf68343d22433b31075ff9ec362af67ae774348d609378da5126a4f20 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_17-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c75d501dc9a19a883b5d674b387d74e1df34c945 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17ce5ef568a6b00c3daec6bb66f24d5a94ab521cd519466c5d499d9d4fc76a73 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_18-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbbda7b9c9ff3e43f07db08115dab174afa72ea3 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7f993f44edb50f97300926b452bc21335f897732c8d1a963ce3b70af8720538 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_19-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee5dc7cd3e53eb56dd9e11423a9f6dc82bfae4d8 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b68c19dfa431eebe6419eb83305549d68549f2f9a297eeb2004767286f5fbde1 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_20-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b104caae57d809da7dff64f3e2afd6d8e7afeaf8 --- /dev/null +++ 
b/220m7b5400m/global_step14324/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8302ebf13243c8ebc778a2e0254171689bf22db79b341ec45ac5efecec2b7e7 +size 19295235 diff --git a/220m7b5400m/global_step14324/layer_22-model_00-model_states.pt b/220m7b5400m/global_step14324/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f70deb75a0355f0e9c932c379653063c0d7f13e8 --- /dev/null +++ b/220m7b5400m/global_step14324/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47d8bc96eb8ab687c438d83d10bc2555c3fb4022f1ad227cf5d0730798b1562e +size 4803 diff --git a/220m7b5400m/global_step14324/mp_rank_00_model_states.pt b/220m7b5400m/global_step14324/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..589744279627ff7c9e2ce31ff935ef78803a815d --- /dev/null +++ b/220m7b5400m/global_step14324/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:748da20f92edb9fc1b82d4dfc9e278b6fb5ffeeece44fd11b8d6caf85b7b6ef8 +size 37747 diff --git a/220m7b5400m/sbatch_220m7b5400m.sh b/220m7b5400m/sbatch_220m7b5400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..1247d7ddd7a3671483b46ae63c5de23ddeb349b5 --- /dev/null +++ b/220m7b5400m/sbatch_220m7b5400m.sh @@ -0,0 +1,162 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 24:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=220m7b5400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out 
logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 7510000000 +# -> Samples: 3_666_992 +TRAIN_SAMPLES=3_666_992 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 36_670 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + 
--kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/220m7b5400m/sbatch_220m7b5400mval.sh b/220m7b5400m/sbatch_220m7b5400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..351626b37e11bfc1f1b21bbc2b92348bbd0b3fb8 --- /dev/null +++ b/220m7b5400m/sbatch_220m7b5400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e 
logs/%j.err + +VARIANT=220m7b5400mval +VARIANT_CKPT=220m7b5400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +# DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_7B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 7510000000 +# -> Samples: 3_666_992 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + 
--kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-only true \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/220m7b5400m/tensorboard_220m7b5400m/events.out.tfevents.1678910263.nid005749.128864.0 b/220m7b5400m/tensorboard_220m7b5400m/events.out.tfevents.1678910263.nid005749.128864.0 new file mode 100644 index 
0000000000000000000000000000000000000000..e270fb410becdd483d4c714e229225f06b5e46fb --- /dev/null +++ b/220m7b5400m/tensorboard_220m7b5400m/events.out.tfevents.1678910263.nid005749.128864.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89b449aa18bea45fc5c0e2c9d02dacadedd38f6b197910f28a9e4d70b2ab3ece +size 25510626 diff --git a/220m7b5400m/tensorboard_220m7b5400mval/events.out.tfevents.1678950236.nid006591.28013.0 b/220m7b5400m/tensorboard_220m7b5400mval/events.out.tfevents.1678950236.nid006591.28013.0 new file mode 100644 index 0000000000000000000000000000000000000000..294455d8fd0e5f04bbac61c3c4250f3ce204668d --- /dev/null +++ b/220m7b5400m/tensorboard_220m7b5400mval/events.out.tfevents.1678950236.nid006591.28013.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e766b0aa5c287dea78a4933d9aec6b3cd7ce247813eb5ccaae4eb67f5018fb9d +size 980 diff --git a/221m32b400m/3326647.err b/221m32b400m/3326647.err new file mode 100644 index 0000000000000000000000000000000000000000..e3757a4797d6b258fae45edb6432fa3a6be8e8f1 --- /dev/null +++ b/221m32b400m/3326647.err @@ -0,0 +1,1127 @@ +7: 2023-03-16 23:09:46.130313: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:46.130316: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:09:46.130319: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:46.130313: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:46.130330: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:46.130338: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:46.130337: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:09:46.130336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130812: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130820: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130815: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 23:09:46.130825: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130840: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130847: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:46.130845: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131289: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 23:09:46.131295: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131296: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131304: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131285: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131293: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 23:09:46.131286: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:46.131307: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136023: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136023: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 23:09:46.136036: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136041: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:46.136063: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:09:46.138136: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138155: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138166: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138174: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138181: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:09:46.138180: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138170: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:46.138173: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138655: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138668: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:09:46.138677: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138665: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138684: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138681: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:46.138662: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:09:46.138660: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145515: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145529: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145527: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:09:46.145533: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145531: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145536: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:46.145539: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177635: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:09:46.177641: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177630: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177641: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177628: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177627: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-16 23:09:46.177626: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:46.177638: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:48.207993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208002: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.207999: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.207998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208008: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:48.208418: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208419: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208424: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208424: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208426: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 23:09:48.208428: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208429: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:48.208432: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221221: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221225: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221260: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221254: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:48.221671: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221677: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221680: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221683: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221686: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:48.221692: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221693: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:48.221698: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.229866: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229867: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.229884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:48.230306: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230309: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230318: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 23:09:48.230322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:48.230323: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.269813: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269825: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269830: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269836: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269830: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.269840: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:48.270229: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270236: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270239: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270241: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270246: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:09:48.270246: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270249: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:48.270254: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 23:09:48.304495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:48.304500: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304500: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304511: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304512: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304511: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:48.304515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:48.304929: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304936: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304938: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:48.304943: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304946: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 23:09:48.304959: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304950: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304951: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:48.304955: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304965: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304966: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304970: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304971: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304973: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:48.304978: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 23:09:48.309305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309314: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309318: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309320: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:48.309723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309728: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309733: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309734: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309736: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 23:09:48.309739: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309739: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:48.309747: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312583: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312603: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:48.312803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312805: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312808: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312810: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:09:48.312811: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312814: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:48.312817: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:53.983508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983558: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983750: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983600: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983787: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 2023-03-16 23:09:53.983860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983576: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983795: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983800: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983891: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983659: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-16 23:09:53.983731: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.984037: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983666: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-16 23:09:53.983740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.984042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983748: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-16 23:09:53.983748: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.984058: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.983755: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-16 23:09:53.983811: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.984064: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990131: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990132: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990132: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990133: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990136: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990135: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990136: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990148: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990152: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990149: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990156: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990156: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990159: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990160: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:09:53.990526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990531: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990586: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990542: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990547: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:09:53.990585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-16 23:09:53.990551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990583: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990586: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990597: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:09:53.990592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990605: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990613: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:09:53.990659: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990672: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990971: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990845: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990970: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:09:53.991076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990987: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.990987: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990855: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:09:53.991077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.990982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990863: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990993: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.990995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:09:53.991079: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.990995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:09:53.990896: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.990996: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.990997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991001: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990910: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.991092: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] 
TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991083: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.991085: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.991088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.991092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.991104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991105: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991108: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991111: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991112: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.991113: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.010928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.010967: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.010981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.011020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.011018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.011039: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.011048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.011052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013169: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013177: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013186: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013188: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013189: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013190: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013192: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:54.013201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:54.013216: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.111573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111608: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111628: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.111633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113561: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113563: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113564: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113578: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113579: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113581: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113583: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113584: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113585: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113587: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:54.113618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:54.113633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +2: Successfully preprocessed all matching files. 
+3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +3: Building extension module utils... +3: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: +5: +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... 
+6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: +0: +0: Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... +0: +0: +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... 
+5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils...Loading extension module utils... +2: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Loading extension module utils... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +4: +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +3: launch.sh: line 53: 100420 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +5: launch.sh: line 53: 97386 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +0: launch.sh: line 53: 101208 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +4: launch.sh: line 53: 95373 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +1: launch.sh: line 53: 101466 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr 
$MASTER_NODE --master_port $MASTER_PORT "$@" +6: launch.sh: line 53: 97871 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +7: launch.sh: line 53: 104522 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +2: launch.sh: line 53: 94317 Killed python -u -m torch.distributed.run --nnodes $SLURM_JOB_NUM_NODES --nproc_per_node $SLURM_GPUS_ON_NODE --node_rank=$SLURM_PROCID --master_addr $MASTER_NODE --master_port $MASTER_PORT "$@" +srun: error: nid007226: task 3: Exited with exit code 137 +srun: launch/slurm: _step_signal: Terminating StepId=3326647.0 +srun: error: nid007224: task 1: Exited with exit code 137 +srun: error: nid007227: task 4: Exited with exit code 137 +srun: error: nid007223: task 0: Exited with exit code 137 +srun: error: nid007228: task 5: Exited with exit code 137 +srun: error: nid007229: task 6: Exited with exit code 137 +srun: error: nid007230: task 7: Exited with exit code 137 +srun: error: nid007225: task 2: Exited with exit code 137 diff --git a/221m32b400m/3326647.out b/221m32b400m/3326647.out new file mode 100644 index 0000000000000000000000000000000000000000..c919aafeb250025964223d58b19e9ee00727b080 --- /dev/null +++ b/221m32b400m/3326647.out @@ -0,0 +1,16112 @@ +Model parameters: d_model 896 ffw_size 3584 kv_size 64 n_heads 14 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 896 --num-attention-heads 14 --kv-channels 64 --ffn-hidden-size 3584 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 44_416_143 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path 
kill-switch-221m91b400m --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 44_416_143 --lr-warmup-samples 444_161 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 10000 --eval-interval 10000 --eval-iters 1 --tensorboard-dir tensorboard_221m91b400m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_221m91b400m --load checkpoints_221m91b400m --train-weighted-split-paths-path train400m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3326647.json --zero-stage 0 +START 3326647: Thu 16 Mar 2023 11:09:24 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 48.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 50.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 47.0c 
N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 47.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 40.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 43.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: 
============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 54.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 43.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 41.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 43.0c 92.0W 800Mhz 1600Mhz 0% auto 
560.0W 0% 0% +4: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 41.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 43.0c 78.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +5: Launching on nid007228 (5/8), master nid007223 port 9999, GPUs 8, CUDA: True +2: Launching on nid007225 (2/8), master nid007223 port 9999, GPUs 8, CUDA: True +7: Launching on nid007230 (7/8), master nid007223 port 9999, GPUs 8, CUDA: True +3: Launching on nid007226 (3/8), master nid007223 port 9999, GPUs 8, CUDA: True +6: Launching on nid007229 (6/8), master nid007223 port 9999, GPUs 8, CUDA: True +4: Launching on nid007227 (4/8), master nid007223 port 9999, GPUs 8, CUDA: True +0: Launching on nid007223 (0/8), master nid007223 port 9999, GPUs 8, CUDA: True +1: Launching on nid007224 (1/8), master nid007223 port 9999, GPUs 
8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ 
False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3326647.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 10000 +0: eval_iters ...................................... 1 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3584 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 
0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 896 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-221m91b400m +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_221m91b400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ 
False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 44416143 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 444161 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... None +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... 
None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... False +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. None +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_221m91b400m +0: save_interval ................................... 10000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... 
None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_221m91b400m +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 44416143 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... 
['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... 
+0: [2023-03-16 23:11:00,581] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.088 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 102 +0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip 
-ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number 
of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 27.175 seconds +0: time to initialize megatron (seconds): 2.582 +0: [after megatron is initialized] datetime: 2023-03-16 23:11:30 +0: building GPT model ... +0: [2023-03-16 23:11:30,719] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 23:11:30,720] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 23:11:30,720] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.64 GB, percent = 6.1% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, 
model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 23:11:32,744] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: 
ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 23:11:33,135] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 23:11:33,136] [INFO] [utils.py:828:see_memory_usage] MA 0.42 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:33,136] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.65 GB, percent = 6.1% +0: setting training iterations to 173500 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-16 23:11:33,138] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 23:11:46,671] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 23:11:46,671] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 23:11:46,671] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 23:11:46,677] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 23:11:46,678] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 23:11:46,799] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 23:11:46,800] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:46,800] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.34 
GB, percent = 6.2% +3: ninja: no work to do. +3: Time to load utils op: 0.4148690700531006 seconds +0: Time to load utils op: 0.2997713088989258 seconds +0: Time to load utils op: 0.5227842330932617 seconds +2: Time to load utils op: 0.5251212120056152 seconds +0: [2023-03-16 23:11:47,207] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 23:11:47,207] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.41 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:47,208] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.34 GB, percent = 6.2% +0: Time to load utils op: 0.5031149387359619 seconds +0: Time to load utils op: 0.5036935806274414 seconds +0: Time to load utils op: 0.5034465789794922 seconds +0: Time to load utils op: 0.5036520957946777 seconds +0: Time to load utils op: 0.5043058395385742 seconds +0: Time to load utils op: 0.5043418407440186 seconds +2: Time to load utils op: 0.5029199123382568 seconds +2: Time to load utils op: 0.5032899379730225 seconds +2: Time to load utils op: 0.5031754970550537 seconds +2: Time to load utils op: 0.5025875568389893 seconds +2: Time to load utils op: 0.503870964050293 secondsTime to load utils op: 0.5030961036682129 seconds +2: +2: Time to load utils op: 0.5035459995269775 seconds +3: Time to load utils op: 0.5035269260406494 seconds +3: Time to load utils op: 0.5036463737487793 seconds +3: Time to load utils op: 0.5038325786590576 seconds +3: Time to load utils op: 0.5041782855987549 seconds +1: Time to load utils op: 0.5123147964477539 seconds +1: Time to load utils op: 0.5123429298400879 seconds +1: Time to load utils op: 0.5123698711395264 secondsTime to load utils op: 0.5123753547668457 seconds +1: +1: Time to load utils op: 0.5123786926269531 secondsTime to load utils op: 0.512378454208374 seconds +1: +1: Time to load utils op: 0.5123891830444336 seconds +1: Time to load utils op: 0.5124120712280273 seconds +3: Time to load utils op: 0.0003960132598876953 seconds 
+3: Time to load utils op: 0.0005102157592773438 seconds +3: Time to load utils op: 0.00035643577575683594 seconds +3: Time to load utils op: 0.0003528594970703125 seconds +3: Time to load utils op: 0.0003707408905029297 seconds +3: Time to load utils op: 0.3021388053894043 seconds +3: Time to load utils op: 0.30240678787231445 seconds +3: Time to load utils op: 0.3021976947784424 seconds +4: Time to load utils op: 0.3125293254852295 secondsTime to load utils op: 0.312530517578125 seconds +4: +4: Time to load utils op: 0.3125419616699219 seconds +4: Time to load utils op: 0.3125765323638916 secondsTime to load utils op: 0.3125801086425781 seconds +4: +4: Time to load utils op: 0.31259942054748535 seconds +7: Time to load utils op: 0.3097221851348877 seconds +3: Time to load utils op: 0.0003905296325683594 seconds +4: Time to load utils op: 0.3125936985015869 seconds +5: Time to load utils op: 0.3126680850982666 secondsTime to load utils op: 0.3126680850982666 seconds +5: +4: Time to load utils op: 0.3125903606414795 seconds +7: Time to load utils op: 0.30974888801574707 seconds +7: Time to load utils op: 0.3030707836151123 seconds +5: Time to load utils op: 0.31272101402282715 seconds +5: Time to load utils op: 0.312732458114624 seconds +5: Time to load utils op: 0.3127422332763672 seconds +5: Time to load utils op: 0.3127408027648926 seconds +5: Time to load utils op: 0.3127412796020508 seconds +5: Time to load utils op: 0.31274914741516113 seconds +7: Time to load utils op: 0.30278515815734863 seconds +7: Time to load utils op: 0.30324602127075195 seconds +3: Time to load utils op: 0.0003514289855957031 seconds +7: Time to load utils op: 0.3032197952270508 seconds +3: Time to load utils op: 0.00044083595275878906 seconds +7: Time to load utils op: 0.30388593673706055 seconds +6: Time to load utils op: 0.31107187271118164 seconds +6: Time to load utils op: 0.3111088275909424 seconds +6: Time to load utils op: 0.3111002445220947 seconds +6: Time to load utils op: 
0.3111262321472168 seconds +6: Time to load utils op: 0.3111429214477539 secondsTime to load utils op: 0.31114864349365234 seconds +6: Time to load utils op: 0.31114792823791504 seconds +6: +6: Time to load utils op: 0.3111553192138672 seconds +7: Time to load utils op: 0.30416083335876465 seconds +0: Time to load utils op: 0.00046515464782714844 seconds +0: Time to load utils op: 0.0004794597625732422 secondsTime to load utils op: 0.0004775524139404297 secondsTime to load utils op: 0.000457763671875 secondsTime to load utils op: 0.00048351287841796875 seconds +0: +0: +0: +0: Time to load utils op: 0.0004119873046875 seconds +0: Time to load utils op: 0.0004296302795410156 seconds +6: Time to load utils op: 0.000957489013671875 seconds +6: Time to load utils op: 0.001256704330444336 seconds +5: Time to load utils op: 0.000997304916381836 seconds +6: Time to load utils op: 0.001306295394897461 seconds +5: Time to load utils op: 0.0009357929229736328 seconds +6: Time to load utils op: 0.0013048648834228516 seconds +5: Time to load utils op: 0.0009565353393554688 seconds +6: Time to load utils op: 0.0012545585632324219 seconds +6: Time to load utils op: 0.0012695789337158203 secondsTime to load utils op: 0.0013568401336669922 seconds +6: +6: Time to load utils op: 0.0013060569763183594 seconds +5: Time to load utils op: 0.0011086463928222656 seconds +5: Time to load utils op: 0.0011441707611083984 seconds +5: Time to load utils op: 0.0011484622955322266 seconds +5: Time to load utils op: 0.0010869503021240234 seconds +5: Time to load utils op: 0.0011608600616455078 seconds +2: Time to load utils op: 0.0005002021789550781 seconds +2: Time to load utils op: 0.0003795623779296875 seconds +2: Time to load utils op: 0.0005147457122802734 seconds +7: Time to load utils op: 0.0003788471221923828 seconds +2: Time to load utils op: 0.0005359649658203125 seconds +7: Time to load utils op: 0.0006594657897949219 seconds +2: Time to load utils op: 0.0005590915679931641 seconds +2: 
Time to load utils op: 0.0005788803100585938 seconds +2: Time to load utils op: 0.0006189346313476562 seconds +2: Time to load utils op: 0.0005981922149658203 seconds +7: Time to load utils op: 0.00037741661071777344 seconds +7: Time to load utils op: 0.00040411949157714844 seconds +1: Time to load utils op: 0.0009322166442871094 seconds +7: Time to load utils op: 0.0004622936248779297 seconds +7: Time to load utils op: 0.0004494190216064453 seconds +7: Time to load utils op: 0.0004458427429199219 seconds +1: Time to load utils op: 0.0010573863983154297 seconds +1: Time to load utils op: 0.001100301742553711 seconds +1: Time to load utils op: 0.0013408660888671875 seconds +1: Time to load utils op: 0.0013172626495361328 seconds +1: Time to load utils op: 0.0013172626495361328 seconds +1: Time to load utils op: 0.001287698745727539 seconds +1: Time to load utils op: 0.0013737678527832031 seconds +7: Time to load utils op: 0.00041866302490234375 seconds +4: Time to load utils op: 0.0008921623229980469 seconds +4: Time to load utils op: 0.0009465217590332031 seconds +4: Time to load utils op: 0.001008749008178711 seconds +4: Time to load utils op: 0.0012004375457763672 seconds +4: Time to load utils op: 0.001260995864868164 seconds +4: Time to load utils op: 0.001287698745727539 seconds +4: Time to load utils op: 0.0013151168823242188 seconds +4: Time to load utils op: 0.0012726783752441406 seconds +0: [2023-03-16 23:11:47,383] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 23:11:47,383] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-16 23:11:47,384] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:47,494] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 23:11:47,495] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-16 23:11:47,495] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:47,602] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 23:11:47,603] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,603] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:47,707] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 23:11:47,708] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,708] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:47,814] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 23:11:47,815] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,815] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:47,918] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 23:11:47,918] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,919] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:48,027] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 23:11:48,028] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:48,028] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:48,132] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 23:11:48,132] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 
23:11:48,132] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% +0: [2023-03-16 23:11:48,133] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 23:11:48,133] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 23:11:48,133] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 23:11:48,133] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 23:11:48,133] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 23:11:48,134] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 23:11:48,135] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 23:11:48,135] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0004317760467529297 seconds +0: [2023-03-16 23:11:48,136] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 23:11:48,147] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=220527104 (220.527M) TOTAL_PARAMS=220527104 (220.527M) UNIQUE_PARAMS=220527104 (220.527M) +4: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+6: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: WARNING: could not find the metadata file checkpoints_221m91b400m +0: will not load any checkpoints and will start from random +6: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+5: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+2: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,154] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+7: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+6: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+5: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,155] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 23:11:48,156] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m91b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: time (ms) | load-checkpoint: 6.17 +0: estimated model parameters: 0.220527104 +0: estimated model parameters without embeddings: 0.173619712 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 23:11:48 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 44416143 +0: validation: 4608 +0: test: 256 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.006575 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_44416143ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_44416143ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_44416143ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.062 seconds +0: total number of samples: 44482924 +0: total number of epochs: 228 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.039400 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_4608ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_4608ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_4608ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.075 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 23:12:02 +0: done with setup ... +0: training ... 
+0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: +7: time (ms) | model-and-optimizer-setup: 18204.67 | train/valid/test-data-iterators-setup: 13200.85 +0: [000-000] 0.2205B / 0.1736B +0: [before the start of training step] datetime: 2023-03-16 23:12:02 +0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 3312.30078125 | max allocated: 30164.70654296875 | reserved: 30952.0 | max reserved: 30952.0 +7: iteration 10/ 173500 | consumed samples: 2560 | consumed tokens: 5242880 | elapsed time per iteration (s): 1.50 | learning rate: 1.153E-06 | global batch size: 256 | lm loss: 1.091616E+01 | grad norm: 17.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 171.039 | TFLOPs: 8.97 | +7: iteration 20/ 173500 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 0.44 | learning rate: 2.305E-06 | global batch size: 256 | lm loss: 1.031751E+01 | grad norm: 6.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.689 | TFLOPs: 30.52 | +7: iteration 30/ 173500 | consumed samples: 7680 | consumed tokens: 15728640 | elapsed time per iteration (s): 0.43 | learning rate: 3.458E-06 | global batch size: 256 | lm loss: 9.739371E+00 | grad norm: 2.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.933 | TFLOPs: 31.11 | +7: iteration 40/ 173500 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 0.43 | learning rate: 4.611E-06 | global batch size: 256 | lm loss: 9.494318E+00 | grad norm: 2.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.637 | TFLOPs: 31.04 | +7: iteration 50/ 173500 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (s): 0.44 | learning rate: 5.764E-06 | global batch size: 256 | lm loss: 9.345955E+00 | 
grad norm: 1.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.207 | TFLOPs: 30.70 | +7: iteration 60/ 173500 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 0.44 | learning rate: 6.916E-06 | global batch size: 256 | lm loss: 9.189825E+00 | grad norm: 1.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.376 | TFLOPs: 30.56 | +7: iteration 70/ 173500 | consumed samples: 17920 | consumed tokens: 36700160 | elapsed time per iteration (s): 0.44 | learning rate: 8.069E-06 | global batch size: 256 | lm loss: 9.032596E+00 | grad norm: 1.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.156 | TFLOPs: 30.28 | +7: iteration 80/ 173500 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 0.44 | learning rate: 9.222E-06 | global batch size: 256 | lm loss: 8.858505E+00 | grad norm: 1.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.330 | TFLOPs: 30.24 | +7: iteration 90/ 173500 | consumed samples: 23040 | consumed tokens: 47185920 | elapsed time per iteration (s): 0.45 | learning rate: 1.037E-05 | global batch size: 256 | lm loss: 8.706727E+00 | grad norm: 1.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.571 | TFLOPs: 30.04 | +7: iteration 100/ 173500 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.44 | learning rate: 1.153E-05 | global batch size: 256 | lm loss: 8.542883E+00 | grad norm: 1.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.718 | TFLOPs: 30.73 | +7: iteration 110/ 173500 | consumed samples: 28160 | consumed tokens: 57671680 | elapsed time per iteration (s): 0.45 | learning rate: 
1.268E-05 | global batch size: 256 | lm loss: 8.390529E+00 | grad norm: 1.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.311 | TFLOPs: 29.98 | +7: iteration 120/ 173500 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 0.44 | learning rate: 1.383E-05 | global batch size: 256 | lm loss: 8.249837E+00 | grad norm: 1.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.756 | TFLOPs: 30.37 | +7: iteration 130/ 173500 | consumed samples: 33280 | consumed tokens: 68157440 | elapsed time per iteration (s): 0.44 | learning rate: 1.499E-05 | global batch size: 256 | lm loss: 8.095625E+00 | grad norm: 1.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.439 | TFLOPs: 30.61 | +7: iteration 140/ 173500 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 0.45 | learning rate: 1.614E-05 | global batch size: 256 | lm loss: 7.969907E+00 | grad norm: 1.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.561 | TFLOPs: 30.04 | +7: iteration 150/ 173500 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (s): 0.43 | learning rate: 1.729E-05 | global batch size: 256 | lm loss: 7.812309E+00 | grad norm: 1.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.140 | TFLOPs: 31.28 | +7: iteration 160/ 173500 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 0.44 | learning rate: 1.844E-05 | global batch size: 256 | lm loss: 7.683726E+00 | grad norm: 1.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.926 | TFLOPs: 30.22 | +7: iteration 170/ 173500 | consumed samples: 43520 | consumed tokens: 
89128960 | elapsed time per iteration (s): 0.45 | learning rate: 1.960E-05 | global batch size: 256 | lm loss: 7.552599E+00 | grad norm: 1.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.631 | TFLOPs: 29.89 | +7: iteration 180/ 173500 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 0.45 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 7.451283E+00 | grad norm: 0.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.149 | TFLOPs: 30.02 | +7: iteration 190/ 173500 | consumed samples: 48640 | consumed tokens: 99614720 | elapsed time per iteration (s): 0.43 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 7.355374E+00 | grad norm: 1.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.735 | TFLOPs: 30.99 | +7: iteration 200/ 173500 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 0.43 | learning rate: 2.305E-05 | global batch size: 256 | lm loss: 7.255829E+00 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.865 | TFLOPs: 31.11 | +7: iteration 210/ 173500 | consumed samples: 53760 | consumed tokens: 110100480 | elapsed time per iteration (s): 0.43 | learning rate: 2.421E-05 | global batch size: 256 | lm loss: 7.180691E+00 | grad norm: 1.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.949 | TFLOPs: 31.16 | +7: iteration 220/ 173500 | consumed samples: 56320 | consumed tokens: 115343360 | elapsed time per iteration (s): 0.46 | learning rate: 2.536E-05 | global batch size: 256 | lm loss: 7.086536E+00 | grad norm: 1.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.700 | TFLOPs: 29.42 | +7: 
iteration 230/ 173500 | consumed samples: 58880 | consumed tokens: 120586240 | elapsed time per iteration (s): 0.45 | learning rate: 2.651E-05 | global batch size: 256 | lm loss: 7.042834E+00 | grad norm: 1.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.721 | TFLOPs: 29.73 | +7: iteration 240/ 173500 | consumed samples: 61440 | consumed tokens: 125829120 | elapsed time per iteration (s): 0.45 | learning rate: 2.767E-05 | global batch size: 256 | lm loss: 6.990680E+00 | grad norm: 1.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.370 | TFLOPs: 30.08 | +7: iteration 250/ 173500 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (s): 0.44 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 6.944079E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.550 | TFLOPs: 30.41 | +7: iteration 260/ 173500 | consumed samples: 66560 | consumed tokens: 136314880 | elapsed time per iteration (s): 0.43 | learning rate: 2.997E-05 | global batch size: 256 | lm loss: 6.876000E+00 | grad norm: 1.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.993 | TFLOPs: 31.43 | +7: iteration 270/ 173500 | consumed samples: 69120 | consumed tokens: 141557760 | elapsed time per iteration (s): 0.44 | learning rate: 3.112E-05 | global batch size: 256 | lm loss: 6.842333E+00 | grad norm: 1.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.555 | TFLOPs: 30.30 | +7: iteration 280/ 173500 | consumed samples: 71680 | consumed tokens: 146800640 | elapsed time per iteration (s): 0.43 | learning rate: 3.228E-05 | global batch size: 256 | lm loss: 6.804465E+00 | grad norm: 0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 588.518 | TFLOPs: 30.88 | +7: iteration 290/ 173500 | consumed samples: 74240 | consumed tokens: 152043520 | elapsed time per iteration (s): 0.44 | learning rate: 3.343E-05 | global batch size: 256 | lm loss: 6.751941E+00 | grad norm: 0.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.565 | TFLOPs: 30.57 | +7: iteration 300/ 173500 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.44 | learning rate: 3.458E-05 | global batch size: 256 | lm loss: 6.720788E+00 | grad norm: 1.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.325 | TFLOPs: 30.71 | +7: iteration 310/ 173500 | consumed samples: 79360 | consumed tokens: 162529280 | elapsed time per iteration (s): 0.44 | learning rate: 3.573E-05 | global batch size: 256 | lm loss: 6.693474E+00 | grad norm: 0.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.509 | TFLOPs: 30.51 | +7: iteration 320/ 173500 | consumed samples: 81920 | consumed tokens: 167772160 | elapsed time per iteration (s): 0.43 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 6.634674E+00 | grad norm: 0.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.075 | TFLOPs: 31.01 | +7: iteration 330/ 173500 | consumed samples: 84480 | consumed tokens: 173015040 | elapsed time per iteration (s): 0.44 | learning rate: 3.804E-05 | global batch size: 256 | lm loss: 6.633344E+00 | grad norm: 0.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.295 | TFLOPs: 30.87 | +7: iteration 340/ 173500 | consumed samples: 87040 | consumed tokens: 178257920 | elapsed time per iteration (s): 0.44 | learning rate: 3.919E-05 | global batch size: 256 | lm loss: 6.582374E+00 | grad norm: 1.252 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.271 | TFLOPs: 30.34 | +7: iteration 350/ 173500 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (s): 0.44 | learning rate: 4.035E-05 | global batch size: 256 | lm loss: 6.572448E+00 | grad norm: 0.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.564 | TFLOPs: 30.46 | +7: iteration 360/ 173500 | consumed samples: 92160 | consumed tokens: 188743680 | elapsed time per iteration (s): 0.45 | learning rate: 4.150E-05 | global batch size: 256 | lm loss: 6.545484E+00 | grad norm: 0.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.054 | TFLOPs: 30.07 | +7: iteration 370/ 173500 | consumed samples: 94720 | consumed tokens: 193986560 | elapsed time per iteration (s): 0.44 | learning rate: 4.265E-05 | global batch size: 256 | lm loss: 6.519343E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.756 | TFLOPs: 30.58 | +7: iteration 380/ 173500 | consumed samples: 97280 | consumed tokens: 199229440 | elapsed time per iteration (s): 0.44 | learning rate: 4.380E-05 | global batch size: 256 | lm loss: 6.495982E+00 | grad norm: 1.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.751 | TFLOPs: 30.79 | +7: iteration 390/ 173500 | consumed samples: 99840 | consumed tokens: 204472320 | elapsed time per iteration (s): 0.43 | learning rate: 4.496E-05 | global batch size: 256 | lm loss: 6.468490E+00 | grad norm: 0.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.257 | TFLOPs: 31.07 | +7: iteration 400/ 173500 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.45 | learning rate: 4.611E-05 | 
global batch size: 256 | lm loss: 6.444469E+00 | grad norm: 1.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.631 | TFLOPs: 30.15 | +7: iteration 410/ 173500 | consumed samples: 104960 | consumed tokens: 214958080 | elapsed time per iteration (s): 0.44 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 6.426821E+00 | grad norm: 1.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.966 | TFLOPs: 30.59 | +7: iteration 420/ 173500 | consumed samples: 107520 | consumed tokens: 220200960 | elapsed time per iteration (s): 0.44 | learning rate: 4.841E-05 | global batch size: 256 | lm loss: 6.406419E+00 | grad norm: 1.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.685 | TFLOPs: 30.73 | +7: iteration 430/ 173500 | consumed samples: 110080 | consumed tokens: 225443840 | elapsed time per iteration (s): 0.44 | learning rate: 4.957E-05 | global batch size: 256 | lm loss: 6.398855E+00 | grad norm: 1.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.732 | TFLOPs: 30.73 | +7: iteration 440/ 173500 | consumed samples: 112640 | consumed tokens: 230686720 | elapsed time per iteration (s): 0.45 | learning rate: 5.072E-05 | global batch size: 256 | lm loss: 6.376783E+00 | grad norm: 0.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.789 | TFLOPs: 29.53 | +7: iteration 450/ 173500 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (s): 0.44 | learning rate: 5.187E-05 | global batch size: 256 | lm loss: 6.345536E+00 | grad norm: 1.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.753 | TFLOPs: 30.31 | +7: iteration 460/ 173500 | consumed samples: 117760 | consumed tokens: 
241172480 | elapsed time per iteration (s): 0.44 | learning rate: 5.303E-05 | global batch size: 256 | lm loss: 6.341280E+00 | grad norm: 1.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.594 | TFLOPs: 30.67 | +7: iteration 470/ 173500 | consumed samples: 120320 | consumed tokens: 246415360 | elapsed time per iteration (s): 0.44 | learning rate: 5.418E-05 | global batch size: 256 | lm loss: 6.328714E+00 | grad norm: 1.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.067 | TFLOPs: 30.75 | +7: iteration 480/ 173500 | consumed samples: 122880 | consumed tokens: 251658240 | elapsed time per iteration (s): 0.44 | learning rate: 5.533E-05 | global batch size: 256 | lm loss: 6.302365E+00 | grad norm: 1.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.448 | TFLOPs: 30.61 | +7: iteration 490/ 173500 | consumed samples: 125440 | consumed tokens: 256901120 | elapsed time per iteration (s): 0.44 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 6.289063E+00 | grad norm: 1.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.742 | TFLOPs: 30.52 | +7: iteration 500/ 173500 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.42 | learning rate: 5.764E-05 | global batch size: 256 | lm loss: 6.276648E+00 | grad norm: 0.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.445 | TFLOPs: 31.66 | +7: iteration 510/ 173500 | consumed samples: 130560 | consumed tokens: 267386880 | elapsed time per iteration (s): 0.44 | learning rate: 5.879E-05 | global batch size: 256 | lm loss: 6.267680E+00 | grad norm: 1.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.601 | TFLOPs: 30.78 | 
+7: iteration 520/ 173500 | consumed samples: 133120 | consumed tokens: 272629760 | elapsed time per iteration (s): 0.44 | learning rate: 5.994E-05 | global batch size: 256 | lm loss: 6.255700E+00 | grad norm: 1.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.502 | TFLOPs: 30.51 | +7: iteration 530/ 173500 | consumed samples: 135680 | consumed tokens: 277872640 | elapsed time per iteration (s): 0.44 | learning rate: 6.109E-05 | global batch size: 256 | lm loss: 6.231670E+00 | grad norm: 1.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.723 | TFLOPs: 30.78 | +7: iteration 540/ 173500 | consumed samples: 138240 | consumed tokens: 283115520 | elapsed time per iteration (s): 0.44 | learning rate: 6.225E-05 | global batch size: 256 | lm loss: 6.195055E+00 | grad norm: 1.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.422 | TFLOPs: 30.66 | +7: iteration 550/ 173500 | consumed samples: 140800 | consumed tokens: 288358400 | elapsed time per iteration (s): 0.45 | learning rate: 6.340E-05 | global batch size: 256 | lm loss: 6.189794E+00 | grad norm: 1.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.593 | TFLOPs: 29.99 | +7: iteration 560/ 173500 | consumed samples: 143360 | consumed tokens: 293601280 | elapsed time per iteration (s): 0.44 | learning rate: 6.455E-05 | global batch size: 256 | lm loss: 6.180946E+00 | grad norm: 1.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.813 | TFLOPs: 30.37 | +7: iteration 570/ 173500 | consumed samples: 145920 | consumed tokens: 298844160 | elapsed time per iteration (s): 0.44 | learning rate: 6.571E-05 | global batch size: 256 | lm loss: 6.170764E+00 | grad norm: 1.697 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 586.507 | TFLOPs: 30.77 | +7: iteration 580/ 173500 | consumed samples: 148480 | consumed tokens: 304087040 | elapsed time per iteration (s): 0.45 | learning rate: 6.686E-05 | global batch size: 256 | lm loss: 6.155415E+00 | grad norm: 1.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.359 | TFLOPs: 30.03 | +7: iteration 590/ 173500 | consumed samples: 151040 | consumed tokens: 309329920 | elapsed time per iteration (s): 0.43 | learning rate: 6.801E-05 | global batch size: 256 | lm loss: 6.132038E+00 | grad norm: 1.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.879 | TFLOPs: 31.00 | +7: iteration 600/ 173500 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.44 | learning rate: 6.916E-05 | global batch size: 256 | lm loss: 6.110637E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.493 | TFLOPs: 30.77 | +7: iteration 610/ 173500 | consumed samples: 156160 | consumed tokens: 319815680 | elapsed time per iteration (s): 0.43 | learning rate: 7.032E-05 | global batch size: 256 | lm loss: 6.102563E+00 | grad norm: 1.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.569 | TFLOPs: 31.04 | +7: iteration 620/ 173500 | consumed samples: 158720 | consumed tokens: 325058560 | elapsed time per iteration (s): 0.43 | learning rate: 7.147E-05 | global batch size: 256 | lm loss: 6.072300E+00 | grad norm: 1.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.556 | TFLOPs: 30.99 | +7: iteration 630/ 173500 | consumed samples: 161280 | consumed tokens: 330301440 | elapsed time per iteration (s): 0.43 | learning rate: 7.262E-05 | global batch size: 256 | lm loss: 6.066143E+00 | 
grad norm: 0.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.327 | TFLOPs: 31.55 | +7: iteration 640/ 173500 | consumed samples: 163840 | consumed tokens: 335544320 | elapsed time per iteration (s): 0.44 | learning rate: 7.378E-05 | global batch size: 256 | lm loss: 6.054982E+00 | grad norm: 1.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.429 | TFLOPs: 30.82 | +7: iteration 650/ 173500 | consumed samples: 166400 | consumed tokens: 340787200 | elapsed time per iteration (s): 0.43 | learning rate: 7.493E-05 | global batch size: 256 | lm loss: 6.024094E+00 | grad norm: 1.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.335 | TFLOPs: 31.08 | +7: iteration 660/ 173500 | consumed samples: 168960 | consumed tokens: 346030080 | elapsed time per iteration (s): 0.43 | learning rate: 7.608E-05 | global batch size: 256 | lm loss: 6.016464E+00 | grad norm: 1.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.477 | TFLOPs: 31.30 | +7: iteration 670/ 173500 | consumed samples: 171520 | consumed tokens: 351272960 | elapsed time per iteration (s): 0.43 | learning rate: 7.723E-05 | global batch size: 256 | lm loss: 5.994701E+00 | grad norm: 1.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.534 | TFLOPs: 30.98 | +7: iteration 680/ 173500 | consumed samples: 174080 | consumed tokens: 356515840 | elapsed time per iteration (s): 0.43 | learning rate: 7.839E-05 | global batch size: 256 | lm loss: 5.977779E+00 | grad norm: 1.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.178 | TFLOPs: 31.12 | +7: iteration 690/ 173500 | consumed samples: 176640 | consumed tokens: 361758720 | elapsed time per iteration (s): 0.44 | 
learning rate: 7.954E-05 | global batch size: 256 | lm loss: 5.960389E+00 | grad norm: 1.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.646 | TFLOPs: 30.41 | +7: iteration 700/ 173500 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.43 | learning rate: 8.069E-05 | global batch size: 256 | lm loss: 5.950315E+00 | grad norm: 1.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.472 | TFLOPs: 31.56 | +7: iteration 710/ 173500 | consumed samples: 181760 | consumed tokens: 372244480 | elapsed time per iteration (s): 0.43 | learning rate: 8.184E-05 | global batch size: 256 | lm loss: 5.915113E+00 | grad norm: 1.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.358 | TFLOPs: 31.24 | +7: iteration 720/ 173500 | consumed samples: 184320 | consumed tokens: 377487360 | elapsed time per iteration (s): 0.43 | learning rate: 8.300E-05 | global batch size: 256 | lm loss: 5.922691E+00 | grad norm: 1.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.038 | TFLOPs: 31.01 | +7: iteration 730/ 173500 | consumed samples: 186880 | consumed tokens: 382730240 | elapsed time per iteration (s): 0.43 | learning rate: 8.415E-05 | global batch size: 256 | lm loss: 5.881493E+00 | grad norm: 1.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | +7: iteration 740/ 173500 | consumed samples: 189440 | consumed tokens: 387973120 | elapsed time per iteration (s): 0.43 | learning rate: 8.530E-05 | global batch size: 256 | lm loss: 5.873824E+00 | grad norm: 1.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.084 | TFLOPs: 31.38 | +7: iteration 750/ 173500 | consumed samples: 192000 
| consumed tokens: 393216000 | elapsed time per iteration (s): 1.15 | learning rate: 8.646E-05 | global batch size: 256 | lm loss: 5.866430E+00 | grad norm: 1.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 222.363 | TFLOPs: 11.67 | +7: iteration 760/ 173500 | consumed samples: 194560 | consumed tokens: 398458880 | elapsed time per iteration (s): 0.51 | learning rate: 8.761E-05 | global batch size: 256 | lm loss: 5.850316E+00 | grad norm: 1.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 501.482 | TFLOPs: 26.31 | +7: iteration 770/ 173500 | consumed samples: 197120 | consumed tokens: 403701760 | elapsed time per iteration (s): 0.60 | learning rate: 8.876E-05 | global batch size: 256 | lm loss: 5.832829E+00 | grad norm: 1.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 424.737 | TFLOPs: 22.29 | +7: iteration 780/ 173500 | consumed samples: 199680 | consumed tokens: 408944640 | elapsed time per iteration (s): 0.59 | learning rate: 8.991E-05 | global batch size: 256 | lm loss: 5.809539E+00 | grad norm: 1.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 435.521 | TFLOPs: 22.85 | +7: iteration 790/ 173500 | consumed samples: 202240 | consumed tokens: 414187520 | elapsed time per iteration (s): 0.43 | learning rate: 9.107E-05 | global batch size: 256 | lm loss: 5.789207E+00 | grad norm: 1.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.721 | TFLOPs: 31.41 | +7: iteration 800/ 173500 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.43 | learning rate: 9.222E-05 | global batch size: 256 | lm loss: 5.764358E+00 | grad norm: 1.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.456 
| TFLOPs: 31.24 | +7: iteration 810/ 173500 | consumed samples: 207360 | consumed tokens: 424673280 | elapsed time per iteration (s): 0.44 | learning rate: 9.337E-05 | global batch size: 256 | lm loss: 5.756334E+00 | grad norm: 1.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.711 | TFLOPs: 30.36 | +7: iteration 820/ 173500 | consumed samples: 209920 | consumed tokens: 429916160 | elapsed time per iteration (s): 0.44 | learning rate: 9.452E-05 | global batch size: 256 | lm loss: 5.743480E+00 | grad norm: 1.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.294 | TFLOPs: 30.29 | +7: iteration 830/ 173500 | consumed samples: 212480 | consumed tokens: 435159040 | elapsed time per iteration (s): 0.42 | learning rate: 9.568E-05 | global batch size: 256 | lm loss: 5.724570E+00 | grad norm: 1.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.083 | TFLOPs: 32.17 | +7: iteration 840/ 173500 | consumed samples: 215040 | consumed tokens: 440401920 | elapsed time per iteration (s): 0.43 | learning rate: 9.683E-05 | global batch size: 256 | lm loss: 5.692999E+00 | grad norm: 1.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.465 | TFLOPs: 31.40 | +7: iteration 850/ 173500 | consumed samples: 217600 | consumed tokens: 445644800 | elapsed time per iteration (s): 0.43 | learning rate: 9.798E-05 | global batch size: 256 | lm loss: 5.695688E+00 | grad norm: 1.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.929 | TFLOPs: 31.21 | +7: iteration 860/ 173500 | consumed samples: 220160 | consumed tokens: 450887680 | elapsed time per iteration (s): 0.42 | learning rate: 9.914E-05 | global batch size: 256 | lm loss: 5.666520E+00 | grad norm: 1.317 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.066 | TFLOPs: 31.69 | +7: iteration 870/ 173500 | consumed samples: 222720 | consumed tokens: 456130560 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 5.651816E+00 | grad norm: 1.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.490 | TFLOPs: 31.09 | +7: iteration 880/ 173500 | consumed samples: 225280 | consumed tokens: 461373440 | elapsed time per iteration (s): 0.43 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 5.622213E+00 | grad norm: 1.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.567 | TFLOPs: 31.30 | +7: iteration 890/ 173500 | consumed samples: 227840 | consumed tokens: 466616320 | elapsed time per iteration (s): 0.42 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 5.631956E+00 | grad norm: 1.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.102 | TFLOPs: 31.75 | +7: iteration 900/ 173500 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.43 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 5.592014E+00 | grad norm: 1.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.305 | TFLOPs: 31.50 | +7: iteration 910/ 173500 | consumed samples: 232960 | consumed tokens: 477102080 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 5.574429E+00 | grad norm: 1.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.222 | TFLOPs: 31.76 | +7: iteration 920/ 173500 | consumed samples: 235520 | consumed tokens: 482344960 | elapsed time per iteration (s): 0.43 | learning rate: 1.061E-04 | global batch size: 256 | lm 
loss: 5.551343E+00 | grad norm: 1.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.983 | TFLOPs: 31.11 | +7: iteration 930/ 173500 | consumed samples: 238080 | consumed tokens: 487587840 | elapsed time per iteration (s): 0.43 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 5.542192E+00 | grad norm: 1.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.617 | TFLOPs: 31.57 | +7: iteration 940/ 173500 | consumed samples: 240640 | consumed tokens: 492830720 | elapsed time per iteration (s): 0.43 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 5.513817E+00 | grad norm: 1.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.166 | TFLOPs: 31.59 | +7: iteration 950/ 173500 | consumed samples: 243200 | consumed tokens: 498073600 | elapsed time per iteration (s): 0.43 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 5.493437E+00 | grad norm: 1.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.858 | TFLOPs: 31.16 | +7: iteration 960/ 173500 | consumed samples: 245760 | consumed tokens: 503316480 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 5.472451E+00 | grad norm: 1.001 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.031 | TFLOPs: 31.90 | +7: iteration 970/ 173500 | consumed samples: 248320 | consumed tokens: 508559360 | elapsed time per iteration (s): 0.44 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 5.458516E+00 | grad norm: 1.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.407 | TFLOPs: 30.72 | +7: iteration 980/ 173500 | consumed samples: 250880 | consumed tokens: 513802240 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 5.453291E+00 | grad norm: 1.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.444 | TFLOPs: 31.45 | +7: iteration 990/ 173500 | consumed samples: 253440 | consumed tokens: 519045120 | elapsed time per iteration (s): 0.42 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 5.427925E+00 | grad norm: 1.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.371 | TFLOPs: 32.13 | +7: iteration 1000/ 173500 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.44 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 5.410461E+00 | grad norm: 1.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.236 | TFLOPs: 30.86 | +7: iteration 1010/ 173500 | consumed samples: 258560 | consumed tokens: 529530880 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 5.405889E+00 | grad norm: 1.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.202 | TFLOPs: 31.44 | +7: iteration 1020/ 173500 | consumed samples: 261120 | consumed tokens: 534773760 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 5.365507E+00 | grad norm: 1.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.542 | TFLOPs: 30.98 | +7: iteration 1030/ 173500 | consumed samples: 263680 | consumed tokens: 540016640 | elapsed time per iteration (s): 0.42 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 5.362997E+00 | grad norm: 1.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.707 | TFLOPs: 31.68 | +7: iteration 1040/ 173500 
| consumed samples: 266240 | consumed tokens: 545259520 | elapsed time per iteration (s): 0.42 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 5.340602E+00 | grad norm: 1.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.665 | TFLOPs: 31.73 | +7: iteration 1050/ 173500 | consumed samples: 268800 | consumed tokens: 550502400 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 5.319607E+00 | grad norm: 1.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.545 | TFLOPs: 31.82 | +7: iteration 1060/ 173500 | consumed samples: 271360 | consumed tokens: 555745280 | elapsed time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 5.299686E+00 | grad norm: 0.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.129 | TFLOPs: 31.59 | +7: iteration 1070/ 173500 | consumed samples: 273920 | consumed tokens: 560988160 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 5.293864E+00 | grad norm: 1.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.780 | TFLOPs: 31.73 | +7: iteration 1080/ 173500 | consumed samples: 276480 | consumed tokens: 566231040 | elapsed time per iteration (s): 0.43 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 5.277452E+00 | grad norm: 1.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.167 | TFLOPs: 31.49 | +7: iteration 1090/ 173500 | consumed samples: 279040 | consumed tokens: 571473920 | elapsed time per iteration (s): 0.43 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 5.249208E+00 | grad norm: 1.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 597.201 | TFLOPs: 31.33 | +7: iteration 1100/ 173500 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.42 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 5.237472E+00 | grad norm: 1.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.947 | TFLOPs: 31.90 | +7: iteration 1110/ 173500 | consumed samples: 284160 | consumed tokens: 581959680 | elapsed time per iteration (s): 0.42 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 5.231892E+00 | grad norm: 1.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.945 | TFLOPs: 31.85 | +7: iteration 1120/ 173500 | consumed samples: 286720 | consumed tokens: 587202560 | elapsed time per iteration (s): 0.42 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 5.209666E+00 | grad norm: 1.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.355 | TFLOPs: 31.97 | +7: iteration 1130/ 173500 | consumed samples: 289280 | consumed tokens: 592445440 | elapsed time per iteration (s): 0.42 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 5.196267E+00 | grad norm: 1.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.301 | TFLOPs: 31.65 | +7: iteration 1140/ 173500 | consumed samples: 291840 | consumed tokens: 597688320 | elapsed time per iteration (s): 0.42 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 5.163800E+00 | grad norm: 1.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | +7: iteration 1150/ 173500 | consumed samples: 294400 | consumed tokens: 602931200 | elapsed time per iteration (s): 0.42 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 5.152744E+00 | grad norm: 1.109 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.641 | TFLOPs: 31.88 | +7: iteration 1160/ 173500 | consumed samples: 296960 | consumed tokens: 608174080 | elapsed time per iteration (s): 0.42 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 5.117976E+00 | grad norm: 1.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.389 | TFLOPs: 31.76 | +7: iteration 1170/ 173500 | consumed samples: 299520 | consumed tokens: 613416960 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 5.103355E+00 | grad norm: 1.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.989 | TFLOPs: 31.38 | +7: iteration 1180/ 173500 | consumed samples: 302080 | consumed tokens: 618659840 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 5.086180E+00 | grad norm: 1.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.078 | TFLOPs: 32.06 | +7: iteration 1190/ 173500 | consumed samples: 304640 | consumed tokens: 623902720 | elapsed time per iteration (s): 0.43 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 5.069034E+00 | grad norm: 1.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.094 | TFLOPs: 31.38 | +7: iteration 1200/ 173500 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 5.067905E+00 | grad norm: 1.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.797 | TFLOPs: 30.89 | +7: iteration 1210/ 173500 | consumed samples: 309760 | consumed tokens: 634388480 | elapsed time per iteration (s): 0.42 | learning rate: 
1.395E-04 | global batch size: 256 | lm loss: 5.034091E+00 | grad norm: 1.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.593 | TFLOPs: 31.72 | +7: iteration 1220/ 173500 | consumed samples: 312320 | consumed tokens: 639631360 | elapsed time per iteration (s): 0.43 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 5.020522E+00 | grad norm: 1.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.326 | TFLOPs: 31.08 | +7: iteration 1230/ 173500 | consumed samples: 314880 | consumed tokens: 644874240 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.989138E+00 | grad norm: 1.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.905 | TFLOPs: 31.63 | +7: iteration 1240/ 173500 | consumed samples: 317440 | consumed tokens: 650117120 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.975047E+00 | grad norm: 1.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.886 | TFLOPs: 31.63 | +7: iteration 1250/ 173500 | consumed samples: 320000 | consumed tokens: 655360000 | elapsed time per iteration (s): 0.43 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.954251E+00 | grad norm: 1.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.431 | TFLOPs: 30.98 | +7: iteration 1260/ 173500 | consumed samples: 322560 | consumed tokens: 660602880 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.923255E+00 | grad norm: 1.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.341 | TFLOPs: 31.29 | +7: iteration 1270/ 173500 | consumed samples: 325120 | 
consumed tokens: 665845760 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.912727E+00 | grad norm: 1.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.889 | TFLOPs: 32.05 | +7: iteration 1280/ 173500 | consumed samples: 327680 | consumed tokens: 671088640 | elapsed time per iteration (s): 0.42 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.890718E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.395 | TFLOPs: 31.71 | +7: iteration 1290/ 173500 | consumed samples: 330240 | consumed tokens: 676331520 | elapsed time per iteration (s): 0.42 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.870580E+00 | grad norm: 1.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.227 | TFLOPs: 31.76 | +7: iteration 1300/ 173500 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.43 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.852489E+00 | grad norm: 1.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.807 | TFLOPs: 31.37 | +7: iteration 1310/ 173500 | consumed samples: 335360 | consumed tokens: 686817280 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.847307E+00 | grad norm: 1.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.270 | TFLOPs: 32.02 | +7: iteration 1320/ 173500 | consumed samples: 337920 | consumed tokens: 692060160 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.817994E+00 | grad norm: 1.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
598.003 | TFLOPs: 31.38 | +7: iteration 1330/ 173500 | consumed samples: 340480 | consumed tokens: 697303040 | elapsed time per iteration (s): 0.42 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.797890E+00 | grad norm: 1.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.944 | TFLOPs: 31.64 | +7: iteration 1340/ 173500 | consumed samples: 343040 | consumed tokens: 702545920 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.794033E+00 | grad norm: 1.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.614 | TFLOPs: 31.93 | +7: iteration 1350/ 173500 | consumed samples: 345600 | consumed tokens: 707788800 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.760350E+00 | grad norm: 1.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.407 | TFLOPs: 31.08 | +7: iteration 1360/ 173500 | consumed samples: 348160 | consumed tokens: 713031680 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.764494E+00 | grad norm: 1.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.524 | TFLOPs: 31.35 | +7: iteration 1370/ 173500 | consumed samples: 350720 | consumed tokens: 718274560 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.739080E+00 | grad norm: 0.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.040 | TFLOPs: 31.48 | +7: iteration 1380/ 173500 | consumed samples: 353280 | consumed tokens: 723517440 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.722535E+00 | grad norm: 0.981 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.135 | TFLOPs: 31.12 | +7: iteration 1390/ 173500 | consumed samples: 355840 | consumed tokens: 728760320 | elapsed time per iteration (s): 0.42 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.715213E+00 | grad norm: 1.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.312 | TFLOPs: 31.97 | +7: iteration 1400/ 173500 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.687886E+00 | grad norm: 1.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.173 | TFLOPs: 31.70 | +7: iteration 1410/ 173500 | consumed samples: 360960 | consumed tokens: 739246080 | elapsed time per iteration (s): 0.43 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.684283E+00 | grad norm: 1.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.686 | TFLOPs: 31.41 | +7: iteration 1420/ 173500 | consumed samples: 363520 | consumed tokens: 744488960 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.673464E+00 | grad norm: 1.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.558 | TFLOPs: 31.93 | +7: iteration 1430/ 173500 | consumed samples: 366080 | consumed tokens: 749731840 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.659826E+00 | grad norm: 1.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.558 | TFLOPs: 31.62 | +7: iteration 1440/ 173500 | consumed samples: 368640 | consumed tokens: 754974720 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch 
size: 256 | lm loss: 4.651881E+00 | grad norm: 1.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.209 | TFLOPs: 31.75 | +7: iteration 1450/ 173500 | consumed samples: 371200 | consumed tokens: 760217600 | elapsed time per iteration (s): 0.43 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.635293E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.180 | TFLOPs: 31.39 | +7: iteration 1460/ 173500 | consumed samples: 373760 | consumed tokens: 765460480 | elapsed time per iteration (s): 0.44 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.602312E+00 | grad norm: 0.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.545 | TFLOPs: 30.62 | +7: iteration 1470/ 173500 | consumed samples: 376320 | consumed tokens: 770703360 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.591777E+00 | grad norm: 0.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.897 | TFLOPs: 31.37 | +7: iteration 1480/ 173500 | consumed samples: 378880 | consumed tokens: 775946240 | elapsed time per iteration (s): 0.42 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.574226E+00 | grad norm: 1.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.333 | TFLOPs: 31.76 | +7: iteration 1490/ 173500 | consumed samples: 381440 | consumed tokens: 781189120 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.586578E+00 | grad norm: 1.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.433 | TFLOPs: 31.40 | +7: iteration 1500/ 173500 | consumed samples: 384000 | consumed tokens: 786432000 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.578577E+00 | grad norm: 0.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.067 | TFLOPs: 32.17 | +7: iteration 1510/ 173500 | consumed samples: 386560 | consumed tokens: 791674880 | elapsed time per iteration (s): 0.44 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.555406E+00 | grad norm: 0.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.013 | TFLOPs: 30.80 | +7: iteration 1520/ 173500 | consumed samples: 389120 | consumed tokens: 796917760 | elapsed time per iteration (s): 0.43 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.544878E+00 | grad norm: 0.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.109 | TFLOPs: 31.59 | +7: iteration 1530/ 173500 | consumed samples: 391680 | consumed tokens: 802160640 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.531528E+00 | grad norm: 1.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.420 | TFLOPs: 31.61 | +7: iteration 1540/ 173500 | consumed samples: 394240 | consumed tokens: 807403520 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.524263E+00 | grad norm: 1.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.657 | TFLOPs: 31.73 | +7: iteration 1550/ 173500 | consumed samples: 396800 | consumed tokens: 812646400 | elapsed time per iteration (s): 0.42 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.510517E+00 | grad norm: 0.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.055 | TFLOPs: 31.85 | +7: 
iteration 1560/ 173500 | consumed samples: 399360 | consumed tokens: 817889280 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.501853E+00 | grad norm: 0.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.411 | TFLOPs: 31.71 | +7: iteration 1570/ 173500 | consumed samples: 401920 | consumed tokens: 823132160 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.506856E+00 | grad norm: 1.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.609 | TFLOPs: 31.72 | +7: iteration 1580/ 173500 | consumed samples: 404480 | consumed tokens: 828375040 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.475902E+00 | grad norm: 0.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.230 | TFLOPs: 31.97 | +7: iteration 1590/ 173500 | consumed samples: 407040 | consumed tokens: 833617920 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.476100E+00 | grad norm: 0.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.042 | TFLOPs: 31.64 | +7: iteration 1600/ 173500 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.446387E+00 | grad norm: 0.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.496 | TFLOPs: 31.72 | +7: iteration 1610/ 173500 | consumed samples: 412160 | consumed tokens: 844103680 | elapsed time per iteration (s): 0.43 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.457870E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 601.502 | TFLOPs: 31.56 | +7: iteration 1620/ 173500 | consumed samples: 414720 | consumed tokens: 849346560 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.434789E+00 | grad norm: 0.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.126 | TFLOPs: 32.01 | +7: iteration 1630/ 173500 | consumed samples: 417280 | consumed tokens: 854589440 | elapsed time per iteration (s): 0.42 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.435696E+00 | grad norm: 0.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.075 | TFLOPs: 31.64 | +7: iteration 1640/ 173500 | consumed samples: 419840 | consumed tokens: 859832320 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.421764E+00 | grad norm: 0.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.974 | TFLOPs: 31.48 | +7: iteration 1650/ 173500 | consumed samples: 422400 | consumed tokens: 865075200 | elapsed time per iteration (s): 0.42 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.416389E+00 | grad norm: 0.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.031 | TFLOPs: 32.01 | +7: iteration 1660/ 173500 | consumed samples: 424960 | consumed tokens: 870318080 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.406142E+00 | grad norm: 0.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.060 | TFLOPs: 31.69 | +7: iteration 1670/ 173500 | consumed samples: 427520 | consumed tokens: 875560960 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 
4.402468E+00 | grad norm: 0.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.730 | TFLOPs: 32.10 | +7: iteration 1680/ 173500 | consumed samples: 430080 | consumed tokens: 880803840 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.385918E+00 | grad norm: 0.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.039 | TFLOPs: 31.43 | +7: iteration 1690/ 173500 | consumed samples: 432640 | consumed tokens: 886046720 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.382139E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.623 | TFLOPs: 32.04 | +7: iteration 1700/ 173500 | consumed samples: 435200 | consumed tokens: 891289600 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.382428E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.879 | TFLOPs: 31.42 | +7: iteration 1710/ 173500 | consumed samples: 437760 | consumed tokens: 896532480 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.360862E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.396 | TFLOPs: 31.71 | +7: iteration 1720/ 173500 | consumed samples: 440320 | consumed tokens: 901775360 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.353530E+00 | grad norm: 0.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.416 | TFLOPs: 31.71 | +7: iteration 1730/ 173500 | consumed samples: 442880 | consumed tokens: 907018240 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.357696E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.150 | TFLOPs: 31.96 | +7: iteration 1740/ 173500 | consumed samples: 445440 | consumed tokens: 912261120 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.345307E+00 | grad norm: 0.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.548 | TFLOPs: 31.72 | +7: iteration 1750/ 173500 | consumed samples: 448000 | consumed tokens: 917504000 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.331901E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.101 | TFLOPs: 31.80 | +7: iteration 1760/ 173500 | consumed samples: 450560 | consumed tokens: 922746880 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.322309E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.994 | TFLOPs: 32.22 | +7: iteration 1770/ 173500 | consumed samples: 453120 | consumed tokens: 927989760 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.313245E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.337 | TFLOPs: 31.81 | +7: iteration 1780/ 173500 | consumed samples: 455680 | consumed tokens: 933232640 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.318204E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.089 | TFLOPs: 32.22 | +7: iteration 1790/ 
173500 | consumed samples: 458240 | consumed tokens: 938475520 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.307885E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.104 | TFLOPs: 30.17 | +7: iteration 1800/ 173500 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.280858E+00 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.086 | TFLOPs: 29.96 | +7: iteration 1810/ 173500 | consumed samples: 463360 | consumed tokens: 948961280 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.285321E+00 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.983 | TFLOPs: 30.33 | +7: iteration 1820/ 173500 | consumed samples: 465920 | consumed tokens: 954204160 | elapsed time per iteration (s): 0.49 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.251101E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 521.186 | TFLOPs: 27.35 | +7: iteration 1830/ 173500 | consumed samples: 468480 | consumed tokens: 959447040 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.261727E+00 | grad norm: 0.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.012 | TFLOPs: 30.17 | +7: iteration 1840/ 173500 | consumed samples: 471040 | consumed tokens: 964689920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.269814E+00 | grad norm: 0.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.839 | TFLOPs: 31.26 | +7: iteration 1850/ 173500 | consumed samples: 473600 | consumed tokens: 969932800 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.252392E+00 | grad norm: 0.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.043 | TFLOPs: 31.75 | +7: iteration 1860/ 173500 | consumed samples: 476160 | consumed tokens: 975175680 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.236228E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.949 | TFLOPs: 31.79 | +7: iteration 1870/ 173500 | consumed samples: 478720 | consumed tokens: 980418560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.245671E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.627 | TFLOPs: 31.41 | +7: iteration 1880/ 173500 | consumed samples: 481280 | consumed tokens: 985661440 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.232524E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.403 | TFLOPs: 30.56 | +7: iteration 1890/ 173500 | consumed samples: 483840 | consumed tokens: 990904320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.228843E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.931 | TFLOPs: 31.32 | +7: iteration 1900/ 173500 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.226831E+00 | grad 
norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.530 | TFLOPs: 31.77 | +7: iteration 1910/ 173500 | consumed samples: 488960 | consumed tokens: 1001390080 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.225605E+00 | grad norm: 0.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.403 | TFLOPs: 31.71 | +7: iteration 1920/ 173500 | consumed samples: 491520 | consumed tokens: 1006632960 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.225905E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.415 | TFLOPs: 31.61 | +7: iteration 1930/ 173500 | consumed samples: 494080 | consumed tokens: 1011875840 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.220687E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.160 | TFLOPs: 32.07 | +7: iteration 1940/ 173500 | consumed samples: 496640 | consumed tokens: 1017118720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.207380E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.088 | TFLOPs: 31.70 | +7: iteration 1950/ 173500 | consumed samples: 499200 | consumed tokens: 1022361600 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.200599E+00 | grad norm: 0.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.703 | TFLOPs: 31.73 | +7: iteration 1960/ 173500 | consumed samples: 501760 | consumed tokens: 1027604480 | elapsed time per iteration (s): 0.42 
| learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.181387E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.041 | TFLOPs: 32.06 | +7: iteration 1970/ 173500 | consumed samples: 504320 | consumed tokens: 1032847360 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.179651E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.817 | TFLOPs: 31.26 | +7: iteration 1980/ 173500 | consumed samples: 506880 | consumed tokens: 1038090240 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.170005E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.122 | TFLOPs: 32.17 | +7: iteration 1990/ 173500 | consumed samples: 509440 | consumed tokens: 1043333120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.166014E+00 | grad norm: 0.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.829 | TFLOPs: 31.31 | +0: [2023-03-16 23:26:46,698] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.00019999894289482022, 0.00019999894289482022, 0.00019999894289482022], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 2000/ 173500 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.164322E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.172 | TFLOPs: 31.59 | +0: steps: 2000 loss: 4.1762 iter time (s): 0.440 samples/sec: 582.083 +7: iteration 2010/ 173500 | consumed samples: 514560 | consumed tokens: 1053818880 | elapsed time 
per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.164791E+00 | grad norm: 0.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.672 | TFLOPs: 31.83 | +7: iteration 2020/ 173500 | consumed samples: 517120 | consumed tokens: 1059061760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.155306E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.123 | TFLOPs: 31.59 | +7: iteration 2030/ 173500 | consumed samples: 519680 | consumed tokens: 1064304640 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.148676E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.019 | TFLOPs: 31.95 | +7: iteration 2040/ 173500 | consumed samples: 522240 | consumed tokens: 1069547520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.150478E+00 | grad norm: 0.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.936 | TFLOPs: 31.58 | +7: iteration 2050/ 173500 | consumed samples: 524800 | consumed tokens: 1074790400 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.150117E+00 | grad norm: 0.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.601 | TFLOPs: 31.77 | +7: iteration 2060/ 173500 | consumed samples: 527360 | consumed tokens: 1080033280 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.150768E+00 | grad norm: 0.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.573 | TFLOPs: 31.93 | +7: iteration 
2070/ 173500 | consumed samples: 529920 | consumed tokens: 1085276160 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.132994E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.758 | TFLOPs: 31.63 | +7: iteration 2080/ 173500 | consumed samples: 532480 | consumed tokens: 1090519040 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.125105E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.566 | TFLOPs: 31.72 | +7: iteration 2090/ 173500 | consumed samples: 535040 | consumed tokens: 1095761920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.130916E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.806 | TFLOPs: 31.58 | +7: iteration 2100/ 173500 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.121494E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.210 | TFLOPs: 31.70 | +7: iteration 2110/ 173500 | consumed samples: 540160 | consumed tokens: 1106247680 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.112392E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.826 | TFLOPs: 31.73 | +7: iteration 2120/ 173500 | consumed samples: 542720 | consumed tokens: 1111490560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.118289E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 599.901 | TFLOPs: 31.48 | +7: iteration 2130/ 173500 | consumed samples: 545280 | consumed tokens: 1116733440 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.115307E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.364 | TFLOPs: 31.92 | +7: iteration 2140/ 173500 | consumed samples: 547840 | consumed tokens: 1121976320 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.105877E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.003 | TFLOPs: 31.69 | +7: iteration 2150/ 173500 | consumed samples: 550400 | consumed tokens: 1127219200 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.096083E+00 | grad norm: 0.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.833 | TFLOPs: 31.68 | +7: iteration 2160/ 173500 | consumed samples: 552960 | consumed tokens: 1132462080 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.082313E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.727 | TFLOPs: 31.68 | +7: iteration 2170/ 173500 | consumed samples: 555520 | consumed tokens: 1137704960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.088729E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.415 | TFLOPs: 31.45 | +7: iteration 2180/ 173500 | consumed samples: 558080 | consumed tokens: 1142947840 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 
4.097598E+00 | grad norm: 0.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.316 | TFLOPs: 31.71 | +7: iteration 2190/ 173500 | consumed samples: 560640 | consumed tokens: 1148190720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.100043E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.180 | TFLOPs: 32.12 | +7: iteration 2200/ 173500 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.082574E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.943 | TFLOPs: 31.64 | +7: iteration 2210/ 173500 | consumed samples: 565760 | consumed tokens: 1158676480 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.075975E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.136 | TFLOPs: 32.12 | +7: iteration 2220/ 173500 | consumed samples: 568320 | consumed tokens: 1163919360 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.069316E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.246 | TFLOPs: 31.97 | +7: iteration 2230/ 173500 | consumed samples: 570880 | consumed tokens: 1169162240 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.059424E+00 | grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.735 | TFLOPs: 31.83 | +7: iteration 2240/ 173500 | consumed samples: 573440 | consumed tokens: 1174405120 | elapsed time per 
iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.050583E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.148 | TFLOPs: 31.91 | +7: iteration 2250/ 173500 | consumed samples: 576000 | consumed tokens: 1179648000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.066551E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.690 | TFLOPs: 31.20 | +7: iteration 2260/ 173500 | consumed samples: 578560 | consumed tokens: 1184890880 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.048772E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.021 | TFLOPs: 31.59 | +7: iteration 2270/ 173500 | consumed samples: 581120 | consumed tokens: 1190133760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.038480E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.376 | TFLOPs: 31.40 | +7: iteration 2280/ 173500 | consumed samples: 583680 | consumed tokens: 1195376640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.047002E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.958 | TFLOPs: 31.43 | +7: iteration 2290/ 173500 | consumed samples: 586240 | consumed tokens: 1200619520 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.046938E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.804 | TFLOPs: 31.73 | +7: iteration 2300/ 
173500 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.035880E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.952 | TFLOPs: 32.06 | +7: iteration 2310/ 173500 | consumed samples: 591360 | consumed tokens: 1211105280 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.039137E+00 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.865 | TFLOPs: 31.74 | +7: iteration 2320/ 173500 | consumed samples: 593920 | consumed tokens: 1216348160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.031695E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.794 | TFLOPs: 31.42 | +7: iteration 2330/ 173500 | consumed samples: 596480 | consumed tokens: 1221591040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.037022E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.770 | TFLOPs: 31.15 | +7: iteration 2340/ 173500 | consumed samples: 599040 | consumed tokens: 1226833920 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.015466E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.144 | TFLOPs: 31.80 | +7: iteration 2350/ 173500 | consumed samples: 601600 | consumed tokens: 1232076800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.023634E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.065 | TFLOPs: 31.17 | +7: iteration 2360/ 173500 | consumed samples: 604160 | consumed tokens: 1237319680 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.017078E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.465 | TFLOPs: 31.09 | +7: iteration 2370/ 173500 | consumed samples: 606720 | consumed tokens: 1242562560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.017395E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.604 | TFLOPs: 31.41 | +7: iteration 2380/ 173500 | consumed samples: 609280 | consumed tokens: 1247805440 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.013470E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.390 | TFLOPs: 31.82 | +7: iteration 2390/ 173500 | consumed samples: 611840 | consumed tokens: 1253048320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.007680E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.493 | TFLOPs: 31.30 | +7: iteration 2400/ 173500 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.005873E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.431 | TFLOPs: 30.98 | +7: iteration 2410/ 173500 | consumed samples: 616960 | consumed tokens: 1263534080 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.997395E+00 | 
grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.695 | TFLOPs: 30.94 | +7: iteration 2420/ 173500 | consumed samples: 619520 | consumed tokens: 1268776960 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.998082E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.305 | TFLOPs: 31.97 | +7: iteration 2430/ 173500 | consumed samples: 622080 | consumed tokens: 1274019840 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.990078E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.089 | TFLOPs: 31.75 | +7: iteration 2440/ 173500 | consumed samples: 624640 | consumed tokens: 1279262720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.006165E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.094 | TFLOPs: 31.80 | +7: iteration 2450/ 173500 | consumed samples: 627200 | consumed tokens: 1284505600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.996013E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.623 | TFLOPs: 31.46 | +7: iteration 2460/ 173500 | consumed samples: 629760 | consumed tokens: 1289748480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.975197E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.321 | TFLOPs: 31.18 | +7: iteration 2470/ 173500 | consumed samples: 632320 | consumed tokens: 1294991360 | elapsed time per iteration (s): 
0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.993265E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.551 | TFLOPs: 31.82 | +7: iteration 2480/ 173500 | consumed samples: 634880 | consumed tokens: 1300234240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.985409E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.143 | TFLOPs: 30.91 | +7: iteration 2490/ 173500 | consumed samples: 637440 | consumed tokens: 1305477120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.969620E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.779 | TFLOPs: 31.26 | +7: iteration 2500/ 173500 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.974516E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.003 | TFLOPs: 31.85 | +7: iteration 2510/ 173500 | consumed samples: 642560 | consumed tokens: 1315962880 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.981548E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.253 | TFLOPs: 31.81 | +7: iteration 2520/ 173500 | consumed samples: 645120 | consumed tokens: 1321205760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.975691E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.282 | TFLOPs: 31.39 | +7: iteration 2530/ 173500 | 
consumed samples: 647680 | consumed tokens: 1326448640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.962395E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.236 | TFLOPs: 31.23 | +7: iteration 2540/ 173500 | consumed samples: 650240 | consumed tokens: 1331691520 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.958034E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.680 | TFLOPs: 31.78 | +7: iteration 2550/ 173500 | consumed samples: 652800 | consumed tokens: 1336934400 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.940331E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.708 | TFLOPs: 31.89 | +7: iteration 2560/ 173500 | consumed samples: 655360 | consumed tokens: 1342177280 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.961070E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.229 | TFLOPs: 31.81 | +7: iteration 2570/ 173500 | consumed samples: 657920 | consumed tokens: 1347420160 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.946764E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.445 | TFLOPs: 31.66 | +7: iteration 2580/ 173500 | consumed samples: 660480 | consumed tokens: 1352663040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.946391E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.097 | TFLOPs: 31.59 | +7: iteration 2590/ 173500 | consumed samples: 663040 | consumed tokens: 1357905920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.983442E+00 | grad norm: 0.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.519 | TFLOPs: 31.25 | +7: iteration 2600/ 173500 | consumed samples: 665600 | consumed tokens: 1363148800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.969021E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.589 | TFLOPs: 31.25 | +7: iteration 2610/ 173500 | consumed samples: 668160 | consumed tokens: 1368391680 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.949662E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.258 | TFLOPs: 30.71 | +7: iteration 2620/ 173500 | consumed samples: 670720 | consumed tokens: 1373634560 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.940140E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.496 | TFLOPs: 32.03 | +7: iteration 2630/ 173500 | consumed samples: 673280 | consumed tokens: 1378877440 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.949547E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.934 | TFLOPs: 31.01 | +7: iteration 2640/ 173500 | consumed samples: 675840 | consumed tokens: 1384120320 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.942000E+00 | 
grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.838 | TFLOPs: 31.94 | +7: iteration 2650/ 173500 | consumed samples: 678400 | consumed tokens: 1389363200 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.933393E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.688 | TFLOPs: 31.78 | +7: iteration 2660/ 173500 | consumed samples: 680960 | consumed tokens: 1394606080 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.925486E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.342 | TFLOPs: 31.08 | +7: iteration 2670/ 173500 | consumed samples: 683520 | consumed tokens: 1399848960 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.916146E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.897 | TFLOPs: 32.00 | +7: iteration 2680/ 173500 | consumed samples: 686080 | consumed tokens: 1405091840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.927297E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.256 | TFLOPs: 31.60 | +7: iteration 2690/ 173500 | consumed samples: 688640 | consumed tokens: 1410334720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.925549E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.793 | TFLOPs: 31.63 | +7: iteration 2700/ 173500 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 
0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.919344E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.476 | TFLOPs: 31.45 | +7: iteration 2710/ 173500 | consumed samples: 693760 | consumed tokens: 1420820480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.904567E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.088 | TFLOPs: 31.54 | +7: iteration 2720/ 173500 | consumed samples: 696320 | consumed tokens: 1426063360 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.919092E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.750 | TFLOPs: 31.78 | +7: iteration 2730/ 173500 | consumed samples: 698880 | consumed tokens: 1431306240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.907750E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.437 | TFLOPs: 31.45 | +7: iteration 2740/ 173500 | consumed samples: 701440 | consumed tokens: 1436549120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.900991E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.615 | TFLOPs: 31.20 | +7: iteration 2750/ 173500 | consumed samples: 704000 | consumed tokens: 1441792000 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.897715E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.938 | TFLOPs: 30.69 | +7: iteration 2760/ 173500 | 
consumed samples: 706560 | consumed tokens: 1447034880 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.912563E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.624 | TFLOPs: 31.72 | +7: iteration 2770/ 173500 | consumed samples: 709120 | consumed tokens: 1452277760 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.907792E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.740 | TFLOPs: 31.68 | +7: iteration 2780/ 173500 | consumed samples: 711680 | consumed tokens: 1457520640 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.891982E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.999 | TFLOPs: 31.74 | +7: iteration 2790/ 173500 | consumed samples: 714240 | consumed tokens: 1462763520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.892207E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.484 | TFLOPs: 31.35 | +7: iteration 2800/ 173500 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.886278E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.879 | TFLOPs: 31.79 | +7: iteration 2810/ 173500 | consumed samples: 719360 | consumed tokens: 1473249280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.881386E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.426 | TFLOPs: 31.40 | +7: iteration 2820/ 173500 | consumed samples: 721920 | consumed tokens: 1478492160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.881649E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.121 | TFLOPs: 31.38 | +7: iteration 2830/ 173500 | consumed samples: 724480 | consumed tokens: 1483735040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.881894E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.581 | TFLOPs: 31.09 | +7: iteration 2840/ 173500 | consumed samples: 727040 | consumed tokens: 1488977920 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.898666E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.377 | TFLOPs: 31.76 | +7: iteration 2850/ 173500 | consumed samples: 729600 | consumed tokens: 1494220800 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.891762E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.562 | TFLOPs: 31.72 | +7: iteration 2860/ 173500 | consumed samples: 732160 | consumed tokens: 1499463680 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.879359E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.624 | TFLOPs: 31.67 | +7: iteration 2870/ 173500 | consumed samples: 734720 | consumed tokens: 1504706560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.875740E+00 | 
grad norm: 0.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.551 | TFLOPs: 31.09 | +7: iteration 2880/ 173500 | consumed samples: 737280 | consumed tokens: 1509949440 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.858747E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.223 | TFLOPs: 31.91 | +7: iteration 2890/ 173500 | consumed samples: 739840 | consumed tokens: 1515192320 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.867318E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.948 | TFLOPs: 31.79 | +7: iteration 2900/ 173500 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.865944E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.260 | TFLOPs: 31.60 | +7: iteration 2910/ 173500 | consumed samples: 744960 | consumed tokens: 1525678080 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.869138E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.510 | TFLOPs: 31.77 | +7: iteration 2920/ 173500 | consumed samples: 747520 | consumed tokens: 1530920960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.865798E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.776 | TFLOPs: 31.42 | +7: iteration 2930/ 173500 | consumed samples: 750080 | consumed tokens: 1536163840 | elapsed time per iteration (s): 
0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.860381E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.161 | TFLOPs: 31.38 | +7: iteration 2940/ 173500 | consumed samples: 752640 | consumed tokens: 1541406720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.872655E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.281 | TFLOPs: 31.97 | +7: iteration 2950/ 173500 | consumed samples: 755200 | consumed tokens: 1546649600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.861592E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.505 | TFLOPs: 30.98 | +7: iteration 2960/ 173500 | consumed samples: 757760 | consumed tokens: 1551892480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.866153E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.285 | TFLOPs: 31.34 | +7: iteration 2970/ 173500 | consumed samples: 760320 | consumed tokens: 1557135360 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.872734E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.666 | TFLOPs: 30.47 | +7: iteration 2980/ 173500 | consumed samples: 762880 | consumed tokens: 1562378240 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.863243E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.160 | TFLOPs: 31.80 | +7: iteration 2990/ 173500 | 
consumed samples: 765440 | consumed tokens: 1567621120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.858627E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.595 | TFLOPs: 31.35 | +7: iteration 3000/ 173500 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.830575E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.076 | TFLOPs: 30.80 | +7: iteration 3010/ 173500 | consumed samples: 770560 | consumed tokens: 1578106880 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.850412E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.965 | TFLOPs: 30.95 | +7: iteration 3020/ 173500 | consumed samples: 773120 | consumed tokens: 1583349760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.849297E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.164 | TFLOPs: 31.33 | +7: iteration 3030/ 173500 | consumed samples: 775680 | consumed tokens: 1588592640 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.833316E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.441 | TFLOPs: 31.66 | +7: iteration 3040/ 173500 | consumed samples: 778240 | consumed tokens: 1593835520 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.841031E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.405 | TFLOPs: 31.76 | +7: iteration 3050/ 173500 | consumed samples: 780800 | consumed tokens: 1599078400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.832397E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.712 | TFLOPs: 31.36 | +7: iteration 3060/ 173500 | consumed samples: 783360 | consumed tokens: 1604321280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.833721E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.474 | TFLOPs: 31.03 | +7: iteration 3070/ 173500 | consumed samples: 785920 | consumed tokens: 1609564160 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.829943E+00 | grad norm: 0.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.581 | TFLOPs: 31.72 | +7: iteration 3080/ 173500 | consumed samples: 788480 | consumed tokens: 1614807040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.837393E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.542 | TFLOPs: 31.19 | +7: iteration 3090/ 173500 | consumed samples: 791040 | consumed tokens: 1620049920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.833812E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.629 | TFLOPs: 31.51 | +7: iteration 3100/ 173500 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.842229E+00 | 
grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.102 | TFLOPs: 31.17 | +7: iteration 3110/ 173500 | consumed samples: 796160 | consumed tokens: 1630535680 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.821613E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.645 | TFLOPs: 31.78 | +7: iteration 3120/ 173500 | consumed samples: 798720 | consumed tokens: 1635778560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.831325E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.696 | TFLOPs: 31.52 | +7: iteration 3130/ 173500 | consumed samples: 801280 | consumed tokens: 1641021440 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.808165E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.675 | TFLOPs: 30.78 | +7: iteration 3140/ 173500 | consumed samples: 803840 | consumed tokens: 1646264320 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.822429E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.048 | TFLOPs: 31.64 | +7: iteration 3150/ 173500 | consumed samples: 806400 | consumed tokens: 1651507200 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.822335E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.256 | TFLOPs: 30.86 | +7: iteration 3160/ 173500 | consumed samples: 808960 | consumed tokens: 1656750080 | elapsed time per iteration (s): 
0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.832071E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.781 | TFLOPs: 30.37 | +7: iteration 3170/ 173500 | consumed samples: 811520 | consumed tokens: 1661992960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.808773E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.349 | TFLOPs: 31.45 | +7: iteration 3180/ 173500 | consumed samples: 814080 | consumed tokens: 1667235840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.810248E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.445 | TFLOPs: 31.40 | +7: iteration 3190/ 173500 | consumed samples: 816640 | consumed tokens: 1672478720 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.813251E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.151 | TFLOPs: 31.07 | +7: iteration 3200/ 173500 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.799815E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.533 | TFLOPs: 31.46 | +7: iteration 3210/ 173500 | consumed samples: 821760 | consumed tokens: 1682964480 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.793602E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.177 | TFLOPs: 31.70 | +7: iteration 3220/ 173500 | 
consumed samples: 824320 | consumed tokens: 1688207360 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.869514E+00 | grad norm: 3.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.069 | TFLOPs: 31.01 | +7: iteration 3230/ 173500 | consumed samples: 826880 | consumed tokens: 1693450240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.101602E+00 | grad norm: 2.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.454 | TFLOPs: 31.45 | +7: iteration 3240/ 173500 | consumed samples: 829440 | consumed tokens: 1698693120 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.993789E+00 | grad norm: 1.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.789 | TFLOPs: 30.84 | +7: iteration 3250/ 173500 | consumed samples: 832000 | consumed tokens: 1703936000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.933115E+00 | grad norm: 1.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.248 | TFLOPs: 31.23 | +7: iteration 3260/ 173500 | consumed samples: 834560 | consumed tokens: 1709178880 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.872916E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.813 | TFLOPs: 31.63 | +7: iteration 3270/ 173500 | consumed samples: 837120 | consumed tokens: 1714421760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.834515E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 590.307 | TFLOPs: 30.97 | +7: iteration 3280/ 173500 | consumed samples: 839680 | consumed tokens: 1719664640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.820752E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.620 | TFLOPs: 31.15 | +7: iteration 3290/ 173500 | consumed samples: 842240 | consumed tokens: 1724907520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.814715E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.593 | TFLOPs: 31.56 | +7: iteration 3300/ 173500 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.798914E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.744 | TFLOPs: 31.42 | +7: iteration 3310/ 173500 | consumed samples: 847360 | consumed tokens: 1735393280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.803678E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.180 | TFLOPs: 31.28 | +7: iteration 3320/ 173500 | consumed samples: 849920 | consumed tokens: 1740636160 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.796717E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.591 | TFLOPs: 30.62 | +7: iteration 3330/ 173500 | consumed samples: 852480 | consumed tokens: 1745879040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.784473E+00 | 
grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.487 | TFLOPs: 31.56 | +7: iteration 3340/ 173500 | consumed samples: 855040 | consumed tokens: 1751121920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.809692E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.761 | TFLOPs: 31.57 | +7: iteration 3350/ 173500 | consumed samples: 857600 | consumed tokens: 1756364800 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.796765E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.142 | TFLOPs: 30.33 | +7: iteration 3360/ 173500 | consumed samples: 860160 | consumed tokens: 1761607680 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.794554E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.052 | TFLOPs: 30.85 | +7: iteration 3370/ 173500 | consumed samples: 862720 | consumed tokens: 1766850560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.781313E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.380 | TFLOPs: 31.08 | +7: iteration 3380/ 173500 | consumed samples: 865280 | consumed tokens: 1772093440 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.784670E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.942 | TFLOPs: 30.59 | +7: iteration 3390/ 173500 | consumed samples: 867840 | consumed tokens: 1777336320 | elapsed time per iteration (s): 
0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.789180E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.448 | TFLOPs: 30.87 | +7: iteration 3400/ 173500 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.780103E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.096 | TFLOPs: 30.54 | +7: iteration 3410/ 173500 | consumed samples: 872960 | consumed tokens: 1787822080 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.788302E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.129 | TFLOPs: 30.02 | +7: iteration 3420/ 173500 | consumed samples: 875520 | consumed tokens: 1793064960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.782669E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.269 | TFLOPs: 31.02 | +7: iteration 3430/ 173500 | consumed samples: 878080 | consumed tokens: 1798307840 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.759518E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.455 | TFLOPs: 31.77 | +7: iteration 3440/ 173500 | consumed samples: 880640 | consumed tokens: 1803550720 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.774536E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.893 | TFLOPs: 31.42 | +7: iteration 3450/ 173500 | 
consumed samples: 883200 | consumed tokens: 1808793600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.764154E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.004 | TFLOPs: 30.90 | +7: iteration 3460/ 173500 | consumed samples: 885760 | consumed tokens: 1814036480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.768039E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.814 | TFLOPs: 31.37 | +7: iteration 3470/ 173500 | consumed samples: 888320 | consumed tokens: 1819279360 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.770459E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.581 | TFLOPs: 31.72 | +7: iteration 3480/ 173500 | consumed samples: 890880 | consumed tokens: 1824522240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.767592E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.743 | TFLOPs: 31.00 | +7: iteration 3490/ 173500 | consumed samples: 893440 | consumed tokens: 1829765120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.759336E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.964 | TFLOPs: 31.22 | +7: iteration 3500/ 173500 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.774363E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 603.794 | TFLOPs: 31.68 | +7: iteration 3510/ 173500 | consumed samples: 898560 | consumed tokens: 1840250880 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.758509E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.942 | TFLOPs: 30.69 | +7: iteration 3520/ 173500 | consumed samples: 901120 | consumed tokens: 1845493760 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.759268E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.276 | TFLOPs: 31.76 | +7: iteration 3530/ 173500 | consumed samples: 903680 | consumed tokens: 1850736640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.749870E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.135 | TFLOPs: 30.96 | +7: iteration 3540/ 173500 | consumed samples: 906240 | consumed tokens: 1855979520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.751918E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.014 | TFLOPs: 31.27 | +7: iteration 3550/ 173500 | consumed samples: 908800 | consumed tokens: 1861222400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.748963E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.275 | TFLOPs: 31.02 | +7: iteration 3560/ 173500 | consumed samples: 911360 | consumed tokens: 1866465280 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.746321E+00 | 
grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.095 | TFLOPs: 31.75 | +7: iteration 3570/ 173500 | consumed samples: 913920 | consumed tokens: 1871708160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.743822E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.582 | TFLOPs: 31.51 | +7: iteration 3580/ 173500 | consumed samples: 916480 | consumed tokens: 1876951040 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.752657E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.258 | TFLOPs: 31.07 | +7: iteration 3590/ 173500 | consumed samples: 919040 | consumed tokens: 1882193920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.753701E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.281 | TFLOPs: 31.60 | +7: iteration 3600/ 173500 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.738715E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.696 | TFLOPs: 31.31 | +7: iteration 3610/ 173500 | consumed samples: 924160 | consumed tokens: 1892679680 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.745165E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.856 | TFLOPs: 30.79 | +7: iteration 3620/ 173500 | consumed samples: 926720 | consumed tokens: 1897922560 | elapsed time per iteration (s): 
0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.745461E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.716 | TFLOPs: 31.41 | +7: iteration 3630/ 173500 | consumed samples: 929280 | consumed tokens: 1903165440 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.737286E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.702 | TFLOPs: 30.57 | +7: iteration 3640/ 173500 | consumed samples: 931840 | consumed tokens: 1908408320 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.754039E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.115 | TFLOPs: 30.70 | +7: iteration 3650/ 173500 | consumed samples: 934400 | consumed tokens: 1913651200 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.740160E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.977 | TFLOPs: 31.64 | +7: iteration 3660/ 173500 | consumed samples: 936960 | consumed tokens: 1918894080 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.729498E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.420 | TFLOPs: 30.82 | +7: iteration 3670/ 173500 | consumed samples: 939520 | consumed tokens: 1924136960 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.737051E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.439 | TFLOPs: 31.35 | +7: iteration 3680/ 173500 | 
consumed samples: 942080 | consumed tokens: 1929379840 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.730684E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.105 | TFLOPs: 31.64 | +7: iteration 3690/ 173500 | consumed samples: 944640 | consumed tokens: 1934622720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.732690E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.673 | TFLOPs: 30.94 | +7: iteration 3700/ 173500 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.45 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.736459E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.556 | TFLOPs: 29.78 | +7: iteration 3710/ 173500 | consumed samples: 949760 | consumed tokens: 1945108480 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.729023E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.151 | TFLOPs: 30.23 | +7: iteration 3720/ 173500 | consumed samples: 952320 | consumed tokens: 1950351360 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.731850E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.554 | TFLOPs: 30.62 | +7: iteration 3730/ 173500 | consumed samples: 954880 | consumed tokens: 1955594240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.737431E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 589.808 | TFLOPs: 30.95 | +7: iteration 3740/ 173500 | consumed samples: 957440 | consumed tokens: 1960837120 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.729917E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.642 | TFLOPs: 31.57 | +7: iteration 3750/ 173500 | consumed samples: 960000 | consumed tokens: 1966080000 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.726548E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.194 | TFLOPs: 31.75 | +7: iteration 3760/ 173500 | consumed samples: 962560 | consumed tokens: 1971322880 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.735548E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.152 | TFLOPs: 31.49 | +7: iteration 3770/ 173500 | consumed samples: 965120 | consumed tokens: 1976565760 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.712889E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.792 | TFLOPs: 31.84 | +7: iteration 3780/ 173500 | consumed samples: 967680 | consumed tokens: 1981808640 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.726212E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.943 | TFLOPs: 31.22 | +7: iteration 3790/ 173500 | consumed samples: 970240 | consumed tokens: 1987051520 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.724542E+00 | 
grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.901 | TFLOPs: 31.11 | +7: iteration 3800/ 173500 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.721121E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.896 | TFLOPs: 31.21 | +7: iteration 3810/ 173500 | consumed samples: 975360 | consumed tokens: 1997537280 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.700520E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.180 | TFLOPs: 31.86 | +7: iteration 3820/ 173500 | consumed samples: 977920 | consumed tokens: 2002780160 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.724140E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.268 | TFLOPs: 30.76 | +7: iteration 3830/ 173500 | consumed samples: 980480 | consumed tokens: 2008023040 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.715448E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.268 | TFLOPs: 31.70 | +7: iteration 3840/ 173500 | consumed samples: 983040 | consumed tokens: 2013265920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.716695E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.624 | TFLOPs: 31.46 | +7: iteration 3850/ 173500 | consumed samples: 985600 | consumed tokens: 2018508800 | elapsed time per iteration (s): 
0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.722637E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.544 | TFLOPs: 31.40 | +7: iteration 3860/ 173500 | consumed samples: 988160 | consumed tokens: 2023751680 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.709404E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.962 | TFLOPs: 31.48 | +7: iteration 3870/ 173500 | consumed samples: 990720 | consumed tokens: 2028994560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.707555E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.404 | TFLOPs: 31.40 | +7: iteration 3880/ 173500 | consumed samples: 993280 | consumed tokens: 2034237440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.709870E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.804 | TFLOPs: 31.47 | +7: iteration 3890/ 173500 | consumed samples: 995840 | consumed tokens: 2039480320 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.708128E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.481 | TFLOPs: 31.66 | +7: iteration 3900/ 173500 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.708565E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.263 | TFLOPs: 31.39 | +7: iteration 3910/ 173500 | 
consumed samples: 1000960 | consumed tokens: 2049966080 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.692660E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.459 | TFLOPs: 30.51 | +7: iteration 3920/ 173500 | consumed samples: 1003520 | consumed tokens: 2055208960 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.697388E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.459 | TFLOPs: 31.09 | +7: iteration 3930/ 173500 | consumed samples: 1006080 | consumed tokens: 2060451840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.695096E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.920 | TFLOPs: 31.37 | +7: iteration 3940/ 173500 | consumed samples: 1008640 | consumed tokens: 2065694720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.711777E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.179 | TFLOPs: 31.39 | +7: iteration 3950/ 173500 | consumed samples: 1011200 | consumed tokens: 2070937600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.702279E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.048 | TFLOPs: 31.38 | +7: iteration 3960/ 173500 | consumed samples: 1013760 | consumed tokens: 2076180480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.690252E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 590.719 | TFLOPs: 30.99 | +7: iteration 3970/ 173500 | consumed samples: 1016320 | consumed tokens: 2081423360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.699546E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.552 | TFLOPs: 31.35 | +7: iteration 3980/ 173500 | consumed samples: 1018880 | consumed tokens: 2086666240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.681618E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.303 | TFLOPs: 31.39 | +7: iteration 3990/ 173500 | consumed samples: 1021440 | consumed tokens: 2091909120 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.698860E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.236 | TFLOPs: 30.86 | +0: [2023-03-16 23:41:02,943] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00019992278300259638, 0.00019992278300259638, 0.00019992278300259638], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 4000/ 173500 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.685293E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.404 | TFLOPs: 31.08 | +0: steps: 4000 loss: 3.7046 iter time (s): 0.426 samples/sec: 601.137 +7: iteration 4010/ 173500 | consumed samples: 1026560 | consumed tokens: 2102394880 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.696207E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 596.161 | TFLOPs: 31.28 | +7: iteration 4020/ 173500 | consumed samples: 1029120 | consumed tokens: 2107637760 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.682235E+00 | grad norm: 0.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.118 | TFLOPs: 31.22 | +7: iteration 4030/ 173500 | consumed samples: 1031680 | consumed tokens: 2112880640 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.694265E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.612 | TFLOPs: 31.09 | +7: iteration 4040/ 173500 | consumed samples: 1034240 | consumed tokens: 2118123520 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.719470E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.728 | TFLOPs: 30.63 | +7: iteration 4050/ 173500 | consumed samples: 1036800 | consumed tokens: 2123366400 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.699894E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.477 | TFLOPs: 30.72 | +7: iteration 4060/ 173500 | consumed samples: 1039360 | consumed tokens: 2128609280 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.703086E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.732 | TFLOPs: 31.20 | +7: iteration 4070/ 173500 | consumed samples: 1041920 | consumed tokens: 2133852160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch 
size: 256 | lm loss: 3.693624E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.708 | TFLOPs: 31.47 | +7: iteration 4080/ 173500 | consumed samples: 1044480 | consumed tokens: 2139095040 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.678944E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.820 | TFLOPs: 31.68 | +7: iteration 4090/ 173500 | consumed samples: 1047040 | consumed tokens: 2144337920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.685651E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.339 | TFLOPs: 31.45 | +7: iteration 4100/ 173500 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.684243E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.354 | TFLOPs: 30.87 | +7: iteration 4110/ 173500 | consumed samples: 1052160 | consumed tokens: 2154823680 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.687714E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.461 | TFLOPs: 31.45 | +7: iteration 4120/ 173500 | consumed samples: 1054720 | consumed tokens: 2160066560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.677441E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.644 | TFLOPs: 31.30 | +7: iteration 4130/ 173500 | consumed samples: 1057280 | consumed tokens: 
2165309440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.680568E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.561 | TFLOPs: 31.35 | +7: iteration 4140/ 173500 | consumed samples: 1059840 | consumed tokens: 2170552320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.670870E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.837 | TFLOPs: 31.05 | +7: iteration 4150/ 173500 | consumed samples: 1062400 | consumed tokens: 2175795200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.672789E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.669 | TFLOPs: 30.89 | +7: iteration 4160/ 173500 | consumed samples: 1064960 | consumed tokens: 2181038080 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.683763E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.548 | TFLOPs: 31.72 | +7: iteration 4170/ 173500 | consumed samples: 1067520 | consumed tokens: 2186280960 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.669456E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.037 | TFLOPs: 31.33 | +7: iteration 4180/ 173500 | consumed samples: 1070080 | consumed tokens: 2191523840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.669727E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.990 | 
TFLOPs: 31.01 | +7: iteration 4190/ 173500 | consumed samples: 1072640 | consumed tokens: 2196766720 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.662915E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.253 | TFLOPs: 31.86 | +7: iteration 4200/ 173500 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.688901E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.915 | TFLOPs: 31.00 | +7: iteration 4210/ 173500 | consumed samples: 1077760 | consumed tokens: 2207252480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.673551E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.709 | TFLOPs: 31.26 | +7: iteration 4220/ 173500 | consumed samples: 1080320 | consumed tokens: 2212495360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.675057E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.066 | TFLOPs: 31.06 | +7: iteration 4230/ 173500 | consumed samples: 1082880 | consumed tokens: 2217738240 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.672914E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.469 | TFLOPs: 31.61 | +7: iteration 4240/ 173500 | consumed samples: 1085440 | consumed tokens: 2222981120 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.662315E+00 | grad norm: 0.434 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.802 | TFLOPs: 31.05 | +7: iteration 4250/ 173500 | consumed samples: 1088000 | consumed tokens: 2228224000 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.664058E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.916 | TFLOPs: 31.69 | +7: iteration 4260/ 173500 | consumed samples: 1090560 | consumed tokens: 2233466880 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.652395E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.863 | TFLOPs: 31.21 | +7: iteration 4270/ 173500 | consumed samples: 1093120 | consumed tokens: 2238709760 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.644294E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.885 | TFLOPs: 31.32 | +7: iteration 4280/ 173500 | consumed samples: 1095680 | consumed tokens: 2243952640 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.672278E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.589 | TFLOPs: 31.72 | +7: iteration 4290/ 173500 | consumed samples: 1098240 | consumed tokens: 2249195520 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.654831E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.066 | TFLOPs: 31.17 | +7: iteration 4300/ 173500 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.43 | learning rate: 
1.999E-04 | global batch size: 256 | lm loss: 3.657030E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.996 | TFLOPs: 31.32 | +7: iteration 4310/ 173500 | consumed samples: 1103360 | consumed tokens: 2259681280 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.657473E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.789 | TFLOPs: 30.79 | +7: iteration 4320/ 173500 | consumed samples: 1105920 | consumed tokens: 2264924160 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.654404E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.752 | TFLOPs: 31.78 | +7: iteration 4330/ 173500 | consumed samples: 1108480 | consumed tokens: 2270167040 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.657529E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.023 | TFLOPs: 30.64 | +7: iteration 4340/ 173500 | consumed samples: 1111040 | consumed tokens: 2275409920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.642547E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.444 | TFLOPs: 30.93 | +7: iteration 4350/ 173500 | consumed samples: 1113600 | consumed tokens: 2280652800 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.640149E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.514 | TFLOPs: 31.82 | +7: iteration 4360/ 173500 | consumed samples: 
1116160 | consumed tokens: 2285895680 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.651385E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.126 | TFLOPs: 31.38 | +7: iteration 4370/ 173500 | consumed samples: 1118720 | consumed tokens: 2291138560 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.660643E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.994 | TFLOPs: 31.74 | +7: iteration 4380/ 173500 | consumed samples: 1121280 | consumed tokens: 2296381440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.641351E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.156 | TFLOPs: 31.17 | +7: iteration 4390/ 173500 | consumed samples: 1123840 | consumed tokens: 2301624320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.638283E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.216 | TFLOPs: 31.54 | +7: iteration 4400/ 173500 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.654099E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.003 | TFLOPs: 31.06 | +7: iteration 4410/ 173500 | consumed samples: 1128960 | consumed tokens: 2312110080 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.642993E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 603.177 | TFLOPs: 31.65 | +7: iteration 4420/ 173500 | consumed samples: 1131520 | consumed tokens: 2317352960 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.647881E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.936 | TFLOPs: 31.79 | +7: iteration 4430/ 173500 | consumed samples: 1134080 | consumed tokens: 2322595840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.637556E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.663 | TFLOPs: 31.10 | +7: iteration 4440/ 173500 | consumed samples: 1136640 | consumed tokens: 2327838720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.634969E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.189 | TFLOPs: 31.44 | +7: iteration 4450/ 173500 | consumed samples: 1139200 | consumed tokens: 2333081600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.648293E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.124 | TFLOPs: 31.44 | +7: iteration 4460/ 173500 | consumed samples: 1141760 | consumed tokens: 2338324480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.642069E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.342 | TFLOPs: 31.13 | +7: iteration 4470/ 173500 | consumed samples: 1144320 | consumed tokens: 2343567360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.650588E+00 | grad norm: 
0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.811 | TFLOPs: 31.00 | +7: iteration 4480/ 173500 | consumed samples: 1146880 | consumed tokens: 2348810240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.635883E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.570 | TFLOPs: 31.46 | +7: iteration 4490/ 173500 | consumed samples: 1149440 | consumed tokens: 2354053120 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.643434E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.128 | TFLOPs: 31.28 | +7: iteration 4500/ 173500 | consumed samples: 1152000 | consumed tokens: 2359296000 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.647507E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.450 | TFLOPs: 31.98 | +7: iteration 4510/ 173500 | consumed samples: 1154560 | consumed tokens: 2364538880 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.627450E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.160 | TFLOPs: 31.75 | +7: iteration 4520/ 173500 | consumed samples: 1157120 | consumed tokens: 2369781760 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.639697E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.479 | TFLOPs: 31.56 | +7: iteration 4530/ 173500 | consumed samples: 1159680 | consumed tokens: 2375024640 | elapsed time per iteration (s): 0.43 
| learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.630564E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.727 | TFLOPs: 31.15 | +7: iteration 4540/ 173500 | consumed samples: 1162240 | consumed tokens: 2380267520 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.633606E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.892 | TFLOPs: 31.69 | +7: iteration 4550/ 173500 | consumed samples: 1164800 | consumed tokens: 2385510400 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.634479E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.516 | TFLOPs: 31.40 | +7: iteration 4560/ 173500 | consumed samples: 1167360 | consumed tokens: 2390753280 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.621396E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.389 | TFLOPs: 31.50 | +7: iteration 4570/ 173500 | consumed samples: 1169920 | consumed tokens: 2395996160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.637163E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.763 | TFLOPs: 31.36 | +7: iteration 4580/ 173500 | consumed samples: 1172480 | consumed tokens: 2401239040 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.637055E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.283 | TFLOPs: 30.97 | +7: iteration 4590/ 173500 | 
consumed samples: 1175040 | consumed tokens: 2406481920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.626540E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.075 | TFLOPs: 31.38 | +7: iteration 4600/ 173500 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.626719E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.496 | TFLOPs: 31.72 | +7: iteration 4610/ 173500 | consumed samples: 1180160 | consumed tokens: 2416967680 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.630160E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.183 | TFLOPs: 31.39 | +7: iteration 4620/ 173500 | consumed samples: 1182720 | consumed tokens: 2422210560 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.621062E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.623 | TFLOPs: 31.62 | +7: iteration 4630/ 173500 | consumed samples: 1185280 | consumed tokens: 2427453440 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.620701E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.679 | TFLOPs: 31.78 | +7: iteration 4640/ 173500 | consumed samples: 1187840 | consumed tokens: 2432696320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.619572E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 593.012 | TFLOPs: 31.11 | +7: iteration 4650/ 173500 | consumed samples: 1190400 | consumed tokens: 2437939200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.608701E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.143 | TFLOPs: 31.54 | +7: iteration 4660/ 173500 | consumed samples: 1192960 | consumed tokens: 2443182080 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.631224E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.402 | TFLOPs: 31.76 | +7: iteration 4670/ 173500 | consumed samples: 1195520 | consumed tokens: 2448424960 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.632467E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.486 | TFLOPs: 30.88 | +7: iteration 4680/ 173500 | consumed samples: 1198080 | consumed tokens: 2453667840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.607500E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.667 | TFLOPs: 31.57 | +7: iteration 4690/ 173500 | consumed samples: 1200640 | consumed tokens: 2458910720 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.617946E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.232 | TFLOPs: 31.76 | +7: iteration 4700/ 173500 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 
3.605574E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.318 | TFLOPs: 31.24 | +7: iteration 4710/ 173500 | consumed samples: 1205760 | consumed tokens: 2469396480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.611149E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.295 | TFLOPs: 31.29 | +7: iteration 4720/ 173500 | consumed samples: 1208320 | consumed tokens: 2474639360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.610288E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.266 | TFLOPs: 31.18 | +7: iteration 4730/ 173500 | consumed samples: 1210880 | consumed tokens: 2479882240 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.621973E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.478 | TFLOPs: 30.56 | +7: iteration 4740/ 173500 | consumed samples: 1213440 | consumed tokens: 2485125120 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.606894E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.385 | TFLOPs: 31.40 | +7: iteration 4750/ 173500 | consumed samples: 1216000 | consumed tokens: 2490368000 | elapsed time per iteration (s): 0.45 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.612993E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.320 | TFLOPs: 30.08 | +7: iteration 4760/ 173500 | consumed samples: 1218560 | consumed tokens: 2495610880 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.598732E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.260 | TFLOPs: 31.91 | +7: iteration 4770/ 173500 | consumed samples: 1221120 | consumed tokens: 2500853760 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.612497E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.839 | TFLOPs: 31.73 | +7: iteration 4780/ 173500 | consumed samples: 1223680 | consumed tokens: 2506096640 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.629135E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.757 | TFLOPs: 31.68 | +7: iteration 4790/ 173500 | consumed samples: 1226240 | consumed tokens: 2511339520 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.609413E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.346 | TFLOPs: 31.97 | +7: iteration 4800/ 173500 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.619527E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.355 | TFLOPs: 31.71 | +7: iteration 4810/ 173500 | consumed samples: 1231360 | consumed tokens: 2521825280 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.621017E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.719 | TFLOPs: 30.73 | +7: 
iteration 4820/ 173500 | consumed samples: 1233920 | consumed tokens: 2527068160 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.617453E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.410 | TFLOPs: 31.66 | +7: iteration 4830/ 173500 | consumed samples: 1236480 | consumed tokens: 2532311040 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.605521E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.608 | TFLOPs: 30.78 | +7: iteration 4840/ 173500 | consumed samples: 1239040 | consumed tokens: 2537553920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.608658E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.178 | TFLOPs: 31.44 | +7: iteration 4850/ 173500 | consumed samples: 1241600 | consumed tokens: 2542796800 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.609438E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.236 | TFLOPs: 31.55 | +7: iteration 4860/ 173500 | consumed samples: 1244160 | consumed tokens: 2548039680 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.607699E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.765 | TFLOPs: 31.68 | +7: iteration 4870/ 173500 | consumed samples: 1246720 | consumed tokens: 2553282560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.605410E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.759 | TFLOPs: 31.47 | +7: iteration 4880/ 173500 | consumed samples: 1249280 | consumed tokens: 2558525440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.610337E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.062 | TFLOPs: 30.91 | +7: iteration 4890/ 173500 | consumed samples: 1251840 | consumed tokens: 2563768320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.585411E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.847 | TFLOPs: 31.16 | +7: iteration 4900/ 173500 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.613982E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.569 | TFLOPs: 32.04 | +7: iteration 4910/ 173500 | consumed samples: 1256960 | consumed tokens: 2574254080 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.602100E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.046 | TFLOPs: 31.12 | +7: iteration 4920/ 173500 | consumed samples: 1259520 | consumed tokens: 2579496960 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.596835E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.560 | TFLOPs: 31.30 | +7: iteration 4930/ 173500 | consumed samples: 1262080 | consumed tokens: 2584739840 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch 
size: 256 | lm loss: 3.597276E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.095 | TFLOPs: 31.75 | +7: iteration 4940/ 173500 | consumed samples: 1264640 | consumed tokens: 2589982720 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.593523E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.367 | TFLOPs: 32.02 | +7: iteration 4950/ 173500 | consumed samples: 1267200 | consumed tokens: 2595225600 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.602964E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.812 | TFLOPs: 31.58 | +7: iteration 4960/ 173500 | consumed samples: 1269760 | consumed tokens: 2600468480 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.601748E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.469 | TFLOPs: 31.35 | +7: iteration 4970/ 173500 | consumed samples: 1272320 | consumed tokens: 2605711360 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.592247E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.197 | TFLOPs: 31.96 | +7: iteration 4980/ 173500 | consumed samples: 1274880 | consumed tokens: 2610954240 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.579997E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.094 | TFLOPs: 31.49 | +7: iteration 4990/ 173500 | consumed samples: 1277440 | consumed tokens: 
2616197120 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.587194E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.025 | TFLOPs: 31.69 | +7: iteration 5000/ 173500 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.603516E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.352 | TFLOPs: 31.34 | +7: iteration 5010/ 173500 | consumed samples: 1282560 | consumed tokens: 2626682880 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.592070E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.524 | TFLOPs: 31.56 | +7: iteration 5020/ 173500 | consumed samples: 1285120 | consumed tokens: 2631925760 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.587508E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.759 | TFLOPs: 31.52 | +7: iteration 5030/ 173500 | consumed samples: 1287680 | consumed tokens: 2637168640 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.579689E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.076 | TFLOPs: 31.96 | +7: iteration 5040/ 173500 | consumed samples: 1290240 | consumed tokens: 2642411520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.581233E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.493 | 
TFLOPs: 31.45 | +7: iteration 5050/ 173500 | consumed samples: 1292800 | consumed tokens: 2647654400 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.595408E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.921 | TFLOPs: 31.74 | +7: iteration 5060/ 173500 | consumed samples: 1295360 | consumed tokens: 2652897280 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.580438E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.368 | TFLOPs: 30.92 | +7: iteration 5070/ 173500 | consumed samples: 1297920 | consumed tokens: 2658140160 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.587961E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.305 | TFLOPs: 31.60 | +7: iteration 5080/ 173500 | consumed samples: 1300480 | consumed tokens: 2663383040 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.580768E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.015 | TFLOPs: 31.53 | +7: iteration 5090/ 173500 | consumed samples: 1303040 | consumed tokens: 2668625920 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.582520E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.233 | TFLOPs: 31.91 | +7: iteration 5100/ 173500 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.581404E+00 | grad norm: 0.388 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.264 | TFLOPs: 31.65 | +7: iteration 5110/ 173500 | consumed samples: 1308160 | consumed tokens: 2679111680 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.590484E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.014 | TFLOPs: 31.53 | +7: iteration 5120/ 173500 | consumed samples: 1310720 | consumed tokens: 2684354560 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.589562E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.067 | TFLOPs: 31.01 | +7: iteration 5130/ 173500 | consumed samples: 1313280 | consumed tokens: 2689597440 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.586823E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.197 | TFLOPs: 31.65 | +7: iteration 5140/ 173500 | consumed samples: 1315840 | consumed tokens: 2694840320 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.592232E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.733 | TFLOPs: 30.68 | +7: iteration 5150/ 173500 | consumed samples: 1318400 | consumed tokens: 2700083200 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.576223E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.369 | TFLOPs: 31.71 | +7: iteration 5160/ 173500 | consumed samples: 1320960 | consumed tokens: 2705326080 | elapsed time per iteration (s): 0.42 | learning rate: 
1.998E-04 | global batch size: 256 | lm loss: 3.572421E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.865 | TFLOPs: 31.74 | +7: iteration 5170/ 173500 | consumed samples: 1323520 | consumed tokens: 2710568960 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.583334E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.565 | TFLOPs: 31.56 | +7: iteration 5180/ 173500 | consumed samples: 1326080 | consumed tokens: 2715811840 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.578819E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.079 | TFLOPs: 30.91 | +7: iteration 5190/ 173500 | consumed samples: 1328640 | consumed tokens: 2721054720 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.576821E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.611 | TFLOPs: 31.51 | +7: iteration 5200/ 173500 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.566772E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.623 | TFLOPs: 31.25 | +7: iteration 5210/ 173500 | consumed samples: 1333760 | consumed tokens: 2731540480 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.571239E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.250 | TFLOPs: 31.49 | +7: iteration 5220/ 173500 | consumed samples: 
1336320 | consumed tokens: 2736783360 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.578031E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.035 | TFLOPs: 31.80 | +7: iteration 5230/ 173500 | consumed samples: 1338880 | consumed tokens: 2742026240 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.568482E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.125 | TFLOPs: 31.96 | +7: iteration 5240/ 173500 | consumed samples: 1341440 | consumed tokens: 2747269120 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.567912E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.568 | TFLOPs: 31.35 | +7: iteration 5250/ 173500 | consumed samples: 1344000 | consumed tokens: 2752512000 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.569660E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.246 | TFLOPs: 31.97 | +7: iteration 5260/ 173500 | consumed samples: 1346560 | consumed tokens: 2757754880 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.576001E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.037 | TFLOPs: 31.54 | +7: iteration 5270/ 173500 | consumed samples: 1349120 | consumed tokens: 2762997760 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.576099E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 601.394 | TFLOPs: 31.55 | +7: iteration 5280/ 173500 | consumed samples: 1351680 | consumed tokens: 2768240640 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.568815E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.768 | TFLOPs: 31.84 | +7: iteration 5290/ 173500 | consumed samples: 1354240 | consumed tokens: 2773483520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.574557E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.534 | TFLOPs: 31.19 | +7: iteration 5300/ 173500 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.553600E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.258 | TFLOPs: 31.76 | +7: iteration 5310/ 173500 | consumed samples: 1359360 | consumed tokens: 2783969280 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.567954E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.967 | TFLOPs: 31.43 | +7: iteration 5320/ 173500 | consumed samples: 1361920 | consumed tokens: 2789212160 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.565025E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.466 | TFLOPs: 31.77 | +7: iteration 5330/ 173500 | consumed samples: 1364480 | consumed tokens: 2794455040 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.573528E+00 | grad norm: 
0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.072 | TFLOPs: 31.43 | +7: iteration 5340/ 173500 | consumed samples: 1367040 | consumed tokens: 2799697920 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.563240E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.506 | TFLOPs: 31.67 | +7: iteration 5350/ 173500 | consumed samples: 1369600 | consumed tokens: 2804940800 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.562567E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.005 | TFLOPs: 31.17 | +7: iteration 5360/ 173500 | consumed samples: 1372160 | consumed tokens: 2810183680 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.567605E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.134 | TFLOPs: 31.17 | +7: iteration 5370/ 173500 | consumed samples: 1374720 | consumed tokens: 2815426560 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.558639E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.010 | TFLOPs: 31.85 | +7: iteration 5380/ 173500 | consumed samples: 1377280 | consumed tokens: 2820669440 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.564203E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.683 | TFLOPs: 31.52 | +7: iteration 5390/ 173500 | consumed samples: 1379840 | consumed tokens: 2825912320 | elapsed time per iteration (s): 0.44 
| learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.560722E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.934 | TFLOPs: 30.74 | +7: iteration 5400/ 173500 | consumed samples: 1382400 | consumed tokens: 2831155200 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.560081E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.821 | TFLOPs: 31.84 | +7: iteration 5410/ 173500 | consumed samples: 1384960 | consumed tokens: 2836398080 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.555434E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.747 | TFLOPs: 31.63 | +7: iteration 5420/ 173500 | consumed samples: 1387520 | consumed tokens: 2841640960 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.551626E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.582 | TFLOPs: 30.57 | +7: iteration 5430/ 173500 | consumed samples: 1390080 | consumed tokens: 2846883840 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.565230E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.988 | TFLOPs: 30.27 | +7: iteration 5440/ 173500 | consumed samples: 1392640 | consumed tokens: 2852126720 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.557349E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.367 | TFLOPs: 31.66 | +7: iteration 5450/ 173500 | 
consumed samples: 1395200 | consumed tokens: 2857369600 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.562934E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.448 | TFLOPs: 30.93 | +7: iteration 5460/ 173500 | consumed samples: 1397760 | consumed tokens: 2862612480 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.544410E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.004 | TFLOPs: 31.95 | +7: iteration 5470/ 173500 | consumed samples: 1400320 | consumed tokens: 2867855360 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.555112E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.364 | TFLOPs: 31.97 | +7: iteration 5480/ 173500 | consumed samples: 1402880 | consumed tokens: 2873098240 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.560999E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.311 | TFLOPs: 31.97 | +7: iteration 5490/ 173500 | consumed samples: 1405440 | consumed tokens: 2878341120 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.559237E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.436 | TFLOPs: 31.24 | +7: iteration 5500/ 173500 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.562294E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 606.093 | TFLOPs: 31.80 | +7: iteration 5510/ 173500 | consumed samples: 1410560 | consumed tokens: 2888826880 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.539724E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.914 | TFLOPs: 31.74 | +7: iteration 5520/ 173500 | consumed samples: 1413120 | consumed tokens: 2894069760 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.556507E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.407 | TFLOPs: 31.82 | +7: iteration 5530/ 173500 | consumed samples: 1415680 | consumed tokens: 2899312640 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.551323E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.863 | TFLOPs: 31.63 | +7: iteration 5540/ 173500 | consumed samples: 1418240 | consumed tokens: 2904555520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.562208E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.219 | TFLOPs: 31.39 | +7: iteration 5550/ 173500 | consumed samples: 1420800 | consumed tokens: 2909798400 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.543303E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.070 | TFLOPs: 31.80 | +7: iteration 5560/ 173500 | consumed samples: 1423360 | consumed tokens: 2915041280 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 
3.541513E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.400 | TFLOPs: 31.40 | +7: iteration 5570/ 173500 | consumed samples: 1425920 | consumed tokens: 2920284160 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.544024E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.499 | TFLOPs: 31.72 | +7: iteration 5580/ 173500 | consumed samples: 1428480 | consumed tokens: 2925527040 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.546162E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.479 | TFLOPs: 31.56 | +7: iteration 5590/ 173500 | consumed samples: 1431040 | consumed tokens: 2930769920 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.565742E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.049 | TFLOPs: 31.75 | +7: iteration 5600/ 173500 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.541418E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.044 | TFLOPs: 31.38 | +7: iteration 5610/ 173500 | consumed samples: 1436160 | consumed tokens: 2941255680 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.540216E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.090 | TFLOPs: 31.75 | +7: iteration 5620/ 173500 | consumed samples: 1438720 | consumed tokens: 2946498560 | elapsed 
time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.527920E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.881 | TFLOPs: 30.53 | +7: iteration 5630/ 173500 | consumed samples: 1441280 | consumed tokens: 2951741440 | elapsed time per iteration (s): 0.45 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.534998E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.301 | TFLOPs: 29.87 | +7: iteration 5640/ 173500 | consumed samples: 1443840 | consumed tokens: 2956984320 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.547526E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.254 | TFLOPs: 30.76 | +7: iteration 5650/ 173500 | consumed samples: 1446400 | consumed tokens: 2962227200 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.542918E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.354 | TFLOPs: 31.29 | +7: iteration 5660/ 173500 | consumed samples: 1448960 | consumed tokens: 2967470080 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.555585E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.535 | TFLOPs: 30.30 | +7: iteration 5670/ 173500 | consumed samples: 1451520 | consumed tokens: 2972712960 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.539465E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.022 | TFLOPs: 30.64 | +7: 
iteration 5680/ 173500 | consumed samples: 1454080 | consumed tokens: 2977955840 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.539077E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.456 | TFLOPs: 31.40 | +7: iteration 5690/ 173500 | consumed samples: 1456640 | consumed tokens: 2983198720 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.527791E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.796 | TFLOPs: 31.99 | +7: iteration 5700/ 173500 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.542718E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.893 | TFLOPs: 31.53 | +7: iteration 5710/ 173500 | consumed samples: 1461760 | consumed tokens: 2993684480 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.535653E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.271 | TFLOPs: 31.34 | +7: iteration 5720/ 173500 | consumed samples: 1464320 | consumed tokens: 2998927360 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.535029E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.626 | TFLOPs: 31.99 | +7: iteration 5730/ 173500 | consumed samples: 1466880 | consumed tokens: 3004170240 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.541227E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.940 | TFLOPs: 31.95 | +7: iteration 5740/ 173500 | consumed samples: 1469440 | consumed tokens: 3009413120 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.525795E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.391 | TFLOPs: 31.97 | +7: iteration 5750/ 173500 | consumed samples: 1472000 | consumed tokens: 3014656000 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.535302E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.995 | TFLOPs: 31.69 | +7: iteration 5760/ 173500 | consumed samples: 1474560 | consumed tokens: 3019898880 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.543637E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.887 | TFLOPs: 31.79 | +7: iteration 5770/ 173500 | consumed samples: 1477120 | consumed tokens: 3025141760 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.542370E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.002 | TFLOPs: 31.85 | +7: iteration 5780/ 173500 | consumed samples: 1479680 | consumed tokens: 3030384640 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.537753E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.943 | TFLOPs: 31.32 | +7: iteration 5790/ 173500 | consumed samples: 1482240 | consumed tokens: 3035627520 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch 
size: 256 | lm loss: 3.537697E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.268 | TFLOPs: 30.87 | +7: iteration 5800/ 173500 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.542047E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.616 | TFLOPs: 31.99 | +7: iteration 5810/ 173500 | consumed samples: 1487360 | consumed tokens: 3046113280 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.535614E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.714 | TFLOPs: 31.94 | +7: iteration 5820/ 173500 | consumed samples: 1489920 | consumed tokens: 3051356160 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.531043E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.400 | TFLOPs: 31.76 | +7: iteration 5830/ 173500 | consumed samples: 1492480 | consumed tokens: 3056599040 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.534399E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.415 | TFLOPs: 31.45 | +7: iteration 5840/ 173500 | consumed samples: 1495040 | consumed tokens: 3061841920 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.534502E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.729 | TFLOPs: 31.94 | +7: iteration 5850/ 173500 | consumed samples: 1497600 | consumed tokens: 
3067084800 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.540197E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.624 | TFLOPs: 31.51 | +7: iteration 5860/ 173500 | consumed samples: 1500160 | consumed tokens: 3072327680 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.525751E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.156 | TFLOPs: 31.91 | +7: iteration 5870/ 173500 | consumed samples: 1502720 | consumed tokens: 3077570560 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.534681E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.255 | TFLOPs: 31.65 | +7: iteration 5880/ 173500 | consumed samples: 1505280 | consumed tokens: 3082813440 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.521056E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.488 | TFLOPs: 31.72 | +7: iteration 5890/ 173500 | consumed samples: 1507840 | consumed tokens: 3088056320 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.527750E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.496 | TFLOPs: 31.77 | +7: iteration 5900/ 173500 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.523694E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.012 | 
TFLOPs: 31.90 | +7: iteration 5910/ 173500 | consumed samples: 1512960 | consumed tokens: 3098542080 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.526963E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.428 | TFLOPs: 31.35 | +7: iteration 5920/ 173500 | consumed samples: 1515520 | consumed tokens: 3103784960 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.525782E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.607 | TFLOPs: 30.83 | +7: iteration 5930/ 173500 | consumed samples: 1518080 | consumed tokens: 3109027840 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.532711E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.471 | TFLOPs: 31.61 | +7: iteration 5940/ 173500 | consumed samples: 1520640 | consumed tokens: 3114270720 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.522296E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.559 | TFLOPs: 31.93 | +7: iteration 5950/ 173500 | consumed samples: 1523200 | consumed tokens: 3119513600 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.528672E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.664 | TFLOPs: 31.52 | +7: iteration 5960/ 173500 | consumed samples: 1525760 | consumed tokens: 3124756480 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.527314E+00 | grad norm: 0.325 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.615 | TFLOPs: 31.51 | +7: iteration 5970/ 173500 | consumed samples: 1528320 | consumed tokens: 3129999360 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.523200E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.027 | TFLOPs: 31.48 | +7: iteration 5980/ 173500 | consumed samples: 1530880 | consumed tokens: 3135242240 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.520488E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.646 | TFLOPs: 31.93 | +7: iteration 5990/ 173500 | consumed samples: 1533440 | consumed tokens: 3140485120 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.517503E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.218 | TFLOPs: 31.91 | +0: [2023-03-16 23:55:17,625] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.0001997263111243839, 0.0001997263111243839, 0.0001997263111243839], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 6000/ 173500 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.515137E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.273 | TFLOPs: 31.55 | +0: steps: 6000 loss: 3.4787 iter time (s): 0.425 samples/sec: 602.314 +7: iteration 6010/ 173500 | consumed samples: 1538560 | consumed tokens: 3150970880 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.519645E+00 | grad norm: 
0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.575 | TFLOPs: 31.62 | +7: iteration 6020/ 173500 | consumed samples: 1541120 | consumed tokens: 3156213760 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.508930E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.286 | TFLOPs: 31.92 | +7: iteration 6030/ 173500 | consumed samples: 1543680 | consumed tokens: 3161456640 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.519581E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.731 | TFLOPs: 31.68 | +7: iteration 6040/ 173500 | consumed samples: 1546240 | consumed tokens: 3166699520 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.515918E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.240 | TFLOPs: 31.91 | +7: iteration 6050/ 173500 | consumed samples: 1548800 | consumed tokens: 3171942400 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.515849E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.048 | TFLOPs: 31.64 | +7: iteration 6060/ 173500 | consumed samples: 1551360 | consumed tokens: 3177185280 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.522321E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.005 | TFLOPs: 31.48 | +7: iteration 6070/ 173500 | consumed samples: 1553920 | consumed tokens: 3182428160 | elapsed time per iteration (s): 0.43 
| learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.510226E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.645 | TFLOPs: 31.41 | +7: iteration 6080/ 173500 | consumed samples: 1556480 | consumed tokens: 3187671040 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.523573E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.454 | TFLOPs: 31.92 | +7: iteration 6090/ 173500 | consumed samples: 1559040 | consumed tokens: 3192913920 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.518907E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.623 | TFLOPs: 31.88 | +7: iteration 6100/ 173500 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.514256E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.251 | TFLOPs: 31.23 | +7: iteration 6110/ 173500 | consumed samples: 1564160 | consumed tokens: 3203399680 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.518419E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.520 | TFLOPs: 31.93 | +7: iteration 6120/ 173500 | consumed samples: 1566720 | consumed tokens: 3208642560 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.514887E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.665 | TFLOPs: 31.31 | +7: iteration 6130/ 173500 | 
consumed samples: 1569280 | consumed tokens: 3213885440 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.510272E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.186 | TFLOPs: 31.39 | +7: iteration 6140/ 173500 | consumed samples: 1571840 | consumed tokens: 3219128320 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.517900E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.328 | TFLOPs: 31.34 | +7: iteration 6150/ 173500 | consumed samples: 1574400 | consumed tokens: 3224371200 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.512019E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.249 | TFLOPs: 31.91 | +7: iteration 6160/ 173500 | consumed samples: 1576960 | consumed tokens: 3229614080 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.503505E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.316 | TFLOPs: 31.76 | +7: iteration 6170/ 173500 | consumed samples: 1579520 | consumed tokens: 3234856960 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.508780E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.588 | TFLOPs: 31.77 | +7: iteration 6180/ 173500 | consumed samples: 1582080 | consumed tokens: 3240099840 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.507213E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.817 | TFLOPs: 31.73 | +7: iteration 6190/ 173500 | consumed samples: 1584640 | consumed tokens: 3245342720 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.513003E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.809 | TFLOPs: 31.47 | +7: iteration 6200/ 173500 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.518428E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.906 | TFLOPs: 31.79 | +7: iteration 6210/ 173500 | consumed samples: 1589760 | consumed tokens: 3255828480 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.509009E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.042 | TFLOPs: 31.90 | +7: iteration 6220/ 173500 | consumed samples: 1592320 | consumed tokens: 3261071360 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.517261E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.751 | TFLOPs: 31.68 | +7: iteration 6230/ 173500 | consumed samples: 1594880 | consumed tokens: 3266314240 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.504768E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.425 | TFLOPs: 31.71 | +7: iteration 6240/ 173500 | consumed samples: 1597440 | consumed tokens: 3271557120 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 
3.509812E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.594 | TFLOPs: 31.67 | +7: iteration 6250/ 173500 | consumed samples: 1600000 | consumed tokens: 3276800000 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.485074E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.220 | TFLOPs: 31.70 | +7: iteration 6260/ 173500 | consumed samples: 1602560 | consumed tokens: 3282042880 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.513326E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.459 | TFLOPs: 31.40 | +7: iteration 6270/ 173500 | consumed samples: 1605120 | consumed tokens: 3287285760 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.500428E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.522 | TFLOPs: 31.67 | +7: iteration 6280/ 173500 | consumed samples: 1607680 | consumed tokens: 3292528640 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.512084E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.304 | TFLOPs: 31.76 | +7: iteration 6290/ 173500 | consumed samples: 1610240 | consumed tokens: 3297771520 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.500275E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.261 | TFLOPs: 31.76 | +7: iteration 6300/ 173500 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.492566E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.112 | TFLOPs: 31.64 | +7: iteration 6310/ 173500 | consumed samples: 1615360 | consumed tokens: 3308257280 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.498654E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.917 | TFLOPs: 31.90 | +7: iteration 6320/ 173500 | consumed samples: 1617920 | consumed tokens: 3313500160 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.497669E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.836 | TFLOPs: 31.89 | +7: iteration 6330/ 173500 | consumed samples: 1620480 | consumed tokens: 3318743040 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.502014E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.985 | TFLOPs: 31.90 | +7: iteration 6340/ 173500 | consumed samples: 1623040 | consumed tokens: 3323985920 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.511024E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.106 | TFLOPs: 31.54 | +7: iteration 6350/ 173500 | consumed samples: 1625600 | consumed tokens: 3329228800 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.499797E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.620 | TFLOPs: 31.62 | +7: 
iteration 6360/ 173500 | consumed samples: 1628160 | consumed tokens: 3334471680 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.486452E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.886 | TFLOPs: 31.48 | +7: iteration 6370/ 173500 | consumed samples: 1630720 | consumed tokens: 3339714560 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.490891E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.881 | TFLOPs: 31.79 | +7: iteration 6380/ 173500 | consumed samples: 1633280 | consumed tokens: 3344957440 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.496385E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.031 | TFLOPs: 31.90 | +7: iteration 6390/ 173500 | consumed samples: 1635840 | consumed tokens: 3350200320 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.487823E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.045 | TFLOPs: 31.38 | +7: iteration 6400/ 173500 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.489455E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.366 | TFLOPs: 31.92 | +7: iteration 6410/ 173500 | consumed samples: 1640960 | consumed tokens: 3360686080 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.500586E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.716 | TFLOPs: 31.68 | +7: iteration 6420/ 173500 | consumed samples: 1643520 | consumed tokens: 3365928960 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.482755E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.429 | TFLOPs: 31.71 | +7: iteration 6430/ 173500 | consumed samples: 1646080 | consumed tokens: 3371171840 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.493610E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.650 | TFLOPs: 31.62 | +7: iteration 6440/ 173500 | consumed samples: 1648640 | consumed tokens: 3376414720 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.503334E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.182 | TFLOPs: 31.91 | +7: iteration 6450/ 173500 | consumed samples: 1651200 | consumed tokens: 3381657600 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.488897E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.236 | TFLOPs: 31.55 | +7: iteration 6460/ 173500 | consumed samples: 1653760 | consumed tokens: 3386900480 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.497587E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.410 | TFLOPs: 31.56 | +7: iteration 6470/ 173500 | consumed samples: 1656320 | consumed tokens: 3392143360 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch 
size: 256 | lm loss: 3.473328E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.165 | TFLOPs: 31.65 | +7: iteration 6480/ 173500 | consumed samples: 1658880 | consumed tokens: 3397386240 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.503131E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.645 | TFLOPs: 31.78 | +7: iteration 6490/ 173500 | consumed samples: 1661440 | consumed tokens: 3402629120 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.484864E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.926 | TFLOPs: 31.58 | +7: iteration 6500/ 173500 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.477653E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.253 | TFLOPs: 31.44 | +7: iteration 6510/ 173500 | consumed samples: 1666560 | consumed tokens: 3413114880 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.486652E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.258 | TFLOPs: 31.70 | +7: iteration 6520/ 173500 | consumed samples: 1669120 | consumed tokens: 3418357760 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.488947E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.662 | TFLOPs: 31.94 | +7: iteration 6530/ 173500 | consumed samples: 1671680 | consumed tokens: 
3423600640 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.486799E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.371 | TFLOPs: 31.34 | +7: iteration 6540/ 173500 | consumed samples: 1674240 | consumed tokens: 3428843520 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.489621E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.887 | TFLOPs: 31.74 | +7: iteration 6550/ 173500 | consumed samples: 1676800 | consumed tokens: 3434086400 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.473443E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.759 | TFLOPs: 31.52 | +7: iteration 6560/ 173500 | consumed samples: 1679360 | consumed tokens: 3439329280 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.479617E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.593 | TFLOPs: 31.51 | +7: iteration 6570/ 173500 | consumed samples: 1681920 | consumed tokens: 3444572160 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.502148E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.691 | TFLOPs: 31.67 | +7: iteration 6580/ 173500 | consumed samples: 1684480 | consumed tokens: 3449815040 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.472604E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.809 | 
TFLOPs: 31.79 | +7: iteration 6590/ 173500 | consumed samples: 1687040 | consumed tokens: 3455057920 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.493138E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.568 | TFLOPs: 31.67 | +7: iteration 6600/ 173500 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.479844E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.259 | TFLOPs: 31.91 | +7: iteration 6610/ 173500 | consumed samples: 1692160 | consumed tokens: 3465543680 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.487031E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.086 | TFLOPs: 31.91 | +7: iteration 6620/ 173500 | consumed samples: 1694720 | consumed tokens: 3470786560 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.492185E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.819 | TFLOPs: 31.89 | +7: iteration 6630/ 173500 | consumed samples: 1697280 | consumed tokens: 3476029440 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.489804E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.097 | TFLOPs: 31.75 | +7: iteration 6640/ 173500 | consumed samples: 1699840 | consumed tokens: 3481272320 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.471788E+00 | grad norm: 0.305 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.614 | TFLOPs: 31.62 | +7: iteration 6650/ 173500 | consumed samples: 1702400 | consumed tokens: 3486515200 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.488688E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.923 | TFLOPs: 31.74 | +7: iteration 6660/ 173500 | consumed samples: 1704960 | consumed tokens: 3491758080 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.483530E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.147 | TFLOPs: 31.59 | +7: iteration 6670/ 173500 | consumed samples: 1707520 | consumed tokens: 3497000960 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.498820E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.308 | TFLOPs: 31.76 | +7: iteration 6680/ 173500 | consumed samples: 1710080 | consumed tokens: 3502243840 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.481286E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.192 | TFLOPs: 31.54 | +7: iteration 6690/ 173500 | consumed samples: 1712640 | consumed tokens: 3507486720 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.479290E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.703 | TFLOPs: 31.94 | +7: iteration 6700/ 173500 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.42 | learning rate: 
1.996E-04 | global batch size: 256 | lm loss: 3.480068E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.858 | TFLOPs: 31.63 | +7: iteration 6710/ 173500 | consumed samples: 1717760 | consumed tokens: 3517972480 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.489353E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.233 | TFLOPs: 31.76 | +7: iteration 6720/ 173500 | consumed samples: 1720320 | consumed tokens: 3523215360 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.479927E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.890 | TFLOPs: 31.69 | +7: iteration 6730/ 173500 | consumed samples: 1722880 | consumed tokens: 3528458240 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.471534E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.070 | TFLOPs: 31.90 | +7: iteration 6740/ 173500 | consumed samples: 1725440 | consumed tokens: 3533701120 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.472935E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.740 | TFLOPs: 31.73 | +7: iteration 6750/ 173500 | consumed samples: 1728000 | consumed tokens: 3538944000 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.472895E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.926 | TFLOPs: 31.53 | +7: iteration 6760/ 173500 | consumed samples: 
1730560 | consumed tokens: 3544186880 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.469511E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.628 | TFLOPs: 31.78 | +7: iteration 6770/ 173500 | consumed samples: 1733120 | consumed tokens: 3549429760 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.482986E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.517 | TFLOPs: 31.93 | +7: iteration 6780/ 173500 | consumed samples: 1735680 | consumed tokens: 3554672640 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.475810E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.069 | TFLOPs: 31.75 | +7: iteration 6790/ 173500 | consumed samples: 1738240 | consumed tokens: 3559915520 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.457126E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.622 | TFLOPs: 31.72 | +7: iteration 6800/ 173500 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.476709E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.718 | TFLOPs: 31.47 | +7: iteration 6810/ 173500 | consumed samples: 1743360 | consumed tokens: 3570401280 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.454260E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 608.425 | TFLOPs: 31.92 | +7: iteration 6820/ 173500 | consumed samples: 1745920 | consumed tokens: 3575644160 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.478461E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.070 | TFLOPs: 31.90 | +7: iteration 6830/ 173500 | consumed samples: 1748480 | consumed tokens: 3580887040 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.481667E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.338 | TFLOPs: 31.76 | +7: iteration 6840/ 173500 | consumed samples: 1751040 | consumed tokens: 3586129920 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.474242E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.149 | TFLOPs: 31.44 | +7: iteration 6850/ 173500 | consumed samples: 1753600 | consumed tokens: 3591372800 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.463092E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.204 | TFLOPs: 31.91 | +7: iteration 6860/ 173500 | consumed samples: 1756160 | consumed tokens: 3596615680 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.465506E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.259 | TFLOPs: 31.44 | +7: iteration 6870/ 173500 | consumed samples: 1758720 | consumed tokens: 3601858560 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.470591E+00 | grad norm: 
0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.873 | TFLOPs: 31.11 | +7: iteration 6880/ 173500 | consumed samples: 1761280 | consumed tokens: 3607101440 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.460854E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.638 | TFLOPs: 31.93 | +7: iteration 6890/ 173500 | consumed samples: 1763840 | consumed tokens: 3612344320 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.455239E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.024 | TFLOPs: 31.90 | +7: iteration 6900/ 173500 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.475474E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.410 | TFLOPs: 31.87 | +7: iteration 6910/ 173500 | consumed samples: 1768960 | consumed tokens: 3622830080 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.466082E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.845 | TFLOPs: 31.89 | +7: iteration 6920/ 173500 | consumed samples: 1771520 | consumed tokens: 3628072960 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.458086E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.854 | TFLOPs: 31.68 | +7: iteration 6930/ 173500 | consumed samples: 1774080 | consumed tokens: 3633315840 | elapsed time per iteration (s): 0.42 
| learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.454092E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.399 | TFLOPs: 31.66 | +7: iteration 6940/ 173500 | consumed samples: 1776640 | consumed tokens: 3638558720 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.464841E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.620 | TFLOPs: 31.83 | +7: iteration 6950/ 173500 | consumed samples: 1779200 | consumed tokens: 3643801600 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.466809E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.035 | TFLOPs: 31.80 | +7: iteration 6960/ 173500 | consumed samples: 1781760 | consumed tokens: 3649044480 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.455545E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.327 | TFLOPs: 30.66 | +7: iteration 6970/ 173500 | consumed samples: 1784320 | consumed tokens: 3654287360 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.454422E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.633 | TFLOPs: 30.88 | +7: iteration 6980/ 173500 | consumed samples: 1786880 | consumed tokens: 3659530240 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.468490E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.075 | TFLOPs: 31.96 | +7: iteration 6990/ 173500 | 
consumed samples: 1789440 | consumed tokens: 3664773120 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.470431E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.659 | TFLOPs: 31.73 | +7: iteration 7000/ 173500 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.449409E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.021 | TFLOPs: 30.80 | +7: iteration 7010/ 173500 | consumed samples: 1794560 | consumed tokens: 3675258880 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.470418E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.130 | TFLOPs: 31.65 | +7: iteration 7020/ 173500 | consumed samples: 1797120 | consumed tokens: 3680501760 | elapsed time per iteration (s): 0.45 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.464132E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.037 | TFLOPs: 29.91 | +7: iteration 7030/ 173500 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.466728E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.375 | TFLOPs: 31.03 | +7: iteration 7040/ 173500 | consumed samples: 1802240 | consumed tokens: 3690987520 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.458816E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 609.433 | TFLOPs: 31.98 | +7: iteration 7050/ 173500 | consumed samples: 1804800 | consumed tokens: 3696230400 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.471156E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.812 | TFLOPs: 31.89 | +7: iteration 7060/ 173500 | consumed samples: 1807360 | consumed tokens: 3701473280 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.462133E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.897 | TFLOPs: 31.79 | +7: iteration 7070/ 173500 | consumed samples: 1809920 | consumed tokens: 3706716160 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.473428E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.438 | TFLOPs: 31.92 | +7: iteration 7080/ 173500 | consumed samples: 1812480 | consumed tokens: 3711959040 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.447355E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.409 | TFLOPs: 31.92 | +7: iteration 7090/ 173500 | consumed samples: 1815040 | consumed tokens: 3717201920 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.469070E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.435 | TFLOPs: 31.92 | +7: iteration 7100/ 173500 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 
3.441280E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.265 | TFLOPs: 31.91 | +7: iteration 7110/ 173500 | consumed samples: 1820160 | consumed tokens: 3727687680 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.458254E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.393 | TFLOPs: 31.76 | +7: iteration 7120/ 173500 | consumed samples: 1822720 | consumed tokens: 3732930560 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.452705E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.355 | TFLOPs: 31.71 | +7: iteration 7130/ 173500 | consumed samples: 1825280 | consumed tokens: 3738173440 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.457594E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.348 | TFLOPs: 31.92 | +7: iteration 7140/ 173500 | consumed samples: 1827840 | consumed tokens: 3743416320 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.447944E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.199 | TFLOPs: 31.91 | +7: iteration 7150/ 173500 | consumed samples: 1830400 | consumed tokens: 3748659200 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.452124E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.261 | TFLOPs: 31.91 | +7: iteration 7160/ 173500 | consumed samples: 1832960 | consumed tokens: 3753902080 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.458780E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.404 | TFLOPs: 31.61 | +7: iteration 7170/ 173500 | consumed samples: 1835520 | consumed tokens: 3759144960 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.457995E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.992 | TFLOPs: 31.74 | +7: iteration 7180/ 173500 | consumed samples: 1838080 | consumed tokens: 3764387840 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.451122E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.360 | TFLOPs: 31.92 | +7: iteration 7190/ 173500 | consumed samples: 1840640 | consumed tokens: 3769630720 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.455958E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.093 | TFLOPs: 31.64 | +7: iteration 7200/ 173500 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.438208E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.267 | TFLOPs: 31.91 | +7: iteration 7210/ 173500 | consumed samples: 1845760 | consumed tokens: 3780116480 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.445604E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.290 | TFLOPs: 31.71 | +7: 
iteration 7220/ 173500 | consumed samples: 1848320 | consumed tokens: 3785359360 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.450839E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.846 | TFLOPs: 31.53 | +7: iteration 7230/ 173500 | consumed samples: 1850880 | consumed tokens: 3790602240 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.457647E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.682 | TFLOPs: 31.78 | +7: iteration 7240/ 173500 | consumed samples: 1853440 | consumed tokens: 3795845120 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.447031E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.530 | TFLOPs: 31.77 | +7: iteration 7250/ 173500 | consumed samples: 1856000 | consumed tokens: 3801088000 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.446062E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.594 | TFLOPs: 31.35 | +7: iteration 7260/ 173500 | consumed samples: 1858560 | consumed tokens: 3806330880 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.445883E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.756 | TFLOPs: 31.57 | +7: iteration 7270/ 173500 | consumed samples: 1861120 | consumed tokens: 3811573760 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.444911E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.193 | TFLOPs: 31.91 | +7: iteration 7280/ 173500 | consumed samples: 1863680 | consumed tokens: 3816816640 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.446991E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.080 | TFLOPs: 31.28 | +7: iteration 7290/ 173500 | consumed samples: 1866240 | consumed tokens: 3822059520 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.459788E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.107 | TFLOPs: 31.75 | +7: iteration 7300/ 173500 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.457791E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.683 | TFLOPs: 31.78 | +7: iteration 7310/ 173500 | consumed samples: 1871360 | consumed tokens: 3832545280 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.436703E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.312 | TFLOPs: 31.92 | +7: iteration 7320/ 173500 | consumed samples: 1873920 | consumed tokens: 3837788160 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.429761E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.100 | TFLOPs: 31.49 | +7: iteration 7330/ 173500 | consumed samples: 1876480 | consumed tokens: 3843031040 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch 
size: 256 | lm loss: 3.442203E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.174 | TFLOPs: 31.44 | +7: iteration 7340/ 173500 | consumed samples: 1879040 | consumed tokens: 3848273920 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.434850E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.343 | TFLOPs: 31.50 | +7: iteration 7350/ 173500 | consumed samples: 1881600 | consumed tokens: 3853516800 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.430617E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.837 | TFLOPs: 31.73 | +7: iteration 7360/ 173500 | consumed samples: 1884160 | consumed tokens: 3858759680 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.447298E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.796 | TFLOPs: 31.73 | +7: iteration 7370/ 173500 | consumed samples: 1886720 | consumed tokens: 3864002560 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.441492E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.302 | TFLOPs: 31.92 | +7: iteration 7380/ 173500 | consumed samples: 1889280 | consumed tokens: 3869245440 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.455555E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.542 | TFLOPs: 31.72 | +7: iteration 7390/ 173500 | consumed samples: 1891840 | consumed tokens: 
3874488320 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.443375E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.562 | TFLOPs: 31.72 | +7: iteration 7400/ 173500 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.453314E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.514 | TFLOPs: 31.93 | +7: iteration 7410/ 173500 | consumed samples: 1896960 | consumed tokens: 3884974080 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.442498E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.183 | TFLOPs: 31.91 | +7: iteration 7420/ 173500 | consumed samples: 1899520 | consumed tokens: 3890216960 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.444591E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.265 | TFLOPs: 31.91 | +7: iteration 7430/ 173500 | consumed samples: 1902080 | consumed tokens: 3895459840 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.435830E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.278 | TFLOPs: 31.92 | +7: iteration 7440/ 173500 | consumed samples: 1904640 | consumed tokens: 3900702720 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.435516E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.753 | 
TFLOPs: 31.31 | +7: iteration 7450/ 173500 | consumed samples: 1907200 | consumed tokens: 3905945600 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.426963E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.169 | TFLOPs: 31.70 | +7: iteration 7460/ 173500 | consumed samples: 1909760 | consumed tokens: 3911188480 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.433368E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.472 | TFLOPs: 31.77 | +7: iteration 7470/ 173500 | consumed samples: 1912320 | consumed tokens: 3916431360 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.435811E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.334 | TFLOPs: 31.66 | +7: iteration 7480/ 173500 | consumed samples: 1914880 | consumed tokens: 3921674240 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.433196E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.559 | TFLOPs: 31.77 | +7: iteration 7490/ 173500 | consumed samples: 1917440 | consumed tokens: 3926917120 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.441533E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.354 | TFLOPs: 31.92 | +7: iteration 7500/ 173500 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.432957E+00 | grad norm: 0.329 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.130 | TFLOPs: 31.65 | +7: iteration 7510/ 173500 | consumed samples: 1922560 | consumed tokens: 3937402880 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.442390E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.269 | TFLOPs: 31.91 | +7: iteration 7520/ 173500 | consumed samples: 1925120 | consumed tokens: 3942645760 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.428591E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.158 | TFLOPs: 31.70 | +7: iteration 7530/ 173500 | consumed samples: 1927680 | consumed tokens: 3947888640 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.432850E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.368 | TFLOPs: 31.92 | +7: iteration 7540/ 173500 | consumed samples: 1930240 | consumed tokens: 3953131520 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.436300E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.736 | TFLOPs: 31.52 | +7: iteration 7550/ 173500 | consumed samples: 1932800 | consumed tokens: 3958374400 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.176420E+00 | grad norm: 7.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.497 | TFLOPs: 31.82 | +7: iteration 7560/ 173500 | consumed samples: 1935360 | consumed tokens: 3963617280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.995E-04 | global batch size: 256 | lm loss: 4.126638E+00 | grad norm: 3.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.307 | TFLOPs: 31.44 | +7: iteration 7570/ 173500 | consumed samples: 1937920 | consumed tokens: 3968860160 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.750418E+00 | grad norm: 0.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.390 | TFLOPs: 31.55 | +7: iteration 7580/ 173500 | consumed samples: 1940480 | consumed tokens: 3974103040 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.602849E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.299 | TFLOPs: 31.60 | +7: iteration 7590/ 173500 | consumed samples: 1943040 | consumed tokens: 3979345920 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.516285E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.142 | TFLOPs: 31.70 | +7: iteration 7600/ 173500 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.481142E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.186 | TFLOPs: 31.44 | +7: iteration 7610/ 173500 | consumed samples: 1948160 | consumed tokens: 3989831680 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.495148E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.391 | TFLOPs: 31.76 | +7: iteration 7620/ 173500 | consumed samples: 
1950720 | consumed tokens: 3995074560 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.456691E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.964 | TFLOPs: 31.16 | +7: iteration 7630/ 173500 | consumed samples: 1953280 | consumed tokens: 4000317440 | elapsed time per iteration (s): 0.45 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.465451E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.158 | TFLOPs: 29.97 | +7: iteration 7640/ 173500 | consumed samples: 1955840 | consumed tokens: 4005560320 | elapsed time per iteration (s): 0.44 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.453983E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.892 | TFLOPs: 30.32 | +7: iteration 7650/ 173500 | consumed samples: 1958400 | consumed tokens: 4010803200 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.442947E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.581 | TFLOPs: 31.35 | +7: iteration 7660/ 173500 | consumed samples: 1960960 | consumed tokens: 4016046080 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.447602E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.844 | TFLOPs: 31.63 | +7: iteration 7670/ 173500 | consumed samples: 1963520 | consumed tokens: 4021288960 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.441680E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 605.144 | TFLOPs: 31.75 | +7: iteration 7680/ 173500 | consumed samples: 1966080 | consumed tokens: 4026531840 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.444854E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.777 | TFLOPs: 31.26 | +7: iteration 7690/ 173500 | consumed samples: 1968640 | consumed tokens: 4031774720 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.437954E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.128 | TFLOPs: 31.75 | +7: iteration 7700/ 173500 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.437012E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.628 | TFLOPs: 31.93 | +7: iteration 7710/ 173500 | consumed samples: 1973760 | consumed tokens: 4042260480 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.462436E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.840 | TFLOPs: 31.84 | +7: iteration 7720/ 173500 | consumed samples: 1976320 | consumed tokens: 4047503360 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.439871E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.073 | TFLOPs: 31.75 | +7: iteration 7730/ 173500 | consumed samples: 1978880 | consumed tokens: 4052746240 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.441194E+00 | grad norm: 
0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.409 | TFLOPs: 31.71 | +7: iteration 7740/ 173500 | consumed samples: 1981440 | consumed tokens: 4057989120 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.424305E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.852 | TFLOPs: 31.95 | +7: iteration 7750/ 173500 | consumed samples: 1984000 | consumed tokens: 4063232000 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.421161E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.703 | TFLOPs: 31.94 | +7: iteration 7760/ 173500 | consumed samples: 1986560 | consumed tokens: 4068474880 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.440883E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.953 | TFLOPs: 31.64 | +7: iteration 7770/ 173500 | consumed samples: 1989120 | consumed tokens: 4073717760 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.425238E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.767 | TFLOPs: 31.78 | +7: iteration 7780/ 173500 | consumed samples: 1991680 | consumed tokens: 4078960640 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.427966E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.880 | TFLOPs: 31.58 | +7: iteration 7790/ 173500 | consumed samples: 1994240 | consumed tokens: 4084203520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.434549E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.201 | TFLOPs: 31.75 | +7: iteration 7800/ 173500 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.425456E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.863 | TFLOPs: 31.95 | +7: iteration 7810/ 173500 | consumed samples: 1999360 | consumed tokens: 4094689280 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.431676E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.441 | TFLOPs: 31.82 | +7: iteration 7820/ 173500 | consumed samples: 2001920 | consumed tokens: 4099932160 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.435063E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.642 | TFLOPs: 31.72 | +7: iteration 7830/ 173500 | consumed samples: 2004480 | consumed tokens: 4105175040 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.445181E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.537 | TFLOPs: 31.93 | +7: iteration 7840/ 173500 | consumed samples: 2007040 | consumed tokens: 4110417920 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.418851E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.641 | TFLOPs: 31.57 | +7: iteration 7850/ 173500 | 
consumed samples: 2009600 | consumed tokens: 4115660800 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.431491E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.001 | TFLOPs: 31.69 | +7: iteration 7860/ 173500 | consumed samples: 2012160 | consumed tokens: 4120903680 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.441256E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.087 | TFLOPs: 31.75 | +7: iteration 7870/ 173500 | consumed samples: 2014720 | consumed tokens: 4126146560 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.443703E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.921 | TFLOPs: 31.84 | +7: iteration 7880/ 173500 | consumed samples: 2017280 | consumed tokens: 4131389440 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.423123E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.186 | TFLOPs: 31.75 | +7: iteration 7890/ 173500 | consumed samples: 2019840 | consumed tokens: 4136632320 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.433937E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.230 | TFLOPs: 31.91 | +7: iteration 7900/ 173500 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.411023E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.149 | TFLOPs: 31.91 | +7: iteration 7910/ 173500 | consumed samples: 2024960 | consumed tokens: 4147118080 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.415625E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.030 | TFLOPs: 31.90 | +7: iteration 7920/ 173500 | consumed samples: 2027520 | consumed tokens: 4152360960 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.413707E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.747 | TFLOPs: 31.94 | +7: iteration 7930/ 173500 | consumed samples: 2030080 | consumed tokens: 4157603840 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.427390E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.602 | TFLOPs: 31.51 | +7: iteration 7940/ 173500 | consumed samples: 2032640 | consumed tokens: 4162846720 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.437177E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.151 | TFLOPs: 31.91 | +7: iteration 7950/ 173500 | consumed samples: 2035200 | consumed tokens: 4168089600 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.427810E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.796 | TFLOPs: 31.68 | +7: iteration 7960/ 173500 | consumed samples: 2037760 | consumed tokens: 4173332480 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 
3.432201E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.589 | TFLOPs: 31.93 | +7: iteration 7970/ 173500 | consumed samples: 2040320 | consumed tokens: 4178575360 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.407401E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.936 | TFLOPs: 31.90 | +7: iteration 7980/ 173500 | consumed samples: 2042880 | consumed tokens: 4183818240 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.423203E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.172 | TFLOPs: 31.86 | +7: iteration 7990/ 173500 | consumed samples: 2045440 | consumed tokens: 4189061120 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.435278E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.405 | TFLOPs: 31.92 | +0: [2023-03-17 00:09:25,774] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[0.00019940979012929202, 0.00019940979012929202, 0.00019940979012929202], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 8000/ 173500 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.437437E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.966 | TFLOPs: 31.06 | +0: steps: 8000 loss: 3.4537 iter time (s): 0.422 samples/sec: 606.947 +7: iteration 8010/ 173500 | consumed samples: 2050560 | consumed tokens: 4199546880 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | 
global batch size: 256 | lm loss: 3.409982E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.395 | TFLOPs: 31.45 | +7: iteration 8020/ 173500 | consumed samples: 2053120 | consumed tokens: 4204789760 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.423544E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.126 | TFLOPs: 31.85 | +7: iteration 8030/ 173500 | consumed samples: 2055680 | consumed tokens: 4210032640 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.412918E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.523 | TFLOPs: 31.93 | +7: iteration 8040/ 173500 | consumed samples: 2058240 | consumed tokens: 4215275520 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.424656E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.426 | TFLOPs: 31.08 | +7: iteration 8050/ 173500 | consumed samples: 2060800 | consumed tokens: 4220518400 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.426939E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.209 | TFLOPs: 31.70 | +7: iteration 8060/ 173500 | consumed samples: 2063360 | consumed tokens: 4225761280 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.416831E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.606 | TFLOPs: 31.93 | +7: iteration 8070/ 173500 | consumed samples: 2065920 | 
consumed tokens: 4231004160 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.425257E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.430 | TFLOPs: 31.92 | +7: iteration 8080/ 173500 | consumed samples: 2068480 | consumed tokens: 4236247040 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.421726E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.461 | TFLOPs: 31.61 | +7: iteration 8090/ 173500 | consumed samples: 2071040 | consumed tokens: 4241489920 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.414767E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.879 | TFLOPs: 31.95 | +7: iteration 8100/ 173500 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.412912E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.271 | TFLOPs: 31.71 | +7: iteration 8110/ 173500 | consumed samples: 2076160 | consumed tokens: 4251975680 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.411168E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.147 | TFLOPs: 31.91 | +7: iteration 8120/ 173500 | consumed samples: 2078720 | consumed tokens: 4257218560 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.420653E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.550 | TFLOPs: 31.93 | +7: iteration 8130/ 173500 | consumed samples: 2081280 | consumed tokens: 4262461440 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.414525E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.199 | TFLOPs: 31.91 | +7: iteration 8140/ 173500 | consumed samples: 2083840 | consumed tokens: 4267704320 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.411960E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.931 | TFLOPs: 31.90 | +7: iteration 8150/ 173500 | consumed samples: 2086400 | consumed tokens: 4272947200 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.408881E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.050 | TFLOPs: 31.59 | +7: iteration 8160/ 173500 | consumed samples: 2088960 | consumed tokens: 4278190080 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.421377E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.107 | TFLOPs: 31.91 | +7: iteration 8170/ 173500 | consumed samples: 2091520 | consumed tokens: 4283432960 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.399721E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.843 | TFLOPs: 31.89 | +7: iteration 8180/ 173500 | consumed samples: 2094080 | consumed tokens: 4288675840 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.401662E+00 | grad norm: 0.313 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.721 | TFLOPs: 31.89 | +7: iteration 8190/ 173500 | consumed samples: 2096640 | consumed tokens: 4293918720 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.421748E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.045 | TFLOPs: 31.69 | +7: iteration 8200/ 173500 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.393900E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.904 | TFLOPs: 31.90 | +7: iteration 8210/ 173500 | consumed samples: 2101760 | consumed tokens: 4304404480 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.410690E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.591 | TFLOPs: 31.88 | +7: iteration 8220/ 173500 | consumed samples: 2104320 | consumed tokens: 4309647360 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.414074E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.927 | TFLOPs: 31.90 | +7: iteration 8230/ 173500 | consumed samples: 2106880 | consumed tokens: 4314890240 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.415121E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.101 | TFLOPs: 31.91 | +7: iteration 8240/ 173500 | consumed samples: 2109440 | consumed tokens: 4320133120 | elapsed time per iteration (s): 0.43 | learning 
rate: 1.994E-04 | global batch size: 256 | lm loss: 3.401048E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.809 | TFLOPs: 31.52 | +7: iteration 8250/ 173500 | consumed samples: 2112000 | consumed tokens: 4325376000 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.412582E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.171 | TFLOPs: 31.44 | +7: iteration 8260/ 173500 | consumed samples: 2114560 | consumed tokens: 4330618880 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.400811E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.524 | TFLOPs: 31.72 | +7: iteration 8270/ 173500 | consumed samples: 2117120 | consumed tokens: 4335861760 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.398669E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.896 | TFLOPs: 31.95 | +7: iteration 8280/ 173500 | consumed samples: 2119680 | consumed tokens: 4341104640 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.405918E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.212 | TFLOPs: 30.97 | +7: iteration 8290/ 173500 | consumed samples: 2122240 | consumed tokens: 4346347520 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.403075E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.888 | TFLOPs: 31.95 | +7: iteration 8300/ 173500 | consumed samples: 
2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.408847E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.647 | TFLOPs: 31.93 | +7: iteration 8310/ 173500 | consumed samples: 2127360 | consumed tokens: 4356833280 | elapsed time per iteration (s): 0.44 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.416117E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.494 | TFLOPs: 30.56 | +7: iteration 8320/ 173500 | consumed samples: 2129920 | consumed tokens: 4362076160 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.419902E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.704 | TFLOPs: 31.52 | +7: iteration 8330/ 173500 | consumed samples: 2132480 | consumed tokens: 4367319040 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.401550E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.324 | TFLOPs: 31.97 | +7: iteration 8340/ 173500 | consumed samples: 2135040 | consumed tokens: 4372561920 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.391754E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.569 | TFLOPs: 31.93 | +7: iteration 8350/ 173500 | consumed samples: 2137600 | consumed tokens: 4377804800 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.408497E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 608.547 | TFLOPs: 31.93 | +7: iteration 8360/ 173500 | consumed samples: 2140160 | consumed tokens: 4383047680 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.401530E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.659 | TFLOPs: 31.94 | +7: iteration 8370/ 173500 | consumed samples: 2142720 | consumed tokens: 4388290560 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.410896E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.316 | TFLOPs: 31.92 | +7: iteration 8380/ 173500 | consumed samples: 2145280 | consumed tokens: 4393533440 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.404339E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.400 | TFLOPs: 31.92 | +7: iteration 8390/ 173500 | consumed samples: 2147840 | consumed tokens: 4398776320 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.415316E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.154 | TFLOPs: 31.91 | +7: iteration 8400/ 173500 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.386517E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.117 | TFLOPs: 31.91 | +7: iteration 8410/ 173500 | consumed samples: 2152960 | consumed tokens: 4409262080 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.403752E+00 | grad norm: 
0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.936 | TFLOPs: 31.90 | +7: iteration 8420/ 173500 | consumed samples: 2155520 | consumed tokens: 4414504960 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.395428E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.991 | TFLOPs: 31.90 | +7: iteration 8430/ 173500 | consumed samples: 2158080 | consumed tokens: 4419747840 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.401073E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.384 | TFLOPs: 31.92 | +7: iteration 8440/ 173500 | consumed samples: 2160640 | consumed tokens: 4424990720 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.399099E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.882 | TFLOPs: 31.63 | +7: iteration 8450/ 173500 | consumed samples: 2163200 | consumed tokens: 4430233600 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.394401E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.670 | TFLOPs: 31.94 | +7: iteration 8460/ 173500 | consumed samples: 2165760 | consumed tokens: 4435476480 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.404714E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.914 | TFLOPs: 31.63 | +7: iteration 8470/ 173500 | consumed samples: 2168320 | consumed tokens: 4440719360 | elapsed time per iteration (s): 0.43 
| learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.399814E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.141 | TFLOPs: 31.33 | +7: iteration 8480/ 173500 | consumed samples: 2170880 | consumed tokens: 4445962240 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.390930E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.501 | TFLOPs: 31.77 | +7: iteration 8490/ 173500 | consumed samples: 2173440 | consumed tokens: 4451205120 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.400038E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.893 | TFLOPs: 31.58 | +7: iteration 8500/ 173500 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.410769E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.064 | TFLOPs: 31.96 | +7: iteration 8510/ 173500 | consumed samples: 2178560 | consumed tokens: 4461690880 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.386477E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.221 | TFLOPs: 31.91 | +7: iteration 8520/ 173500 | consumed samples: 2181120 | consumed tokens: 4466933760 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.390789E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.446 | TFLOPs: 31.92 | +7: iteration 8530/ 173500 | 
consumed samples: 2183680 | consumed tokens: 4472176640 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.397664E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.808 | TFLOPs: 31.68 | +7: iteration 8540/ 173500 | consumed samples: 2186240 | consumed tokens: 4477419520 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.394287E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.564 | TFLOPs: 31.93 | +7: iteration 8550/ 173500 | consumed samples: 2188800 | consumed tokens: 4482662400 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.402592E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.368 | TFLOPs: 31.29 | +7: iteration 8560/ 173500 | consumed samples: 2191360 | consumed tokens: 4487905280 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.402105E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.554 | TFLOPs: 31.93 | +7: iteration 8570/ 173500 | consumed samples: 2193920 | consumed tokens: 4493148160 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.394811E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.317 | TFLOPs: 31.92 | +7: iteration 8580/ 173500 | consumed samples: 2196480 | consumed tokens: 4498391040 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.400794E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.784 | TFLOPs: 31.89 | +7: iteration 8590/ 173500 | consumed samples: 2199040 | consumed tokens: 4503633920 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.399690E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.665 | TFLOPs: 31.88 | +7: iteration 8600/ 173500 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.394592E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.801 | TFLOPs: 31.63 | +7: iteration 8610/ 173500 | consumed samples: 2204160 | consumed tokens: 4514119680 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.398164E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.735 | TFLOPs: 31.62 | +7: iteration 8620/ 173500 | consumed samples: 2206720 | consumed tokens: 4519362560 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.402287E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.724 | TFLOPs: 31.62 | +7: iteration 8630/ 173500 | consumed samples: 2209280 | consumed tokens: 4524605440 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.394585E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.969 | TFLOPs: 31.74 | +7: iteration 8640/ 173500 | consumed samples: 2211840 | consumed tokens: 4529848320 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 
3.398690E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.936 | TFLOPs: 31.95 | +7: iteration 8650/ 173500 | consumed samples: 2214400 | consumed tokens: 4535091200 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.412601E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.689 | TFLOPs: 31.94 | +7: iteration 8660/ 173500 | consumed samples: 2216960 | consumed tokens: 4540334080 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.380767E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.807 | TFLOPs: 31.94 | +7: iteration 8670/ 173500 | consumed samples: 2219520 | consumed tokens: 4545576960 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.407200E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.567 | TFLOPs: 31.93 | +7: iteration 8680/ 173500 | consumed samples: 2222080 | consumed tokens: 4550819840 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.404330E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.257 | TFLOPs: 31.91 | +7: iteration 8690/ 173500 | consumed samples: 2224640 | consumed tokens: 4556062720 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.392862E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.679 | TFLOPs: 31.73 | +7: iteration 8700/ 173500 | consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.378944E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.415 | TFLOPs: 31.87 | +7: iteration 8710/ 173500 | consumed samples: 2229760 | consumed tokens: 4566548480 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.389437E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.501 | TFLOPs: 31.93 | +7: iteration 8720/ 173500 | consumed samples: 2232320 | consumed tokens: 4571791360 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.384852E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.986 | TFLOPs: 31.01 | +7: iteration 8730/ 173500 | consumed samples: 2234880 | consumed tokens: 4577034240 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.382086E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.156 | TFLOPs: 31.54 | +7: iteration 8740/ 173500 | consumed samples: 2237440 | consumed tokens: 4582277120 | elapsed time per iteration (s): 0.44 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.387656E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.062 | TFLOPs: 30.85 | +7: iteration 8750/ 173500 | consumed samples: 2240000 | consumed tokens: 4587520000 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.395347E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.167 | TFLOPs: 31.80 | +7: 
iteration 8760/ 173500 | consumed samples: 2242560 | consumed tokens: 4592762880 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.390272E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.742 | TFLOPs: 31.68 | +7: iteration 8770/ 173500 | consumed samples: 2245120 | consumed tokens: 4598005760 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.390100E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.370 | TFLOPs: 31.66 | +7: iteration 8780/ 173500 | consumed samples: 2247680 | consumed tokens: 4603248640 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.407413E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.039 | TFLOPs: 31.85 | +7: iteration 8790/ 173500 | consumed samples: 2250240 | consumed tokens: 4608491520 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.388912E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.748 | TFLOPs: 31.73 | +7: iteration 8800/ 173500 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.380102E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.289 | TFLOPs: 31.71 | +7: iteration 8810/ 173500 | consumed samples: 2255360 | consumed tokens: 4618977280 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.385089E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.253 | TFLOPs: 31.76 | +7: iteration 8820/ 173500 | consumed samples: 2257920 | consumed tokens: 4624220160 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.387591E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.104 | TFLOPs: 31.96 | +7: iteration 8830/ 173500 | consumed samples: 2260480 | consumed tokens: 4629463040 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.395258E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.065 | TFLOPs: 31.96 | +7: iteration 8840/ 173500 | consumed samples: 2263040 | consumed tokens: 4634705920 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.388433E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.993 | TFLOPs: 31.80 | +7: iteration 8850/ 173500 | consumed samples: 2265600 | consumed tokens: 4639948800 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.388814E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.641 | TFLOPs: 31.83 | +7: iteration 8860/ 173500 | consumed samples: 2268160 | consumed tokens: 4645191680 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.392928E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.789 | TFLOPs: 31.57 | +7: iteration 8870/ 173500 | consumed samples: 2270720 | consumed tokens: 4650434560 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch 
size: 256 | lm loss: 3.384664E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.747 | TFLOPs: 31.89 | +7: iteration 8880/ 173500 | consumed samples: 2273280 | consumed tokens: 4655677440 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.386871E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 | +7: iteration 8890/ 173500 | consumed samples: 2275840 | consumed tokens: 4660920320 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.395275E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.274 | TFLOPs: 31.92 | +7: iteration 8900/ 173500 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.391332E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.454 | TFLOPs: 31.92 | +7: iteration 8910/ 173500 | consumed samples: 2280960 | consumed tokens: 4671406080 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.381759E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.258 | TFLOPs: 31.91 | +7: iteration 8920/ 173500 | consumed samples: 2283520 | consumed tokens: 4676648960 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.374951E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.862 | TFLOPs: 31.42 | +7: iteration 8930/ 173500 | consumed samples: 2286080 | consumed tokens: 
4681891840 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.375533E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.428 | TFLOPs: 31.82 | +7: iteration 8940/ 173500 | consumed samples: 2288640 | consumed tokens: 4687134720 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.371156E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.704 | TFLOPs: 31.68 | +7: iteration 8950/ 173500 | consumed samples: 2291200 | consumed tokens: 4692377600 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.375734E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.506 | TFLOPs: 31.56 | +7: iteration 8960/ 173500 | consumed samples: 2293760 | consumed tokens: 4697620480 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.364379E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.643 | TFLOPs: 31.67 | +7: iteration 8970/ 173500 | consumed samples: 2296320 | consumed tokens: 4702863360 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.371604E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.292 | TFLOPs: 31.97 | +7: iteration 8980/ 173500 | consumed samples: 2298880 | consumed tokens: 4708106240 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.386330E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.261 | 
TFLOPs: 31.97 | +7: iteration 8990/ 173500 | consumed samples: 2301440 | consumed tokens: 4713349120 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.371339E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.229 | TFLOPs: 31.97 | +7: iteration 9000/ 173500 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.386101E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.603 | TFLOPs: 31.88 | +7: iteration 9010/ 173500 | consumed samples: 2306560 | consumed tokens: 4723834880 | elapsed time per iteration (s): 0.46 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.376173E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.244 | TFLOPs: 29.29 | +7: iteration 9020/ 173500 | consumed samples: 2309120 | consumed tokens: 4729077760 | elapsed time per iteration (s): 0.49 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.366775E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.920 | TFLOPs: 27.33 | +7: iteration 9030/ 173500 | consumed samples: 2311680 | consumed tokens: 4734320640 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.383079E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.476 | TFLOPs: 30.40 | +7: iteration 9040/ 173500 | consumed samples: 2314240 | consumed tokens: 4739563520 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.383025E+00 | grad norm: 0.285 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.864 | TFLOPs: 32.16 | +7: iteration 9050/ 173500 | consumed samples: 2316800 | consumed tokens: 4744806400 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.386445E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.115 | TFLOPs: 30.96 | +7: iteration 9060/ 173500 | consumed samples: 2319360 | consumed tokens: 4750049280 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.380887E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.667 | TFLOPs: 32.09 | +7: iteration 9070/ 173500 | consumed samples: 2321920 | consumed tokens: 4755292160 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.380398E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.330 | TFLOPs: 32.08 | +7: iteration 9080/ 173500 | consumed samples: 2324480 | consumed tokens: 4760535040 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.372701E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.334 | TFLOPs: 31.24 | +7: iteration 9090/ 173500 | consumed samples: 2327040 | consumed tokens: 4765777920 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.382311E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.046 | TFLOPs: 31.75 | +7: iteration 9100/ 173500 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.44 | learning rate: 
1.992E-04 | global batch size: 256 | lm loss: 3.367050E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.005 | TFLOPs: 30.38 | +7: iteration 9110/ 173500 | consumed samples: 2332160 | consumed tokens: 4776263680 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.380579E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.069 | TFLOPs: 31.22 | +7: iteration 9120/ 173500 | consumed samples: 2334720 | consumed tokens: 4781506560 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.364448E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.602 | TFLOPs: 31.62 | +7: iteration 9130/ 173500 | consumed samples: 2337280 | consumed tokens: 4786749440 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.374015E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.792 | TFLOPs: 31.63 | +7: iteration 9140/ 173500 | consumed samples: 2339840 | consumed tokens: 4791992320 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.361899E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.946 | TFLOPs: 31.90 | +7: iteration 9150/ 173500 | consumed samples: 2342400 | consumed tokens: 4797235200 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.356104E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.232 | TFLOPs: 32.02 | +7: iteration 9160/ 173500 | consumed samples: 
2344960 | consumed tokens: 4802478080 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.369126E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.677 | TFLOPs: 31.99 | +7: iteration 9170/ 173500 | consumed samples: 2347520 | consumed tokens: 4807720960 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.373098E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.611 | TFLOPs: 31.88 | +7: iteration 9180/ 173500 | consumed samples: 2350080 | consumed tokens: 4812963840 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.368488E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.886 | TFLOPs: 32.00 | +7: iteration 9190/ 173500 | consumed samples: 2352640 | consumed tokens: 4818206720 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.371375E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.403 | TFLOPs: 31.97 | +7: iteration 9200/ 173500 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.384419E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.511 | TFLOPs: 31.98 | +7: iteration 9210/ 173500 | consumed samples: 2357760 | consumed tokens: 4828692480 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.369354E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 603.814 | TFLOPs: 31.68 | +7: iteration 9220/ 173500 | consumed samples: 2360320 | consumed tokens: 4833935360 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.375895E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.041 | TFLOPs: 31.96 | +7: iteration 9230/ 173500 | consumed samples: 2362880 | consumed tokens: 4839178240 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.368320E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.508 | TFLOPs: 31.82 | +7: iteration 9240/ 173500 | consumed samples: 2365440 | consumed tokens: 4844421120 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.362136E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.126 | TFLOPs: 31.70 | +7: iteration 9250/ 173500 | consumed samples: 2368000 | consumed tokens: 4849664000 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.374799E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.218 | TFLOPs: 31.28 | +7: iteration 9260/ 173500 | consumed samples: 2370560 | consumed tokens: 4854906880 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.369103E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.325 | TFLOPs: 32.02 | +7: iteration 9270/ 173500 | consumed samples: 2373120 | consumed tokens: 4860149760 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.370653E+00 | grad norm: 
0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.968 | TFLOPs: 32.00 | +7: iteration 9280/ 173500 | consumed samples: 2375680 | consumed tokens: 4865392640 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.371193E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.677 | TFLOPs: 31.99 | +7: iteration 9290/ 173500 | consumed samples: 2378240 | consumed tokens: 4870635520 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.378958E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.299 | TFLOPs: 31.97 | +7: iteration 9300/ 173500 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.389134E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.357 | TFLOPs: 31.13 | +7: iteration 9310/ 173500 | consumed samples: 2383360 | consumed tokens: 4881121280 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.385907E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.158 | TFLOPs: 31.02 | +7: iteration 9320/ 173500 | consumed samples: 2385920 | consumed tokens: 4886364160 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.384273E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.372 | TFLOPs: 32.03 | +7: iteration 9330/ 173500 | consumed samples: 2388480 | consumed tokens: 4891607040 | elapsed time per iteration (s): 0.45 
| learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.374863E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.951 | TFLOPs: 29.54 | +7: iteration 9340/ 173500 | consumed samples: 2391040 | consumed tokens: 4896849920 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.376377E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.150 | TFLOPs: 32.07 | +7: iteration 9350/ 173500 | consumed samples: 2393600 | consumed tokens: 4902092800 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.356727E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.973 | TFLOPs: 32.00 | +7: iteration 9360/ 173500 | consumed samples: 2396160 | consumed tokens: 4907335680 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.361253E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.316 | TFLOPs: 32.02 | +7: iteration 9370/ 173500 | consumed samples: 2398720 | consumed tokens: 4912578560 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.361325E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.914 | TFLOPs: 32.00 | +7: iteration 9380/ 173500 | consumed samples: 2401280 | consumed tokens: 4917821440 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.367392E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.738 | TFLOPs: 31.78 | +7: iteration 9390/ 173500 | 
consumed samples: 2403840 | consumed tokens: 4923064320 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.367283E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.606 | TFLOPs: 31.99 | +7: iteration 9400/ 173500 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.353489E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.855 | TFLOPs: 32.00 | +7: iteration 9410/ 173500 | consumed samples: 2408960 | consumed tokens: 4933550080 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.359495E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.537 | TFLOPs: 31.98 | +7: iteration 9420/ 173500 | consumed samples: 2411520 | consumed tokens: 4938792960 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.359594E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.382 | TFLOPs: 31.97 | +7: iteration 9430/ 173500 | consumed samples: 2414080 | consumed tokens: 4944035840 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.366998E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.760 | TFLOPs: 31.57 | +7: iteration 9440/ 173500 | consumed samples: 2416640 | consumed tokens: 4949278720 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.371790E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 609.719 | TFLOPs: 31.99 | +7: iteration 9450/ 173500 | consumed samples: 2419200 | consumed tokens: 4954521600 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.376241E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.473 | TFLOPs: 31.98 | +7: iteration 9460/ 173500 | consumed samples: 2421760 | consumed tokens: 4959764480 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.363155E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.086 | TFLOPs: 31.96 | +7: iteration 9470/ 173500 | consumed samples: 2424320 | consumed tokens: 4965007360 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.355120E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.374 | TFLOPs: 31.97 | +7: iteration 9480/ 173500 | consumed samples: 2426880 | consumed tokens: 4970250240 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.360439E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.688 | TFLOPs: 31.46 | +7: iteration 9490/ 173500 | consumed samples: 2429440 | consumed tokens: 4975493120 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.357269E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.461 | TFLOPs: 31.98 | +7: iteration 9500/ 173500 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 
3.377005E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.498 | TFLOPs: 31.98 | +7: iteration 9510/ 173500 | consumed samples: 2434560 | consumed tokens: 4985978880 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.364222E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.398 | TFLOPs: 31.97 | +7: iteration 9520/ 173500 | consumed samples: 2437120 | consumed tokens: 4991221760 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.375149E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.150 | TFLOPs: 31.44 | +7: iteration 9530/ 173500 | consumed samples: 2439680 | consumed tokens: 4996464640 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.357754E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.998 | TFLOPs: 31.69 | +7: iteration 9540/ 173500 | consumed samples: 2442240 | consumed tokens: 5001707520 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.362784E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.541 | TFLOPs: 31.98 | +7: iteration 9550/ 173500 | consumed samples: 2444800 | consumed tokens: 5006950400 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.363675E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.641 | TFLOPs: 31.93 | +7: iteration 9560/ 173500 | consumed samples: 2447360 | consumed tokens: 5012193280 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.359816E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.043 | TFLOPs: 31.96 | +7: iteration 9570/ 173500 | consumed samples: 2449920 | consumed tokens: 5017436160 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.353634E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.751 | TFLOPs: 31.94 | +7: iteration 9580/ 173500 | consumed samples: 2452480 | consumed tokens: 5022679040 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.355832E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.876 | TFLOPs: 31.95 | +7: iteration 9590/ 173500 | consumed samples: 2455040 | consumed tokens: 5027921920 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.367548E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.726 | TFLOPs: 31.62 | +7: iteration 9600/ 173500 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.370189E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.496 | TFLOPs: 31.98 | +7: iteration 9610/ 173500 | consumed samples: 2460160 | consumed tokens: 5038407680 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.363594E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.681 | TFLOPs: 31.94 | +7: 
iteration 9620/ 173500 | consumed samples: 2462720 | consumed tokens: 5043650560 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.374411E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.697 | TFLOPs: 31.94 | +7: iteration 9630/ 173500 | consumed samples: 2465280 | consumed tokens: 5048893440 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.352856E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.627 | TFLOPs: 31.41 | +7: iteration 9640/ 173500 | consumed samples: 2467840 | consumed tokens: 5054136320 | elapsed time per iteration (s): 0.44 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.354791E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.868 | TFLOPs: 30.69 | +7: iteration 9650/ 173500 | consumed samples: 2470400 | consumed tokens: 5059379200 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.366607E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.201 | TFLOPs: 31.18 | +7: iteration 9660/ 173500 | consumed samples: 2472960 | consumed tokens: 5064622080 | elapsed time per iteration (s): 0.46 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.349893E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.917 | TFLOPs: 29.38 | +7: iteration 9670/ 173500 | consumed samples: 2475520 | consumed tokens: 5069864960 | elapsed time per iteration (s): 0.44 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.370175E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 582.868 | TFLOPs: 30.58 | +7: iteration 9680/ 173500 | consumed samples: 2478080 | consumed tokens: 5075107840 | elapsed time per iteration (s): 0.44 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.355265E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.702 | TFLOPs: 30.73 | +7: iteration 9690/ 173500 | consumed samples: 2480640 | consumed tokens: 5080350720 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.354001E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.552 | TFLOPs: 31.72 | +7: iteration 9700/ 173500 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.351841E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.465 | TFLOPs: 32.03 | +7: iteration 9710/ 173500 | consumed samples: 2485760 | consumed tokens: 5090836480 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.353075E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.988 | TFLOPs: 31.74 | +7: iteration 9720/ 173500 | consumed samples: 2488320 | consumed tokens: 5096079360 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.351615E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.135 | TFLOPs: 31.80 | +7: iteration 9730/ 173500 | consumed samples: 2490880 | consumed tokens: 5101322240 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch 
size: 256 | lm loss: 3.355284E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.653 | TFLOPs: 31.99 | +7: iteration 9740/ 173500 | consumed samples: 2493440 | consumed tokens: 5106565120 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.367060E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.428 | TFLOPs: 31.98 | +7: iteration 9750/ 173500 | consumed samples: 2496000 | consumed tokens: 5111808000 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.349011E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.652 | TFLOPs: 31.99 | +7: iteration 9760/ 173500 | consumed samples: 2498560 | consumed tokens: 5117050880 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.358163E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.285 | TFLOPs: 31.97 | +7: iteration 9770/ 173500 | consumed samples: 2501120 | consumed tokens: 5122293760 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.372815E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.257 | TFLOPs: 31.97 | +7: iteration 9780/ 173500 | consumed samples: 2503680 | consumed tokens: 5127536640 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.350805E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.277 | TFLOPs: 31.97 | +7: iteration 9790/ 173500 | consumed samples: 2506240 | consumed tokens: 
5132779520 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.344708E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.052 | TFLOPs: 31.96 | +7: iteration 9800/ 173500 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.367422E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.233 | TFLOPs: 31.97 | +7: iteration 9810/ 173500 | consumed samples: 2511360 | consumed tokens: 5143265280 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.366359E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.930 | TFLOPs: 31.95 | +7: iteration 9820/ 173500 | consumed samples: 2513920 | consumed tokens: 5148508160 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.357301E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.375 | TFLOPs: 31.08 | +7: iteration 9830/ 173500 | consumed samples: 2516480 | consumed tokens: 5153751040 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.343296E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.168 | TFLOPs: 31.70 | +7: iteration 9840/ 173500 | consumed samples: 2519040 | consumed tokens: 5158993920 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.363488E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.936 | 
TFLOPs: 31.79 | +7: iteration 9850/ 173500 | consumed samples: 2521600 | consumed tokens: 5164236800 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.371947E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.253 | TFLOPs: 31.97 | +7: iteration 9860/ 173500 | consumed samples: 2524160 | consumed tokens: 5169479680 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.347537E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.975 | TFLOPs: 31.95 | +7: iteration 9870/ 173500 | consumed samples: 2526720 | consumed tokens: 5174722560 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.355381E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.304 | TFLOPs: 31.97 | +7: iteration 9880/ 173500 | consumed samples: 2529280 | consumed tokens: 5179965440 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.349373E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.987 | TFLOPs: 31.95 | +7: iteration 9890/ 173500 | consumed samples: 2531840 | consumed tokens: 5185208320 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.346310E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.226 | TFLOPs: 31.70 | +7: iteration 9900/ 173500 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.341590E+00 | grad norm: 0.303 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.443 | TFLOPs: 31.98 | +7: iteration 9910/ 173500 | consumed samples: 2536960 | consumed tokens: 5195694080 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.351146E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.321 | TFLOPs: 31.97 | +7: iteration 9920/ 173500 | consumed samples: 2539520 | consumed tokens: 5200936960 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.350008E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.213 | TFLOPs: 31.96 | +7: iteration 9930/ 173500 | consumed samples: 2542080 | consumed tokens: 5206179840 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.359231E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.355 | TFLOPs: 31.97 | +7: iteration 9940/ 173500 | consumed samples: 2544640 | consumed tokens: 5211422720 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.354213E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.093 | TFLOPs: 31.96 | +7: iteration 9950/ 173500 | consumed samples: 2547200 | consumed tokens: 5216665600 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.348192E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.007 | TFLOPs: 31.95 | +7: iteration 9960/ 173500 | consumed samples: 2549760 | consumed tokens: 5221908480 | elapsed time per iteration (s): 0.42 | learning rate: 
1.990E-04 | global batch size: 256 | lm loss: 3.352114E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.231 | TFLOPs: 31.97 | +7: iteration 9970/ 173500 | consumed samples: 2552320 | consumed tokens: 5227151360 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.331315E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.930 | TFLOPs: 31.95 | +7: iteration 9980/ 173500 | consumed samples: 2554880 | consumed tokens: 5232394240 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.343472E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.926 | TFLOPs: 31.95 | +7: iteration 9990/ 173500 | consumed samples: 2557440 | consumed tokens: 5237637120 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.350257E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.327 | TFLOPs: 31.97 | +0: [2023-03-17 00:23:33,227] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[0.00019897364350587667, 0.00019897364350587667, 0.00019897364350587667], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 10000/ 173500 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.345691E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.024 | TFLOPs: 31.95 | +0: steps: 10000 loss: 3.3268 iter time (s): 0.421 samples/sec: 607.901 +7: ------------------------------------------------------------------------------------------------ +7: 
validation loss at iteration 10000 | lm loss value: 3.477324E+00 | lm loss PPL: 3.237296E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 10000 to checkpoints_221m91b400m +0: [2023-03-17 00:23:33,476] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! +0: [2023-03-17 00:23:33,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_01-model_00-model_states.pt... +0: [2023-03-17 00:23:34,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_01-model_00-model_states.pt. +0: [2023-03-17 00:23:34,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_03-model_00-model_states.pt... +0: [2023-03-17 00:23:34,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_03-model_00-model_states.pt. +0: [2023-03-17 00:23:34,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_04-model_00-model_states.pt... +0: [2023-03-17 00:23:34,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_04-model_00-model_states.pt. +0: [2023-03-17 00:23:34,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_05-model_00-model_states.pt... +0: [2023-03-17 00:23:34,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_05-model_00-model_states.pt. +0: [2023-03-17 00:23:34,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_06-model_00-model_states.pt... 
+0: [2023-03-17 00:23:34,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_06-model_00-model_states.pt. +0: [2023-03-17 00:23:34,140] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_07-model_00-model_states.pt... +0: [2023-03-17 00:23:34,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_07-model_00-model_states.pt. +0: [2023-03-17 00:23:34,165] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_08-model_00-model_states.pt... +0: [2023-03-17 00:23:34,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_08-model_00-model_states.pt. +0: [2023-03-17 00:23:34,190] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_09-model_00-model_states.pt... +0: [2023-03-17 00:23:34,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_09-model_00-model_states.pt. +0: [2023-03-17 00:23:34,214] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_10-model_00-model_states.pt... +0: [2023-03-17 00:23:34,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_10-model_00-model_states.pt. +0: [2023-03-17 00:23:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_11-model_00-model_states.pt... +0: [2023-03-17 00:23:34,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_11-model_00-model_states.pt. +0: [2023-03-17 00:23:34,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 00:23:34,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_12-model_00-model_states.pt. +0: [2023-03-17 00:23:34,289] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_13-model_00-model_states.pt... +0: [2023-03-17 00:23:34,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_13-model_00-model_states.pt. +0: [2023-03-17 00:23:34,315] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_14-model_00-model_states.pt... +0: [2023-03-17 00:23:34,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_14-model_00-model_states.pt. +0: [2023-03-17 00:23:34,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_15-model_00-model_states.pt... +0: [2023-03-17 00:23:34,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_15-model_00-model_states.pt. +0: [2023-03-17 00:23:34,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_16-model_00-model_states.pt... +0: [2023-03-17 00:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_16-model_00-model_states.pt. +0: [2023-03-17 00:23:34,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_17-model_00-model_states.pt... +0: [2023-03-17 00:23:34,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_17-model_00-model_states.pt. +0: [2023-03-17 00:23:34,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_18-model_00-model_states.pt... 
+0: [2023-03-17 00:23:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_18-model_00-model_states.pt. +0: [2023-03-17 00:23:34,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_19-model_00-model_states.pt... +0: [2023-03-17 00:23:34,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_19-model_00-model_states.pt. +0: [2023-03-17 00:23:34,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_20-model_00-model_states.pt... +0: [2023-03-17 00:23:34,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_20-model_00-model_states.pt. +0: [2023-03-17 00:23:34,491] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/layer_22-model_00-model_states.pt... +0: [2023-03-17 00:23:34,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/layer_22-model_00-model_states.pt. +0: [2023-03-17 00:23:34,497] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step10000/mp_rank_00_model_states.pt +0: [2023-03-17 00:23:34,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/mp_rank_00_model_states.pt... +0: [2023-03-17 00:23:34,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/mp_rank_00_model_states.pt. +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-17 00:23:34,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +0: [2023-03-17 00:23:34,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+0: [2023-03-17 00:23:34,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+0: [2023-03-17 00:23:34,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-17 00:23:34,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 00:23:34,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-17 00:23:34,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-17 00:23:34,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 00:23:34,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 00:23:34,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 00:23:34,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 00:23:34,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-17 00:23:34,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: successfully saved checkpoint at iteration 10000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 1202.01 +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-17 00:23:34,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: iteration 10010/ 173500 | consumed samples: 2562560 | consumed tokens: 5248122880 | elapsed time per iteration (s): 0.56 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.341711E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 459.015 | TFLOPs: 24.08 | +7: iteration 10020/ 173500 | consumed samples: 2565120 | consumed tokens: 5253365760 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.351631E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.055 | TFLOPs: 32.17 | +7: iteration 10030/ 173500 | consumed samples: 2567680 | consumed tokens: 5258608640 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.349704E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.889 | TFLOPs: 32.10 | +7: iteration 10040/ 173500 | consumed samples: 2570240 | consumed tokens: 5263851520 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.343894E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
611.559 | TFLOPs: 32.09 | +7: iteration 10050/ 173500 | consumed samples: 2572800 | consumed tokens: 5269094400 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.342920E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.754 | TFLOPs: 32.05 | +7: iteration 10060/ 173500 | consumed samples: 2575360 | consumed tokens: 5274337280 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.335105E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.873 | TFLOPs: 31.74 | +7: iteration 10070/ 173500 | consumed samples: 2577920 | consumed tokens: 5279580160 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.350441E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.839 | TFLOPs: 31.89 | +7: iteration 10080/ 173500 | consumed samples: 2580480 | consumed tokens: 5284823040 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.326505E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.009 | TFLOPs: 32.01 | +7: iteration 10090/ 173500 | consumed samples: 2583040 | consumed tokens: 5290065920 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.343910E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.997 | TFLOPs: 32.01 | +7: iteration 10100/ 173500 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.340743E+00 | grad norm: 0.331 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.549 | TFLOPs: 31.98 | +7: iteration 10110/ 173500 | consumed samples: 2588160 | consumed tokens: 5300551680 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.342404E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.355 | TFLOPs: 31.97 | +7: iteration 10120/ 173500 | consumed samples: 2590720 | consumed tokens: 5305794560 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.347607E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.216 | TFLOPs: 31.96 | +7: iteration 10130/ 173500 | consumed samples: 2593280 | consumed tokens: 5311037440 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.347815E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.950 | TFLOPs: 31.74 | +7: iteration 10140/ 173500 | consumed samples: 2595840 | consumed tokens: 5316280320 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.343596E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.623 | TFLOPs: 31.99 | +7: iteration 10150/ 173500 | consumed samples: 2598400 | consumed tokens: 5321523200 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.355784E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.810 | TFLOPs: 31.68 | +7: iteration 10160/ 173500 | consumed samples: 2600960 | consumed tokens: 5326766080 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.346823E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.781 | TFLOPs: 31.73 | +7: iteration 10170/ 173500 | consumed samples: 2603520 | consumed tokens: 5332008960 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.327245E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.832 | TFLOPs: 32.00 | +7: iteration 10180/ 173500 | consumed samples: 2606080 | consumed tokens: 5337251840 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.348639E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.550 | TFLOPs: 31.98 | +7: iteration 10190/ 173500 | consumed samples: 2608640 | consumed tokens: 5342494720 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.334122E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.217 | TFLOPs: 31.96 | +7: iteration 10200/ 173500 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.334093E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.627 | TFLOPs: 31.99 | +7: iteration 10210/ 173500 | consumed samples: 2613760 | consumed tokens: 5352980480 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.345374E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.185 | TFLOPs: 31.96 | +7: iteration 10220/ 173500 | 
consumed samples: 2616320 | consumed tokens: 5358223360 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.333368E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.343 | TFLOPs: 31.97 | +7: iteration 10230/ 173500 | consumed samples: 2618880 | consumed tokens: 5363466240 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.331185E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.083 | TFLOPs: 31.91 | +7: iteration 10240/ 173500 | consumed samples: 2621440 | consumed tokens: 5368709120 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.349892E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.944 | TFLOPs: 31.95 | +7: iteration 10250/ 173500 | consumed samples: 2624000 | consumed tokens: 5373952000 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.343934E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.027 | TFLOPs: 31.95 | +7: iteration 10260/ 173500 | consumed samples: 2626560 | consumed tokens: 5379194880 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.336591E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.343 | TFLOPs: 31.97 | +7: iteration 10270/ 173500 | consumed samples: 2629120 | consumed tokens: 5384437760 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.337277E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 609.418 | TFLOPs: 31.98 | +7: iteration 10280/ 173500 | consumed samples: 2631680 | consumed tokens: 5389680640 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.341038E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.294 | TFLOPs: 31.97 | +7: iteration 10290/ 173500 | consumed samples: 2634240 | consumed tokens: 5394923520 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335313E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.911 | TFLOPs: 31.95 | +7: iteration 10300/ 173500 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335210E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.001 | TFLOPs: 31.74 | +7: iteration 10310/ 173500 | consumed samples: 2639360 | consumed tokens: 5405409280 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335988E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.661 | TFLOPs: 31.99 | +7: iteration 10320/ 173500 | consumed samples: 2641920 | consumed tokens: 5410652160 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.338960E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.718 | TFLOPs: 31.99 | +7: iteration 10330/ 173500 | consumed samples: 2644480 | consumed tokens: 5415895040 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 
3.325446E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.239 | TFLOPs: 31.97 | +7: iteration 10340/ 173500 | consumed samples: 2647040 | consumed tokens: 5421137920 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335577E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.921 | TFLOPs: 31.95 | +7: iteration 10350/ 173500 | consumed samples: 2649600 | consumed tokens: 5426380800 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.339226E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.018 | TFLOPs: 31.95 | +7: iteration 10360/ 173500 | consumed samples: 2652160 | consumed tokens: 5431623680 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.349298E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.108 | TFLOPs: 31.96 | +7: iteration 10370/ 173500 | consumed samples: 2654720 | consumed tokens: 5436866560 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.338709E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.236 | TFLOPs: 31.97 | +7: iteration 10380/ 173500 | consumed samples: 2657280 | consumed tokens: 5442109440 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.346108E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.095 | TFLOPs: 31.96 | +7: iteration 10390/ 173500 | consumed samples: 2659840 | consumed tokens: 5447352320 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.334399E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.249 | TFLOPs: 31.97 | +7: iteration 10400/ 173500 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335003E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.120 | TFLOPs: 31.96 | +7: iteration 10410/ 173500 | consumed samples: 2664960 | consumed tokens: 5457838080 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.341949E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.955 | TFLOPs: 31.95 | +7: iteration 10420/ 173500 | consumed samples: 2667520 | consumed tokens: 5463080960 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.321243E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.530 | TFLOPs: 31.98 | +7: iteration 10430/ 173500 | consumed samples: 2670080 | consumed tokens: 5468323840 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.328009E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.413 | TFLOPs: 31.97 | +7: iteration 10440/ 173500 | consumed samples: 2672640 | consumed tokens: 5473566720 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.335051E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.438 | TFLOPs: 
31.98 | +7: iteration 10450/ 173500 | consumed samples: 2675200 | consumed tokens: 5478809600 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.327712E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.248 | TFLOPs: 31.97 | +7: iteration 10460/ 173500 | consumed samples: 2677760 | consumed tokens: 5484052480 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.333110E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.177 | TFLOPs: 31.96 | +7: iteration 10470/ 173500 | consumed samples: 2680320 | consumed tokens: 5489295360 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.322987E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.235 | TFLOPs: 31.97 | +7: iteration 10480/ 173500 | consumed samples: 2682880 | consumed tokens: 5494538240 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.337548E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.488 | TFLOPs: 31.98 | +7: iteration 10490/ 173500 | consumed samples: 2685440 | consumed tokens: 5499781120 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324455E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.493 | TFLOPs: 31.98 | +7: iteration 10500/ 173500 | consumed samples: 2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.328394E+00 | grad norm: 0.291 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.293 | TFLOPs: 31.97 | +7: iteration 10510/ 173500 | consumed samples: 2690560 | consumed tokens: 5510266880 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.318975E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.186 | TFLOPs: 31.96 | +7: iteration 10520/ 173500 | consumed samples: 2693120 | consumed tokens: 5515509760 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.333708E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.385 | TFLOPs: 31.97 | +7: iteration 10530/ 173500 | consumed samples: 2695680 | consumed tokens: 5520752640 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.337459E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.262 | TFLOPs: 31.91 | +7: iteration 10540/ 173500 | consumed samples: 2698240 | consumed tokens: 5525995520 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.325012E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.146 | TFLOPs: 31.96 | +7: iteration 10550/ 173500 | consumed samples: 2700800 | consumed tokens: 5531238400 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324763E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.261 | TFLOPs: 31.97 | +7: iteration 10560/ 173500 | consumed samples: 2703360 | consumed tokens: 5536481280 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 
| global batch size: 256 | lm loss: 3.343030E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.520 | TFLOPs: 31.98 | +7: iteration 10570/ 173500 | consumed samples: 2705920 | consumed tokens: 5541724160 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.323616E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.493 | TFLOPs: 31.98 | +7: iteration 10580/ 173500 | consumed samples: 2708480 | consumed tokens: 5546967040 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.317670E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.295 | TFLOPs: 31.97 | +7: iteration 10590/ 173500 | consumed samples: 2711040 | consumed tokens: 5552209920 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.343086E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.328 | TFLOPs: 31.97 | +7: iteration 10600/ 173500 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.319940E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.146 | TFLOPs: 31.96 | +7: iteration 10610/ 173500 | consumed samples: 2716160 | consumed tokens: 5562695680 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.305566E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.157 | TFLOPs: 31.96 | +7: iteration 10620/ 173500 | consumed samples: 2718720 | 
consumed tokens: 5567938560 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324469E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.394 | TFLOPs: 31.82 | +7: iteration 10630/ 173500 | consumed samples: 2721280 | consumed tokens: 5573181440 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.323324E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.365 | TFLOPs: 31.97 | +7: iteration 10640/ 173500 | consumed samples: 2723840 | consumed tokens: 5578424320 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.340925E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.056 | TFLOPs: 31.64 | +7: iteration 10650/ 173500 | consumed samples: 2726400 | consumed tokens: 5583667200 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.326072E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.239 | TFLOPs: 31.97 | +7: iteration 10660/ 173500 | consumed samples: 2728960 | consumed tokens: 5588910080 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.321764E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.382 | TFLOPs: 31.97 | +7: iteration 10670/ 173500 | consumed samples: 2731520 | consumed tokens: 5594152960 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.329539E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 609.598 | TFLOPs: 31.98 | +7: iteration 10680/ 173500 | consumed samples: 2734080 | consumed tokens: 5599395840 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324694E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.276 | TFLOPs: 31.97 | +7: iteration 10690/ 173500 | consumed samples: 2736640 | consumed tokens: 5604638720 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.313900E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.129 | TFLOPs: 31.96 | +7: iteration 10700/ 173500 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.325816E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.436 | TFLOPs: 31.98 | +7: iteration 10710/ 173500 | consumed samples: 2741760 | consumed tokens: 5615124480 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.328316E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.040 | TFLOPs: 31.96 | +7: iteration 10720/ 173500 | consumed samples: 2744320 | consumed tokens: 5620367360 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.311162E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.129 | TFLOPs: 31.96 | +7: iteration 10730/ 173500 | consumed samples: 2746880 | consumed tokens: 5625610240 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.327267E+00 | grad norm: 
0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.313 | TFLOPs: 31.97 | +7: iteration 10740/ 173500 | consumed samples: 2749440 | consumed tokens: 5630853120 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324736E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 10750/ 173500 | consumed samples: 2752000 | consumed tokens: 5636096000 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.321302E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.187 | TFLOPs: 31.96 | +7: iteration 10760/ 173500 | consumed samples: 2754560 | consumed tokens: 5641338880 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.316269E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.224 | TFLOPs: 31.96 | +7: iteration 10770/ 173500 | consumed samples: 2757120 | consumed tokens: 5646581760 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.322228E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.195 | TFLOPs: 31.96 | +7: iteration 10780/ 173500 | consumed samples: 2759680 | consumed tokens: 5651824640 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.324232E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.259 | TFLOPs: 31.97 | +7: iteration 10790/ 173500 | consumed samples: 2762240 | consumed tokens: 5657067520 | elapsed time per iteration (s): 
0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.326303E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.199 | TFLOPs: 31.96 | +7: iteration 10800/ 173500 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.331149E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.618 | TFLOPs: 31.93 | +7: iteration 10810/ 173500 | consumed samples: 2767360 | consumed tokens: 5667553280 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.308971E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.174 | TFLOPs: 31.96 | +7: iteration 10820/ 173500 | consumed samples: 2769920 | consumed tokens: 5672796160 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.322380E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.016 | TFLOPs: 31.95 | +7: iteration 10830/ 173500 | consumed samples: 2772480 | consumed tokens: 5678039040 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.331558E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.756 | TFLOPs: 31.78 | +7: iteration 10840/ 173500 | consumed samples: 2775040 | consumed tokens: 5683281920 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.329250E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.093 | TFLOPs: 31.96 | +7: iteration 10850/ 
173500 | consumed samples: 2777600 | consumed tokens: 5688524800 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.317024E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.197 | TFLOPs: 31.96 | +7: iteration 10860/ 173500 | consumed samples: 2780160 | consumed tokens: 5693767680 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.328730E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.214 | TFLOPs: 31.96 | +7: iteration 10870/ 173500 | consumed samples: 2782720 | consumed tokens: 5699010560 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.334986E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.817 | TFLOPs: 31.79 | +7: iteration 10880/ 173500 | consumed samples: 2785280 | consumed tokens: 5704253440 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.325183E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.177 | TFLOPs: 31.96 | +7: iteration 10890/ 173500 | consumed samples: 2787840 | consumed tokens: 5709496320 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.320855E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.015 | TFLOPs: 31.95 | +7: iteration 10900/ 173500 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.324530E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.965 | TFLOPs: 31.95 | +7: iteration 10910/ 173500 | consumed samples: 2792960 | consumed tokens: 5719982080 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.313463E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.099 | TFLOPs: 31.96 | +7: iteration 10920/ 173500 | consumed samples: 2795520 | consumed tokens: 5725224960 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.319581E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.622 | TFLOPs: 31.93 | +7: iteration 10930/ 173500 | consumed samples: 2798080 | consumed tokens: 5730467840 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.291835E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.943 | TFLOPs: 31.95 | +7: iteration 10940/ 173500 | consumed samples: 2800640 | consumed tokens: 5735710720 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.306897E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.954 | TFLOPs: 31.95 | +7: iteration 10950/ 173500 | consumed samples: 2803200 | consumed tokens: 5740953600 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.299564E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.321 | TFLOPs: 31.97 | +7: iteration 10960/ 173500 | consumed samples: 2805760 | consumed tokens: 5746196480 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | 
lm loss: 3.317046E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.721 | TFLOPs: 31.94 | +7: iteration 10970/ 173500 | consumed samples: 2808320 | consumed tokens: 5751439360 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.315649E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.973 | TFLOPs: 31.95 | +7: iteration 10980/ 173500 | consumed samples: 2810880 | consumed tokens: 5756682240 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.326871E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.764 | TFLOPs: 31.94 | +7: iteration 10990/ 173500 | consumed samples: 2813440 | consumed tokens: 5761925120 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.308002E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.065 | TFLOPs: 31.96 | +7: iteration 11000/ 173500 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.320634E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.816 | TFLOPs: 31.94 | +7: iteration 11010/ 173500 | consumed samples: 2818560 | consumed tokens: 5772410880 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.316618E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.658 | TFLOPs: 31.94 | +7: iteration 11020/ 173500 | consumed samples: 2821120 | consumed tokens: 
5777653760 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.319054E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.734 | TFLOPs: 31.94 | +7: iteration 11030/ 173500 | consumed samples: 2823680 | consumed tokens: 5782896640 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.306380E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.789 | TFLOPs: 31.94 | +7: iteration 11040/ 173500 | consumed samples: 2826240 | consumed tokens: 5788139520 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.329564E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.791 | TFLOPs: 31.94 | +7: iteration 11050/ 173500 | consumed samples: 2828800 | consumed tokens: 5793382400 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.323484E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.733 | TFLOPs: 31.94 | +7: iteration 11060/ 173500 | consumed samples: 2831360 | consumed tokens: 5798625280 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.318982E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.604 | TFLOPs: 31.93 | +7: iteration 11070/ 173500 | consumed samples: 2833920 | consumed tokens: 5803868160 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.316456E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.458 | TFLOPs: 31.92 | +7: iteration 11080/ 173500 | consumed samples: 2836480 | consumed tokens: 5809111040 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.322368E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.552 | TFLOPs: 31.93 | +7: iteration 11090/ 173500 | consumed samples: 2839040 | consumed tokens: 5814353920 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.311628E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.674 | TFLOPs: 31.94 | +7: iteration 11100/ 173500 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.317574E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.411 | TFLOPs: 31.92 | +7: iteration 11110/ 173500 | consumed samples: 2844160 | consumed tokens: 5824839680 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.319128E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.594 | TFLOPs: 31.93 | +7: iteration 11120/ 173500 | consumed samples: 2846720 | consumed tokens: 5830082560 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.327062E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.198 | TFLOPs: 31.91 | +7: iteration 11130/ 173500 | consumed samples: 2849280 | consumed tokens: 5835325440 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.315792E+00 | grad norm: 0.274 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.213 | TFLOPs: 31.91 | +7: iteration 11140/ 173500 | consumed samples: 2851840 | consumed tokens: 5840568320 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.314392E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.927 | TFLOPs: 31.95 | +7: iteration 11150/ 173500 | consumed samples: 2854400 | consumed tokens: 5845811200 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.322808E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.512 | TFLOPs: 31.93 | +7: iteration 11160/ 173500 | consumed samples: 2856960 | consumed tokens: 5851054080 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.323847E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.598 | TFLOPs: 31.93 | +7: iteration 11170/ 173500 | consumed samples: 2859520 | consumed tokens: 5856296960 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.310829E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.409 | TFLOPs: 31.92 | +7: iteration 11180/ 173500 | consumed samples: 2862080 | consumed tokens: 5861539840 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.313982E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.251 | TFLOPs: 31.91 | +7: iteration 11190/ 173500 | consumed samples: 2864640 | consumed tokens: 5866782720 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.313164E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.513 | TFLOPs: 31.93 | +7: iteration 11200/ 173500 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.300684E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.434 | TFLOPs: 31.92 | +7: iteration 11210/ 173500 | consumed samples: 2869760 | consumed tokens: 5877268480 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.306100E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.499 | TFLOPs: 31.93 | +7: iteration 11220/ 173500 | consumed samples: 2872320 | consumed tokens: 5882511360 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.327046E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.219 | TFLOPs: 31.91 | +7: iteration 11230/ 173500 | consumed samples: 2874880 | consumed tokens: 5887754240 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.298225E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.239 | TFLOPs: 31.91 | +7: iteration 11240/ 173500 | consumed samples: 2877440 | consumed tokens: 5892997120 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.315551E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.357 | TFLOPs: 31.92 | +7: iteration 11250/ 173500 | 
consumed samples: 2880000 | consumed tokens: 5898240000 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.314848E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.911 | TFLOPs: 31.90 | +7: iteration 11260/ 173500 | consumed samples: 2882560 | consumed tokens: 5903482880 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.307053E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.431 | TFLOPs: 31.87 | +7: iteration 11270/ 173500 | consumed samples: 2885120 | consumed tokens: 5908725760 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.302526E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.551 | TFLOPs: 31.88 | +7: iteration 11280/ 173500 | consumed samples: 2887680 | consumed tokens: 5913968640 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.316972E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.678 | TFLOPs: 31.88 | +7: iteration 11290/ 173500 | consumed samples: 2890240 | consumed tokens: 5919211520 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.311069E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.005 | TFLOPs: 31.85 | +7: iteration 11300/ 173500 | consumed samples: 2892800 | consumed tokens: 5924454400 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.302699E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 606.872 | TFLOPs: 31.84 | +7: iteration 11310/ 173500 | consumed samples: 2895360 | consumed tokens: 5929697280 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.312402E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.160 | TFLOPs: 31.86 | +7: iteration 11320/ 173500 | consumed samples: 2897920 | consumed tokens: 5934940160 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.308004E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.831 | TFLOPs: 31.89 | +7: iteration 11330/ 173500 | consumed samples: 2900480 | consumed tokens: 5940183040 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.317084E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.980 | TFLOPs: 31.85 | +7: iteration 11340/ 173500 | consumed samples: 2903040 | consumed tokens: 5945425920 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.309653E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.753 | TFLOPs: 31.84 | +7: iteration 11350/ 173500 | consumed samples: 2905600 | consumed tokens: 5950668800 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.318992E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.708 | TFLOPs: 31.83 | +7: iteration 11360/ 173500 | consumed samples: 2908160 | consumed tokens: 5955911680 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 
3.306541E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.791 | TFLOPs: 31.84 | +7: iteration 11370/ 173500 | consumed samples: 2910720 | consumed tokens: 5961154560 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.312936E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.638 | TFLOPs: 31.83 | +7: iteration 11380/ 173500 | consumed samples: 2913280 | consumed tokens: 5966397440 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.318584E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.806 | TFLOPs: 31.84 | +7: iteration 11390/ 173500 | consumed samples: 2915840 | consumed tokens: 5971640320 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.294359E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.762 | TFLOPs: 31.84 | +7: iteration 11400/ 173500 | consumed samples: 2918400 | consumed tokens: 5976883200 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.316863E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.819 | TFLOPs: 31.89 | +7: iteration 11410/ 173500 | consumed samples: 2920960 | consumed tokens: 5982126080 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.310371E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.594 | TFLOPs: 31.88 | +7: iteration 11420/ 173500 | consumed samples: 2923520 | consumed tokens: 5987368960 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.314542E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.294 | TFLOPs: 31.76 | +7: iteration 11430/ 173500 | consumed samples: 2926080 | consumed tokens: 5992611840 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.314068E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.601 | TFLOPs: 31.83 | +7: iteration 11440/ 173500 | consumed samples: 2928640 | consumed tokens: 5997854720 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.304828E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.510 | TFLOPs: 31.77 | +7: iteration 11450/ 173500 | consumed samples: 2931200 | consumed tokens: 6003097600 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.304513E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.946 | TFLOPs: 31.85 | +7: iteration 11460/ 173500 | consumed samples: 2933760 | consumed tokens: 6008340480 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.294234E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.476 | TFLOPs: 31.82 | +7: iteration 11470/ 173500 | consumed samples: 2936320 | consumed tokens: 6013583360 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.289805E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.188 | TFLOPs: 
31.81 | +7: iteration 11480/ 173500 | consumed samples: 2938880 | consumed tokens: 6018826240 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.298055E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.652 | TFLOPs: 31.83 | +7: iteration 11490/ 173500 | consumed samples: 2941440 | consumed tokens: 6024069120 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.312558E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.230 | TFLOPs: 31.81 | +7: iteration 11500/ 173500 | consumed samples: 2944000 | consumed tokens: 6029312000 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.294712E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.004 | TFLOPs: 31.85 | +7: iteration 11510/ 173500 | consumed samples: 2946560 | consumed tokens: 6034554880 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.296292E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.723 | TFLOPs: 31.83 | +7: iteration 11520/ 173500 | consumed samples: 2949120 | consumed tokens: 6039797760 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.293077E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.487 | TFLOPs: 31.82 | +7: iteration 11530/ 173500 | consumed samples: 2951680 | consumed tokens: 6045040640 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.296787E+00 | grad norm: 0.324 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.558 | TFLOPs: 31.83 | +7: iteration 11540/ 173500 | consumed samples: 2954240 | consumed tokens: 6050283520 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.307822E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.117 | TFLOPs: 31.80 | +7: iteration 11550/ 173500 | consumed samples: 2956800 | consumed tokens: 6055526400 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.312951E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.814 | TFLOPs: 31.73 | +7: iteration 11560/ 173500 | consumed samples: 2959360 | consumed tokens: 6060769280 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.308545E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.277 | TFLOPs: 31.81 | +7: iteration 11570/ 173500 | consumed samples: 2961920 | consumed tokens: 6066012160 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.301178E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.416 | TFLOPs: 31.82 | +7: iteration 11580/ 173500 | consumed samples: 2964480 | consumed tokens: 6071255040 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.308677E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.583 | TFLOPs: 31.83 | +7: iteration 11590/ 173500 | consumed samples: 2967040 | consumed tokens: 6076497920 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 
| global batch size: 256 | lm loss: 3.298545E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.983 | TFLOPs: 31.79 | +7: iteration 11600/ 173500 | consumed samples: 2969600 | consumed tokens: 6081740800 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.321656E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.068 | TFLOPs: 31.80 | +7: iteration 11610/ 173500 | consumed samples: 2972160 | consumed tokens: 6086983680 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.297933E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.244 | TFLOPs: 31.81 | +7: iteration 11620/ 173500 | consumed samples: 2974720 | consumed tokens: 6092226560 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.297870E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.060 | TFLOPs: 31.80 | +7: iteration 11630/ 173500 | consumed samples: 2977280 | consumed tokens: 6097469440 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.308381E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.698 | TFLOPs: 31.78 | +7: iteration 11640/ 173500 | consumed samples: 2979840 | consumed tokens: 6102712320 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.312371E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.664 | TFLOPs: 31.78 | +7: iteration 11650/ 173500 | consumed samples: 2982400 | 
consumed tokens: 6107955200 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.302139E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.041 | TFLOPs: 31.80 | +7: iteration 11660/ 173500 | consumed samples: 2984960 | consumed tokens: 6113198080 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.289213E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.234 | TFLOPs: 31.86 | +7: iteration 11670/ 173500 | consumed samples: 2987520 | consumed tokens: 6118440960 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.299529E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.912 | TFLOPs: 31.84 | +7: iteration 11680/ 173500 | consumed samples: 2990080 | consumed tokens: 6123683840 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.306898E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.554 | TFLOPs: 31.88 | +7: iteration 11690/ 173500 | consumed samples: 2992640 | consumed tokens: 6128926720 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.315593E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.485 | TFLOPs: 31.87 | +7: iteration 11700/ 173500 | consumed samples: 2995200 | consumed tokens: 6134169600 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.309025E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 607.598 | TFLOPs: 31.88 | +7: iteration 11710/ 173500 | consumed samples: 2997760 | consumed tokens: 6139412480 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.307272E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.436 | TFLOPs: 31.66 | +7: iteration 11720/ 173500 | consumed samples: 3000320 | consumed tokens: 6144655360 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.297723E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.382 | TFLOPs: 31.45 | +7: iteration 11730/ 173500 | consumed samples: 3002880 | consumed tokens: 6149898240 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.303826E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.870 | TFLOPs: 31.42 | +7: iteration 11740/ 173500 | consumed samples: 3005440 | consumed tokens: 6155141120 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.309726E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.188 | TFLOPs: 31.23 | +7: iteration 11750/ 173500 | consumed samples: 3008000 | consumed tokens: 6160384000 | elapsed time per iteration (s): 0.44 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.302087E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.127 | TFLOPs: 30.81 | +7: iteration 11760/ 173500 | consumed samples: 3010560 | consumed tokens: 6165626880 | elapsed time per iteration (s): 0.44 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.287991E+00 | grad norm: 
0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.011 | TFLOPs: 30.80 | +7: iteration 11770/ 173500 | consumed samples: 3013120 | consumed tokens: 6170869760 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.285175E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.637 | TFLOPs: 31.51 | +7: iteration 11780/ 173500 | consumed samples: 3015680 | consumed tokens: 6176112640 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.295090E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.199 | TFLOPs: 31.75 | +7: iteration 11790/ 173500 | consumed samples: 3018240 | consumed tokens: 6181355520 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.317091E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.575 | TFLOPs: 31.72 | +7: iteration 11800/ 173500 | consumed samples: 3020800 | consumed tokens: 6186598400 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.289281E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.175 | TFLOPs: 31.86 | +7: iteration 11810/ 173500 | consumed samples: 3023360 | consumed tokens: 6191841280 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.289243E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.169 | TFLOPs: 31.70 | +7: iteration 11820/ 173500 | consumed samples: 3025920 | consumed tokens: 6197084160 | elapsed time per iteration (s): 
0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.308530E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.672 | TFLOPs: 31.83 | +7: iteration 11830/ 173500 | consumed samples: 3028480 | consumed tokens: 6202327040 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.301066E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.947 | TFLOPs: 31.69 | +7: iteration 11840/ 173500 | consumed samples: 3031040 | consumed tokens: 6207569920 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.303440E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.659 | TFLOPs: 31.83 | +7: iteration 11850/ 173500 | consumed samples: 3033600 | consumed tokens: 6212812800 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.302957E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.904 | TFLOPs: 31.84 | +7: iteration 11860/ 173500 | consumed samples: 3036160 | consumed tokens: 6218055680 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.304176E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.484 | TFLOPs: 31.72 | +7: iteration 11870/ 173500 | consumed samples: 3038720 | consumed tokens: 6223298560 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.304819E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.647 | TFLOPs: 31.88 | +7: iteration 11880/ 
173500 | consumed samples: 3041280 | consumed tokens: 6228541440 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.291978E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.287 | TFLOPs: 31.86 | +7: iteration 11890/ 173500 | consumed samples: 3043840 | consumed tokens: 6233784320 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.297774E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.346 | TFLOPs: 31.87 | +7: iteration 11900/ 173500 | consumed samples: 3046400 | consumed tokens: 6239027200 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.287717E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.225 | TFLOPs: 31.86 | +7: iteration 11910/ 173500 | consumed samples: 3048960 | consumed tokens: 6244270080 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.292771E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.018 | TFLOPs: 31.85 | +7: iteration 11920/ 173500 | consumed samples: 3051520 | consumed tokens: 6249512960 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.292701E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.424 | TFLOPs: 31.87 | +7: iteration 11930/ 173500 | consumed samples: 3054080 | consumed tokens: 6254755840 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.303325E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.381 | TFLOPs: 31.87 | +7: iteration 11940/ 173500 | consumed samples: 3056640 | consumed tokens: 6259998720 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.305012E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.597 | TFLOPs: 31.88 | +7: iteration 11950/ 173500 | consumed samples: 3059200 | consumed tokens: 6265241600 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.299050E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.239 | TFLOPs: 31.86 | +7: iteration 11960/ 173500 | consumed samples: 3061760 | consumed tokens: 6270484480 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.292488E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.242 | TFLOPs: 31.86 | +7: iteration 11970/ 173500 | consumed samples: 3064320 | consumed tokens: 6275727360 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.291629E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.565 | TFLOPs: 31.88 | +7: iteration 11980/ 173500 | consumed samples: 3066880 | consumed tokens: 6280970240 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.307784E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.498 | TFLOPs: 31.87 | +7: iteration 11990/ 173500 | consumed samples: 3069440 | consumed tokens: 6286213120 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | 
lm loss: 3.284821E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.581 | TFLOPs: 31.88 | +0: [2023-03-17 00:37:37,225] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=0, lr=[0.0001984184547955352, 0.0001984184547955352, 0.0001984184547955352], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 12000/ 173500 | consumed samples: 3072000 | consumed tokens: 6291456000 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.282371E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.517 | TFLOPs: 31.88 | +0: steps: 12000 loss: 3.2698 iter time (s): 0.419 samples/sec: 610.906 +7: iteration 12010/ 173500 | consumed samples: 3074560 | consumed tokens: 6296698880 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.296605E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.337 | TFLOPs: 31.81 | +7: iteration 12020/ 173500 | consumed samples: 3077120 | consumed tokens: 6301941760 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.285446E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.681 | TFLOPs: 31.88 | +7: iteration 12030/ 173500 | consumed samples: 3079680 | consumed tokens: 6307184640 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.274450E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.139 | TFLOPs: 31.91 | +7: iteration 12040/ 173500 | consumed samples: 3082240 | consumed tokens: 6312427520 | elapsed time per iteration (s): 0.42 | learning rate: 
1.984E-04 | global batch size: 256 | lm loss: 3.299295E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.161 | TFLOPs: 31.75 | +7: iteration 12050/ 173500 | consumed samples: 3084800 | consumed tokens: 6317670400 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.293676E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.869 | TFLOPs: 31.89 | +7: iteration 12060/ 173500 | consumed samples: 3087360 | consumed tokens: 6322913280 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.287405E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.915 | TFLOPs: 31.90 | +7: iteration 12070/ 173500 | consumed samples: 3089920 | consumed tokens: 6328156160 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.287408E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.740 | TFLOPs: 31.89 | +7: iteration 12080/ 173500 | consumed samples: 3092480 | consumed tokens: 6333399040 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.278389E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.811 | TFLOPs: 31.89 | +7: iteration 12090/ 173500 | consumed samples: 3095040 | consumed tokens: 6338641920 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.291250E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.744 | TFLOPs: 31.89 | +7: iteration 12100/ 173500 | consumed samples: 
3097600 | consumed tokens: 6343884800 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.292027E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.937 | TFLOPs: 31.90 | +7: iteration 12110/ 173500 | consumed samples: 3100160 | consumed tokens: 6349127680 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.291147E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.788 | TFLOPs: 31.89 | +7: iteration 12120/ 173500 | consumed samples: 3102720 | consumed tokens: 6354370560 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.278861E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.622 | TFLOPs: 31.88 | +7: iteration 12130/ 173500 | consumed samples: 3105280 | consumed tokens: 6359613440 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.306878E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.043 | TFLOPs: 31.85 | +7: iteration 12140/ 173500 | consumed samples: 3107840 | consumed tokens: 6364856320 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.288617E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.861 | TFLOPs: 31.84 | +7: iteration 12150/ 173500 | consumed samples: 3110400 | consumed tokens: 6370099200 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.285136E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 607.064 | TFLOPs: 31.85 | +7: iteration 12160/ 173500 | consumed samples: 3112960 | consumed tokens: 6375342080 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.278865E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.546 | TFLOPs: 31.88 | +7: iteration 12170/ 173500 | consumed samples: 3115520 | consumed tokens: 6380584960 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.291360E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.898 | TFLOPs: 31.90 | +7: iteration 12180/ 173500 | consumed samples: 3118080 | consumed tokens: 6385827840 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.284145E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | +7: iteration 12190/ 173500 | consumed samples: 3120640 | consumed tokens: 6391070720 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.279882E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.829 | TFLOPs: 31.89 | +7: iteration 12200/ 173500 | consumed samples: 3123200 | consumed tokens: 6396313600 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.294617E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.529 | TFLOPs: 31.88 | +7: iteration 12210/ 173500 | consumed samples: 3125760 | consumed tokens: 6401556480 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.289436E+00 | grad 
norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.522 | TFLOPs: 31.88 | +7: iteration 12220/ 173500 | consumed samples: 3128320 | consumed tokens: 6406799360 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.291439E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.851 | TFLOPs: 31.89 | +7: iteration 12230/ 173500 | consumed samples: 3130880 | consumed tokens: 6412042240 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.280598E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.518 | TFLOPs: 31.88 | +7: iteration 12240/ 173500 | consumed samples: 3133440 | consumed tokens: 6417285120 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.280516E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.791 | TFLOPs: 31.89 | +7: iteration 12250/ 173500 | consumed samples: 3136000 | consumed tokens: 6422528000 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.286615E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.932 | TFLOPs: 31.90 | +7: iteration 12260/ 173500 | consumed samples: 3138560 | consumed tokens: 6427770880 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.278925E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.754 | TFLOPs: 31.89 | +7: iteration 12270/ 173500 | consumed samples: 3141120 | consumed tokens: 6433013760 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.278945E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.219 | TFLOPs: 31.91 | +7: iteration 12280/ 173500 | consumed samples: 3143680 | consumed tokens: 6438256640 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.283912E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.772 | TFLOPs: 31.89 | +7: iteration 12290/ 173500 | consumed samples: 3146240 | consumed tokens: 6443499520 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.285397E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.790 | TFLOPs: 31.89 | +7: iteration 12300/ 173500 | consumed samples: 3148800 | consumed tokens: 6448742400 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.296242E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.128 | TFLOPs: 31.80 | +7: iteration 12310/ 173500 | consumed samples: 3151360 | consumed tokens: 6453985280 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.293405E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.379 | TFLOPs: 31.87 | +7: iteration 12320/ 173500 | consumed samples: 3153920 | consumed tokens: 6459228160 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.281903E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.861 | TFLOPs: 31.89 | +7: 
iteration 12330/ 173500 | consumed samples: 3156480 | consumed tokens: 6464471040 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.287592E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.578 | TFLOPs: 31.88 | +7: iteration 12340/ 173500 | consumed samples: 3159040 | consumed tokens: 6469713920 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.277147E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.950 | TFLOPs: 31.90 | +7: iteration 12350/ 173500 | consumed samples: 3161600 | consumed tokens: 6474956800 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.283354E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.706 | TFLOPs: 31.89 | +7: iteration 12360/ 173500 | consumed samples: 3164160 | consumed tokens: 6480199680 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.291885E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.832 | TFLOPs: 31.89 | +7: iteration 12370/ 173500 | consumed samples: 3166720 | consumed tokens: 6485442560 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.269740E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.781 | TFLOPs: 31.89 | +7: iteration 12380/ 173500 | consumed samples: 3169280 | consumed tokens: 6490685440 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.282878E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.741 | TFLOPs: 31.89 | +7: iteration 12390/ 173500 | consumed samples: 3171840 | consumed tokens: 6495928320 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.299344E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.263 | TFLOPs: 31.91 | +7: iteration 12400/ 173500 | consumed samples: 3174400 | consumed tokens: 6501171200 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.289703E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.786 | TFLOPs: 31.89 | +7: iteration 12410/ 173500 | consumed samples: 3176960 | consumed tokens: 6506414080 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.274031E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.838 | TFLOPs: 31.89 | +7: iteration 12420/ 173500 | consumed samples: 3179520 | consumed tokens: 6511656960 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.286838E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.876 | TFLOPs: 31.89 | +7: iteration 12430/ 173500 | consumed samples: 3182080 | consumed tokens: 6516899840 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.291726E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.901 | TFLOPs: 31.90 | +7: iteration 12440/ 173500 | consumed samples: 3184640 | consumed tokens: 6522142720 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global 
batch size: 256 | lm loss: 3.293213E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.219 | TFLOPs: 31.91 | +7: iteration 12450/ 173500 | consumed samples: 3187200 | consumed tokens: 6527385600 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.283928E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.887 | TFLOPs: 31.89 | +7: iteration 12460/ 173500 | consumed samples: 3189760 | consumed tokens: 6532628480 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.289807E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.577 | TFLOPs: 31.88 | +7: iteration 12470/ 173500 | consumed samples: 3192320 | consumed tokens: 6537871360 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.286008E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.319 | TFLOPs: 31.92 | +7: iteration 12480/ 173500 | consumed samples: 3194880 | consumed tokens: 6543114240 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.279159E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.936 | TFLOPs: 31.90 | +7: iteration 12490/ 173500 | consumed samples: 3197440 | consumed tokens: 6548357120 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.265260E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.652 | TFLOPs: 31.88 | +7: iteration 12500/ 173500 | consumed samples: 3200000 | consumed 
tokens: 6553600000 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.293782E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.588 | TFLOPs: 31.88 | +7: iteration 12510/ 173500 | consumed samples: 3202560 | consumed tokens: 6558842880 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.280702E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.179 | TFLOPs: 31.86 | +7: iteration 12520/ 173500 | consumed samples: 3205120 | consumed tokens: 6564085760 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.276217E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.365 | TFLOPs: 31.87 | +7: iteration 12530/ 173500 | consumed samples: 3207680 | consumed tokens: 6569328640 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.281327E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.886 | TFLOPs: 31.84 | +7: iteration 12540/ 173500 | consumed samples: 3210240 | consumed tokens: 6574571520 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.283364E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.410 | TFLOPs: 31.87 | +7: iteration 12550/ 173500 | consumed samples: 3212800 | consumed tokens: 6579814400 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.288195E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.631 | TFLOPs: 31.88 | +7: iteration 12560/ 173500 | consumed samples: 3215360 | consumed tokens: 6585057280 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.266104E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.035 | TFLOPs: 31.90 | +7: iteration 12570/ 173500 | consumed samples: 3217920 | consumed tokens: 6590300160 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.286694E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.614 | TFLOPs: 31.88 | +7: iteration 12580/ 173500 | consumed samples: 3220480 | consumed tokens: 6595543040 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.271492E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.685 | TFLOPs: 31.88 | +7: iteration 12590/ 173500 | consumed samples: 3223040 | consumed tokens: 6600785920 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.277510E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.174 | TFLOPs: 31.91 | +7: iteration 12600/ 173500 | consumed samples: 3225600 | consumed tokens: 6606028800 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.279160E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.700 | TFLOPs: 31.89 | +7: iteration 12610/ 173500 | consumed samples: 3228160 | consumed tokens: 6611271680 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.279398E+00 | grad norm: 0.294 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.885 | TFLOPs: 31.89 | +7: iteration 12620/ 173500 | consumed samples: 3230720 | consumed tokens: 6616514560 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.295219E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.910 | TFLOPs: 31.90 | +7: iteration 12630/ 173500 | consumed samples: 3233280 | consumed tokens: 6621757440 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.279278E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.473 | TFLOPs: 31.87 | +7: iteration 12640/ 173500 | consumed samples: 3235840 | consumed tokens: 6627000320 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.282861E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.414 | TFLOPs: 31.87 | +7: iteration 12650/ 173500 | consumed samples: 3238400 | consumed tokens: 6632243200 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.282730E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.240 | TFLOPs: 31.86 | +7: iteration 12660/ 173500 | consumed samples: 3240960 | consumed tokens: 6637486080 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.281123E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.547 | TFLOPs: 31.88 | +7: iteration 12670/ 173500 | consumed samples: 3243520 | consumed tokens: 6642728960 | elapsed time per iteration (s): 0.42 
| learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.272222E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.620 | TFLOPs: 31.88 | +7: iteration 12680/ 173500 | consumed samples: 3246080 | consumed tokens: 6647971840 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.271388E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.059 | TFLOPs: 31.90 | +7: iteration 12690/ 173500 | consumed samples: 3248640 | consumed tokens: 6653214720 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.281609E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.229 | TFLOPs: 31.91 | +7: iteration 12700/ 173500 | consumed samples: 3251200 | consumed tokens: 6658457600 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.289286E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.149 | TFLOPs: 31.86 | +7: iteration 12710/ 173500 | consumed samples: 3253760 | consumed tokens: 6663700480 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.279505E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.233 | TFLOPs: 31.86 | +7: iteration 12720/ 173500 | consumed samples: 3256320 | consumed tokens: 6668943360 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.281684E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.555 | TFLOPs: 31.88 | +7: iteration 12730/ 173500 | 
consumed samples: 3258880 | consumed tokens: 6674186240 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.280005E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.115 | TFLOPs: 31.91 | +7: iteration 12740/ 173500 | consumed samples: 3261440 | consumed tokens: 6679429120 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.277510E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.672 | TFLOPs: 31.88 | +7: iteration 12750/ 173500 | consumed samples: 3264000 | consumed tokens: 6684672000 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.272796E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.711 | TFLOPs: 31.89 | +7: iteration 12760/ 173500 | consumed samples: 3266560 | consumed tokens: 6689914880 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.278965E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.628 | TFLOPs: 31.88 | +7: iteration 12770/ 173500 | consumed samples: 3269120 | consumed tokens: 6695157760 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.296222E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.819 | TFLOPs: 31.89 | +7: iteration 12780/ 173500 | consumed samples: 3271680 | consumed tokens: 6700400640 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.278115E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 607.835 | TFLOPs: 31.89 | +7: iteration 12790/ 173500 | consumed samples: 3274240 | consumed tokens: 6705643520 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.267711E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.438 | TFLOPs: 31.92 | +7: iteration 12800/ 173500 | consumed samples: 3276800 | consumed tokens: 6710886400 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.275533E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.662 | TFLOPs: 31.88 | +7: iteration 12810/ 173500 | consumed samples: 3279360 | consumed tokens: 6716129280 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.272362E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.801 | TFLOPs: 31.89 | +7: iteration 12820/ 173500 | consumed samples: 3281920 | consumed tokens: 6721372160 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.277710E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.838 | TFLOPs: 31.89 | +7: iteration 12830/ 173500 | consumed samples: 3284480 | consumed tokens: 6726615040 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.263034E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.229 | TFLOPs: 31.91 | +7: iteration 12840/ 173500 | consumed samples: 3287040 | consumed tokens: 6731857920 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 
3.258247E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.862 | TFLOPs: 31.89 | +7: iteration 12850/ 173500 | consumed samples: 3289600 | consumed tokens: 6737100800 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.264291E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.882 | TFLOPs: 31.89 | +7: iteration 12860/ 173500 | consumed samples: 3292160 | consumed tokens: 6742343680 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.274708E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.777 | TFLOPs: 31.89 | +7: iteration 12870/ 173500 | consumed samples: 3294720 | consumed tokens: 6747586560 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.286388E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.423 | TFLOPs: 31.92 | +7: iteration 12880/ 173500 | consumed samples: 3297280 | consumed tokens: 6752829440 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.276182E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.442 | TFLOPs: 31.92 | +7: iteration 12890/ 173500 | consumed samples: 3299840 | consumed tokens: 6758072320 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.282664E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.631 | TFLOPs: 31.93 | +7: iteration 12900/ 173500 | consumed samples: 3302400 | consumed tokens: 6763315200 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.277879E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.617 | TFLOPs: 31.93 | +7: iteration 12910/ 173500 | consumed samples: 3304960 | consumed tokens: 6768558080 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.274909E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.010 | TFLOPs: 31.95 | +7: iteration 12920/ 173500 | consumed samples: 3307520 | consumed tokens: 6773800960 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.275996E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.365 | TFLOPs: 31.92 | +7: iteration 12930/ 173500 | consumed samples: 3310080 | consumed tokens: 6779043840 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.267354E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.879 | TFLOPs: 31.95 | +7: iteration 12940/ 173500 | consumed samples: 3312640 | consumed tokens: 6784286720 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.282685E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.084 | TFLOPs: 31.96 | +7: iteration 12950/ 173500 | consumed samples: 3315200 | consumed tokens: 6789529600 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.275451E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.227 | TFLOPs: 
31.97 | +7: iteration 12960/ 173500 | consumed samples: 3317760 | consumed tokens: 6794772480 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.283385E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.488 | TFLOPs: 31.98 | +7: iteration 12970/ 173500 | consumed samples: 3320320 | consumed tokens: 6800015360 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.283442E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.444 | TFLOPs: 31.98 | +7: iteration 12980/ 173500 | consumed samples: 3322880 | consumed tokens: 6805258240 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.269638E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.036 | TFLOPs: 31.96 | +7: iteration 12990/ 173500 | consumed samples: 3325440 | consumed tokens: 6810501120 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.266274E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.729 | TFLOPs: 31.99 | +7: iteration 13000/ 173500 | consumed samples: 3328000 | consumed tokens: 6815744000 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.287453E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.843 | TFLOPs: 32.00 | +7: iteration 13010/ 173500 | consumed samples: 3330560 | consumed tokens: 6820986880 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.271336E+00 | grad norm: 0.300 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.862 | TFLOPs: 32.00 | +7: iteration 13020/ 173500 | consumed samples: 3333120 | consumed tokens: 6826229760 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.266020E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.144 | TFLOPs: 32.01 | +7: iteration 13030/ 173500 | consumed samples: 3335680 | consumed tokens: 6831472640 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.269867E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.102 | TFLOPs: 32.01 | +7: iteration 13040/ 173500 | consumed samples: 3338240 | consumed tokens: 6836715520 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.277927E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.165 | TFLOPs: 32.01 | +7: iteration 13050/ 173500 | consumed samples: 3340800 | consumed tokens: 6841958400 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.276902E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.059 | TFLOPs: 32.01 | +7: iteration 13060/ 173500 | consumed samples: 3343360 | consumed tokens: 6847201280 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.263854E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.479 | TFLOPs: 32.03 | +7: iteration 13070/ 173500 | consumed samples: 3345920 | consumed tokens: 6852444160 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 
| global batch size: 256 | lm loss: 3.258821E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.777 | TFLOPs: 31.99 | +7: iteration 13080/ 173500 | consumed samples: 3348480 | consumed tokens: 6857687040 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.252819E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.851 | TFLOPs: 32.00 | +7: iteration 13090/ 173500 | consumed samples: 3351040 | consumed tokens: 6862929920 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.262675E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.844 | TFLOPs: 32.00 | +7: iteration 13100/ 173500 | consumed samples: 3353600 | consumed tokens: 6868172800 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.263742E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.400 | TFLOPs: 32.03 | +7: iteration 13110/ 173500 | consumed samples: 3356160 | consumed tokens: 6873415680 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.271463E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.613 | TFLOPs: 31.72 | +7: iteration 13120/ 173500 | consumed samples: 3358720 | consumed tokens: 6878658560 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.268484E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.375 | TFLOPs: 32.03 | +7: iteration 13130/ 173500 | consumed samples: 3361280 | 
consumed tokens: 6883901440 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.262541E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.290 | TFLOPs: 32.07 | +7: iteration 13140/ 173500 | consumed samples: 3363840 | consumed tokens: 6889144320 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.274402E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.895 | TFLOPs: 32.05 | +7: iteration 13150/ 173500 | consumed samples: 3366400 | consumed tokens: 6894387200 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.273539E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.840 | TFLOPs: 32.05 | +7: iteration 13160/ 173500 | consumed samples: 3368960 | consumed tokens: 6899630080 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.273545E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.992 | TFLOPs: 32.06 | +7: iteration 13170/ 173500 | consumed samples: 3371520 | consumed tokens: 6904872960 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.272823E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.934 | TFLOPs: 32.05 | +7: iteration 13180/ 173500 | consumed samples: 3374080 | consumed tokens: 6910115840 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.270261E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 611.224 | TFLOPs: 32.07 | +7: iteration 13190/ 173500 | consumed samples: 3376640 | consumed tokens: 6915358720 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.259048E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.199 | TFLOPs: 32.07 | +7: iteration 13200/ 173500 | consumed samples: 3379200 | consumed tokens: 6920601600 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.272030E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.079 | TFLOPs: 32.06 | +7: iteration 13210/ 173500 | consumed samples: 3381760 | consumed tokens: 6925844480 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.256075E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.095 | TFLOPs: 32.06 | +7: iteration 13220/ 173500 | consumed samples: 3384320 | consumed tokens: 6931087360 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.257930E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.976 | TFLOPs: 32.06 | +7: iteration 13230/ 173500 | consumed samples: 3386880 | consumed tokens: 6936330240 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.273978E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.343 | TFLOPs: 32.08 | +7: iteration 13240/ 173500 | consumed samples: 3389440 | consumed tokens: 6941573120 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.255043E+00 | grad norm: 
0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.095 | TFLOPs: 32.06 | +7: iteration 13250/ 173500 | consumed samples: 3392000 | consumed tokens: 6946816000 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.262240E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.993 | TFLOPs: 32.06 | +7: iteration 13260/ 173500 | consumed samples: 3394560 | consumed tokens: 6952058880 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.276196E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.412 | TFLOPs: 31.66 | +7: iteration 13270/ 173500 | consumed samples: 3397120 | consumed tokens: 6957301760 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.267109E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.355 | TFLOPs: 32.08 | +7: iteration 13280/ 173500 | consumed samples: 3399680 | consumed tokens: 6962544640 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.273227E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.773 | TFLOPs: 32.10 | +7: iteration 13290/ 173500 | consumed samples: 3402240 | consumed tokens: 6967787520 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.270468E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.605 | TFLOPs: 32.09 | +7: iteration 13300/ 173500 | consumed samples: 3404800 | consumed tokens: 6973030400 | elapsed time per iteration (s): 
0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.272817E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.662 | TFLOPs: 32.09 | +7: iteration 13310/ 173500 | consumed samples: 3407360 | consumed tokens: 6978273280 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.247861E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.562 | TFLOPs: 32.09 | +7: iteration 13320/ 173500 | consumed samples: 3409920 | consumed tokens: 6983516160 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.281189E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.567 | TFLOPs: 32.09 | +7: iteration 13330/ 173500 | consumed samples: 3412480 | consumed tokens: 6988759040 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.255760E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.630 | TFLOPs: 32.09 | +7: iteration 13340/ 173500 | consumed samples: 3415040 | consumed tokens: 6994001920 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.277045E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.153 | TFLOPs: 32.07 | +7: iteration 13350/ 173500 | consumed samples: 3417600 | consumed tokens: 6999244800 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.265759E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.246 | TFLOPs: 32.07 | +7: iteration 13360/ 
173500 | consumed samples: 3420160 | consumed tokens: 7004487680 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.269949E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.287 | TFLOPs: 32.07 | +7: iteration 13370/ 173500 | consumed samples: 3422720 | consumed tokens: 7009730560 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.258944E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.145 | TFLOPs: 32.07 | +7: iteration 13380/ 173500 | consumed samples: 3425280 | consumed tokens: 7014973440 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.259047E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.049 | TFLOPs: 32.06 | +7: iteration 13390/ 173500 | consumed samples: 3427840 | consumed tokens: 7020216320 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.264968E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.675 | TFLOPs: 32.04 | +7: iteration 13400/ 173500 | consumed samples: 3430400 | consumed tokens: 7025459200 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.270500E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.642 | TFLOPs: 32.04 | +7: iteration 13410/ 173500 | consumed samples: 3432960 | consumed tokens: 7030702080 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.255034E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 610.582 | TFLOPs: 32.04 | +7: iteration 13420/ 173500 | consumed samples: 3435520 | consumed tokens: 7035944960 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.267173E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.203 | TFLOPs: 32.02 | +7: iteration 13430/ 173500 | consumed samples: 3438080 | consumed tokens: 7041187840 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.259206E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.316 | TFLOPs: 32.02 | +7: iteration 13440/ 173500 | consumed samples: 3440640 | consumed tokens: 7046430720 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.271132E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.264 | TFLOPs: 32.02 | +7: iteration 13450/ 173500 | consumed samples: 3443200 | consumed tokens: 7051673600 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.268587E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.017 | TFLOPs: 32.01 | +7: iteration 13460/ 173500 | consumed samples: 3445760 | consumed tokens: 7056916480 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.264388E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.871 | TFLOPs: 31.89 | +7: iteration 13470/ 173500 | consumed samples: 3448320 | consumed tokens: 7062159360 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | 
lm loss: 3.258621E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.976 | TFLOPs: 32.00 | +7: iteration 13480/ 173500 | consumed samples: 3450880 | consumed tokens: 7067402240 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.258979E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.795 | TFLOPs: 31.99 | +7: iteration 13490/ 173500 | consumed samples: 3453440 | consumed tokens: 7072645120 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.261528E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.604 | TFLOPs: 31.98 | +7: iteration 13500/ 173500 | consumed samples: 3456000 | consumed tokens: 7077888000 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.268295E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.344 | TFLOPs: 31.97 | +7: iteration 13510/ 173500 | consumed samples: 3458560 | consumed tokens: 7083130880 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.264642E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.369 | TFLOPs: 31.97 | +7: iteration 13520/ 173500 | consumed samples: 3461120 | consumed tokens: 7088373760 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.253765E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.498 | TFLOPs: 31.98 | +7: iteration 13530/ 173500 | consumed samples: 3463680 | consumed tokens: 
7093616640 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.261153E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.428 | TFLOPs: 31.03 | +7: iteration 13540/ 173500 | consumed samples: 3466240 | consumed tokens: 7098859520 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.261468E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.604 | TFLOPs: 31.88 | +7: iteration 13550/ 173500 | consumed samples: 3468800 | consumed tokens: 7104102400 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.246319E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.335 | TFLOPs: 31.87 | +7: iteration 13560/ 173500 | consumed samples: 3471360 | consumed tokens: 7109345280 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.256393E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.084 | TFLOPs: 31.70 | +7: iteration 13570/ 173500 | consumed samples: 3473920 | consumed tokens: 7114588160 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.275702E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.250 | TFLOPs: 31.97 | +7: iteration 13580/ 173500 | consumed samples: 3476480 | consumed tokens: 7119831040 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.255457E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.985 | TFLOPs: 31.90 | +7: iteration 13590/ 173500 | consumed samples: 3479040 | consumed tokens: 7125073920 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.277139E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.069 | TFLOPs: 31.85 | +7: iteration 13600/ 173500 | consumed samples: 3481600 | consumed tokens: 7130316800 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.245845E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.775 | TFLOPs: 31.94 | +7: iteration 13610/ 173500 | consumed samples: 3484160 | consumed tokens: 7135559680 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.261204E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.179 | TFLOPs: 31.70 | +7: iteration 13620/ 173500 | consumed samples: 3486720 | consumed tokens: 7140802560 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.256389E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.342 | TFLOPs: 31.81 | +7: iteration 13630/ 173500 | consumed samples: 3489280 | consumed tokens: 7146045440 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.259568E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.178 | TFLOPs: 31.65 | +7: iteration 13640/ 173500 | consumed samples: 3491840 | consumed tokens: 7151288320 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.266264E+00 | grad norm: 0.297 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.920 | TFLOPs: 31.74 | +7: iteration 13650/ 173500 | consumed samples: 3494400 | consumed tokens: 7156531200 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.260699E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.324 | TFLOPs: 31.92 | +7: iteration 13660/ 173500 | consumed samples: 3496960 | consumed tokens: 7161774080 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.262281E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.298 | TFLOPs: 31.92 | +7: iteration 13670/ 173500 | consumed samples: 3499520 | consumed tokens: 7167016960 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.248827E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.110 | TFLOPs: 31.91 | +7: iteration 13680/ 173500 | consumed samples: 3502080 | consumed tokens: 7172259840 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.248222E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.270 | TFLOPs: 31.91 | +7: iteration 13690/ 173500 | consumed samples: 3504640 | consumed tokens: 7177502720 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.244440E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.188 | TFLOPs: 31.91 | +7: iteration 13700/ 173500 | consumed samples: 3507200 | consumed tokens: 7182745600 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.261345E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.048 | TFLOPs: 31.90 | +7: iteration 13710/ 173500 | consumed samples: 3509760 | consumed tokens: 7187988480 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.251855E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.788 | TFLOPs: 31.89 | +7: iteration 13720/ 173500 | consumed samples: 3512320 | consumed tokens: 7193231360 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.248466E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.566 | TFLOPs: 31.88 | +7: iteration 13730/ 173500 | consumed samples: 3514880 | consumed tokens: 7198474240 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.264622E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.908 | TFLOPs: 31.90 | +7: iteration 13740/ 173500 | consumed samples: 3517440 | consumed tokens: 7203717120 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.258221E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.458 | TFLOPs: 31.87 | +7: iteration 13750/ 173500 | consumed samples: 3520000 | consumed tokens: 7208960000 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.267329E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.455 | TFLOPs: 31.87 | +7: iteration 13760/ 173500 | 
consumed samples: 3522560 | consumed tokens: 7214202880 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.267203E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.584 | TFLOPs: 31.88 | +7: iteration 13770/ 173500 | consumed samples: 3525120 | consumed tokens: 7219445760 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.250512E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.172 | TFLOPs: 31.86 | +7: iteration 13780/ 173500 | consumed samples: 3527680 | consumed tokens: 7224688640 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.276635E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.531 | TFLOPs: 31.88 | +7: iteration 13790/ 173500 | consumed samples: 3530240 | consumed tokens: 7229931520 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.252597E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.396 | TFLOPs: 31.87 | +7: iteration 13800/ 173500 | consumed samples: 3532800 | consumed tokens: 7235174400 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.246791E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.277 | TFLOPs: 31.86 | +7: iteration 13810/ 173500 | consumed samples: 3535360 | consumed tokens: 7240417280 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.257229E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 607.248 | TFLOPs: 31.86 | +7: iteration 13820/ 173500 | consumed samples: 3537920 | consumed tokens: 7245660160 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.258593E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.394 | TFLOPs: 31.66 | +7: iteration 13830/ 173500 | consumed samples: 3540480 | consumed tokens: 7250903040 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.272191E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.314 | TFLOPs: 31.45 | +7: iteration 13840/ 173500 | consumed samples: 3543040 | consumed tokens: 7256145920 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.266618E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.003 | TFLOPs: 31.69 | +7: iteration 13850/ 173500 | consumed samples: 3545600 | consumed tokens: 7261388800 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.255500E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.120 | TFLOPs: 31.54 | +7: iteration 13860/ 173500 | consumed samples: 3548160 | consumed tokens: 7266631680 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.260033E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.262 | TFLOPs: 31.28 | +7: iteration 13870/ 173500 | consumed samples: 3550720 | consumed tokens: 7271874560 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 
3.246759E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.861 | TFLOPs: 31.21 | +7: iteration 13880/ 173500 | consumed samples: 3553280 | consumed tokens: 7277117440 | elapsed time per iteration (s): 0.44 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.254364E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.253 | TFLOPs: 30.76 | +7: iteration 13890/ 173500 | consumed samples: 3555840 | consumed tokens: 7282360320 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.257061E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.945 | TFLOPs: 30.95 | +7: iteration 13900/ 173500 | consumed samples: 3558400 | consumed tokens: 7287603200 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.260651E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.419 | TFLOPs: 30.98 | +7: iteration 13910/ 173500 | consumed samples: 3560960 | consumed tokens: 7292846080 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.253298E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.711 | TFLOPs: 31.47 | +7: iteration 13920/ 173500 | consumed samples: 3563520 | consumed tokens: 7298088960 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.257833E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.096 | TFLOPs: 30.96 | +7: iteration 13930/ 173500 | consumed samples: 3566080 | consumed tokens: 7303331840 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.253980E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.648 | TFLOPs: 31.88 | +7: iteration 13940/ 173500 | consumed samples: 3568640 | consumed tokens: 7308574720 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.242534E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.508 | TFLOPs: 31.87 | +7: iteration 13950/ 173500 | consumed samples: 3571200 | consumed tokens: 7313817600 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.244872E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.420 | TFLOPs: 31.87 | +7: iteration 13960/ 173500 | consumed samples: 3573760 | consumed tokens: 7319060480 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.248689E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.625 | TFLOPs: 31.67 | +7: iteration 13970/ 173500 | consumed samples: 3576320 | consumed tokens: 7324303360 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.250508E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.385 | TFLOPs: 31.87 | +7: iteration 13980/ 173500 | consumed samples: 3578880 | consumed tokens: 7329546240 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.251411E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.591 | TFLOPs: 
31.83 |
+7: iteration 13990/ 173500 | consumed samples: 3581440 | consumed tokens: 7334789120 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.263912E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.224 | TFLOPs: 31.81 |
+0: [2023-03-17 00:51:39,879] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=0, lr=[0.00019774496681175836, 0.00019774496681175836, 0.00019774496681175836], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
+7: iteration 14000/ 173500 | consumed samples: 3584000 | consumed tokens: 7340032000 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.244478E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.570 | TFLOPs: 31.83 |
+0: steps: 14000 loss: 3.2399 iter time (s): 0.419 samples/sec: 610.912
+7: iteration 14010/ 173500 | consumed samples: 3586560 | consumed tokens: 7345274880 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.249647E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.783 | TFLOPs: 31.73 |
+7: iteration 14020/ 173500 | consumed samples: 3589120 | consumed tokens: 7350517760 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.243286E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.745 | TFLOPs: 31.83 |
+7: iteration 14030/ 173500 | consumed samples: 3591680 | consumed tokens: 7355760640 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.237321E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 606.780 | TFLOPs: 31.84 | +7: iteration 14040/ 173500 | consumed samples: 3594240 | consumed tokens: 7361003520 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.253431E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.019 | TFLOPs: 31.85 | +7: iteration 14050/ 173500 | consumed samples: 3596800 | consumed tokens: 7366246400 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.263379E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.842 | TFLOPs: 31.84 | +7: iteration 14060/ 173500 | consumed samples: 3599360 | consumed tokens: 7371489280 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.252644E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.856 | TFLOPs: 31.84 | +7: iteration 14070/ 173500 | consumed samples: 3601920 | consumed tokens: 7376732160 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.242493E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.200 | TFLOPs: 31.86 | +7: iteration 14080/ 173500 | consumed samples: 3604480 | consumed tokens: 7381975040 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.261047E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.880 | TFLOPs: 31.84 | +7: iteration 14090/ 173500 | consumed samples: 3607040 | consumed tokens: 7387217920 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.246988E+00 | grad 
norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.171 | TFLOPs: 31.80 | +7: iteration 14100/ 173500 | consumed samples: 3609600 | consumed tokens: 7392460800 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.251525E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.240 | TFLOPs: 31.81 | +7: iteration 14110/ 173500 | consumed samples: 3612160 | consumed tokens: 7397703680 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.256542E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.405 | TFLOPs: 31.82 | +7: iteration 14120/ 173500 | consumed samples: 3614720 | consumed tokens: 7402946560 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.255786E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.730 | TFLOPs: 31.83 | +7: iteration 14130/ 173500 | consumed samples: 3617280 | consumed tokens: 7408189440 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.240632E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.968 | TFLOPs: 31.79 | +7: iteration 14140/ 173500 | consumed samples: 3619840 | consumed tokens: 7413432320 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.253724E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.707 | TFLOPs: 31.83 | +7: iteration 14150/ 173500 | consumed samples: 3622400 | consumed tokens: 7418675200 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.257587E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.080 | TFLOPs: 31.85 | +7: iteration 14160/ 173500 | consumed samples: 3624960 | consumed tokens: 7423918080 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.253544E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.877 | TFLOPs: 31.84 | +7: iteration 14170/ 173500 | consumed samples: 3627520 | consumed tokens: 7429160960 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.245677E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.695 | TFLOPs: 31.83 | +7: iteration 14180/ 173500 | consumed samples: 3630080 | consumed tokens: 7434403840 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.250785E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.016 | TFLOPs: 31.80 | +7: iteration 14190/ 173500 | consumed samples: 3632640 | consumed tokens: 7439646720 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.243675E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.006 | TFLOPs: 31.80 | +7: iteration 14200/ 173500 | consumed samples: 3635200 | consumed tokens: 7444889600 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.255718E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.472 | TFLOPs: 31.82 | +7: 
iteration 14210/ 173500 | consumed samples: 3637760 | consumed tokens: 7450132480 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.257125E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.726 | TFLOPs: 31.78 | +7: iteration 14220/ 173500 | consumed samples: 3640320 | consumed tokens: 7455375360 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.235430E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.287 | TFLOPs: 31.81 | +7: iteration 14230/ 173500 | consumed samples: 3642880 | consumed tokens: 7460618240 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.257038E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.207 | TFLOPs: 31.81 | +7: iteration 14240/ 173500 | consumed samples: 3645440 | consumed tokens: 7465861120 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.228925E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.656 | TFLOPs: 31.83 | +7: iteration 14250/ 173500 | consumed samples: 3648000 | consumed tokens: 7471104000 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.246888E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.458 | TFLOPs: 31.82 | +7: iteration 14260/ 173500 | consumed samples: 3650560 | consumed tokens: 7476346880 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.257590E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.381 | TFLOPs: 31.82 | +7: iteration 14270/ 173500 | consumed samples: 3653120 | consumed tokens: 7481589760 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.237174E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.350 | TFLOPs: 31.81 | +7: iteration 14280/ 173500 | consumed samples: 3655680 | consumed tokens: 7486832640 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.245877E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.308 | TFLOPs: 31.81 | +7: iteration 14290/ 173500 | consumed samples: 3658240 | consumed tokens: 7492075520 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.257880E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.541 | TFLOPs: 31.82 | +7: iteration 14300/ 173500 | consumed samples: 3660800 | consumed tokens: 7497318400 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.243926E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.348 | TFLOPs: 31.81 | +7: iteration 14310/ 173500 | consumed samples: 3663360 | consumed tokens: 7502561280 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.250454E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.862 | TFLOPs: 31.84 | +7: iteration 14320/ 173500 | consumed samples: 3665920 | consumed tokens: 7507804160 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global 
batch size: 256 | lm loss: 3.251762E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.840 | TFLOPs: 31.84 | +7: iteration 14330/ 173500 | consumed samples: 3668480 | consumed tokens: 7513047040 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.263656E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.373 | TFLOPs: 31.82 | +7: iteration 14340/ 173500 | consumed samples: 3671040 | consumed tokens: 7518289920 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.248808E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.716 | TFLOPs: 31.83 | +7: iteration 14350/ 173500 | consumed samples: 3673600 | consumed tokens: 7523532800 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.241264E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.216 | TFLOPs: 31.86 | +7: iteration 14360/ 173500 | consumed samples: 3676160 | consumed tokens: 7528775680 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.248513E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.233 | TFLOPs: 31.86 | +7: iteration 14370/ 173500 | consumed samples: 3678720 | consumed tokens: 7534018560 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.253020E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.774 | TFLOPs: 31.84 | +7: iteration 14380/ 173500 | consumed samples: 3681280 | consumed 
tokens: 7539261440 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.237646E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.424 | TFLOPs: 31.87 | +7: iteration 14390/ 173500 | consumed samples: 3683840 | consumed tokens: 7544504320 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.232740E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.631 | TFLOPs: 31.88 | +7: iteration 14400/ 173500 | consumed samples: 3686400 | consumed tokens: 7549747200 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.244529E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.651 | TFLOPs: 31.88 | +7: iteration 14410/ 173500 | consumed samples: 3688960 | consumed tokens: 7554990080 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.248021E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.071 | TFLOPs: 31.85 | +7: iteration 14420/ 173500 | consumed samples: 3691520 | consumed tokens: 7560232960 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.234376E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.959 | TFLOPs: 31.85 | +7: iteration 14430/ 173500 | consumed samples: 3694080 | consumed tokens: 7565475840 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.246076E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.369 | TFLOPs: 31.87 | +7: iteration 14440/ 173500 | consumed samples: 3696640 | consumed tokens: 7570718720 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.239770E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.313 | TFLOPs: 31.86 | +7: iteration 14450/ 173500 | consumed samples: 3699200 | consumed tokens: 7575961600 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.239239E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.377 | TFLOPs: 31.87 | +7: iteration 14460/ 173500 | consumed samples: 3701760 | consumed tokens: 7581204480 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.242299E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.117 | TFLOPs: 31.85 | +7: iteration 14470/ 173500 | consumed samples: 3704320 | consumed tokens: 7586447360 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.240889E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.479 | TFLOPs: 31.87 | +7: iteration 14480/ 173500 | consumed samples: 3706880 | consumed tokens: 7591690240 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.260484E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.642 | TFLOPs: 31.88 | +7: iteration 14490/ 173500 | consumed samples: 3709440 | consumed tokens: 7596933120 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.236699E+00 | grad norm: 0.294 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.739 | TFLOPs: 31.89 | +7: iteration 14500/ 173500 | consumed samples: 3712000 | consumed tokens: 7602176000 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.244389E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.883 | TFLOPs: 31.89 | +7: iteration 14510/ 173500 | consumed samples: 3714560 | consumed tokens: 7607418880 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.248853E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.659 | TFLOPs: 31.88 | +7: iteration 14520/ 173500 | consumed samples: 3717120 | consumed tokens: 7612661760 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.251344E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.604 | TFLOPs: 31.88 | +7: iteration 14530/ 173500 | consumed samples: 3719680 | consumed tokens: 7617904640 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.238770E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.657 | TFLOPs: 31.88 | +7: iteration 14540/ 173500 | consumed samples: 3722240 | consumed tokens: 7623147520 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.226836E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.886 | TFLOPs: 31.89 | +7: iteration 14550/ 173500 | consumed samples: 3724800 | consumed tokens: 7628390400 | elapsed time per iteration (s): 0.42 
| learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.248895E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.618 | TFLOPs: 31.88 | +7: iteration 14560/ 173500 | consumed samples: 3727360 | consumed tokens: 7633633280 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.241885E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.598 | TFLOPs: 31.88 | +7: iteration 14570/ 173500 | consumed samples: 3729920 | consumed tokens: 7638876160 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.245159E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.732 | TFLOPs: 31.89 | +7: iteration 14580/ 173500 | consumed samples: 3732480 | consumed tokens: 7644119040 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.238173E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.427 | TFLOPs: 31.87 | +7: iteration 14590/ 173500 | consumed samples: 3735040 | consumed tokens: 7649361920 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.242214E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.871 | TFLOPs: 31.89 | +7: iteration 14600/ 173500 | consumed samples: 3737600 | consumed tokens: 7654604800 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.226075E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.044 | TFLOPs: 31.90 | +7: iteration 14610/ 173500 | 
consumed samples: 3740160 | consumed tokens: 7659847680 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.254531E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.216 | TFLOPs: 31.91 | +7: iteration 14620/ 173500 | consumed samples: 3742720 | consumed tokens: 7665090560 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.235047E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.069 | TFLOPs: 31.90 | +7: iteration 14630/ 173500 | consumed samples: 3745280 | consumed tokens: 7670333440 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.229324E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.987 | TFLOPs: 31.90 | +7: iteration 14640/ 173500 | consumed samples: 3747840 | consumed tokens: 7675576320 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.239394E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.086 | TFLOPs: 31.91 | +7: iteration 14650/ 173500 | consumed samples: 3750400 | consumed tokens: 7680819200 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.229820E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.850 | TFLOPs: 31.89 | +7: iteration 14660/ 173500 | consumed samples: 3752960 | consumed tokens: 7686062080 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.246658E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 608.274 | TFLOPs: 31.92 | +7: iteration 14670/ 173500 | consumed samples: 3755520 | consumed tokens: 7691304960 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.234314E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.960 | TFLOPs: 31.90 | +7: iteration 14680/ 173500 | consumed samples: 3758080 | consumed tokens: 7696547840 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.244238E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.002 | TFLOPs: 31.90 | +7: iteration 14690/ 173500 | consumed samples: 3760640 | consumed tokens: 7701790720 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.234356E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.908 | TFLOPs: 31.90 | +7: iteration 14700/ 173500 | consumed samples: 3763200 | consumed tokens: 7707033600 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.241597E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.037 | TFLOPs: 31.90 | +7: iteration 14710/ 173500 | consumed samples: 3765760 | consumed tokens: 7712276480 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.243653E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.315 | TFLOPs: 31.92 | +7: iteration 14720/ 173500 | consumed samples: 3768320 | consumed tokens: 7717519360 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 
3.225259E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.087 | TFLOPs: 31.91 | +7: iteration 14730/ 173500 | consumed samples: 3770880 | consumed tokens: 7722762240 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.218314E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.177 | TFLOPs: 31.86 | +7: iteration 14740/ 173500 | consumed samples: 3773440 | consumed tokens: 7728005120 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.241409E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.936 | TFLOPs: 31.95 | +7: iteration 14750/ 173500 | consumed samples: 3776000 | consumed tokens: 7733248000 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.239585E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.677 | TFLOPs: 31.94 | +7: iteration 14760/ 173500 | consumed samples: 3778560 | consumed tokens: 7738490880 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.247297E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.512 | TFLOPs: 31.93 | +7: iteration 14770/ 173500 | consumed samples: 3781120 | consumed tokens: 7743733760 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.254712E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.734 | TFLOPs: 31.94 | +7: iteration 14780/ 173500 | consumed samples: 3783680 | consumed tokens: 7748976640 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.237371E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.161 | TFLOPs: 31.91 | +7: iteration 14790/ 173500 | consumed samples: 3786240 | consumed tokens: 7754219520 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.240233E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.838 | TFLOPs: 31.94 | +7: iteration 14800/ 173500 | consumed samples: 3788800 | consumed tokens: 7759462400 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.241675E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.327 | TFLOPs: 31.92 | +7: iteration 14810/ 173500 | consumed samples: 3791360 | consumed tokens: 7764705280 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.239550E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.627 | TFLOPs: 31.93 | +7: iteration 14820/ 173500 | consumed samples: 3793920 | consumed tokens: 7769948160 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.233974E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.459 | TFLOPs: 31.92 | +7: iteration 14830/ 173500 | consumed samples: 3796480 | consumed tokens: 7775191040 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.245720E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.068 | TFLOPs: 
31.90 | +7: iteration 14840/ 173500 | consumed samples: 3799040 | consumed tokens: 7780433920 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.237106E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.402 | TFLOPs: 31.92 | +7: iteration 14850/ 173500 | consumed samples: 3801600 | consumed tokens: 7785676800 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.238317E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.353 | TFLOPs: 31.92 | +7: iteration 14860/ 173500 | consumed samples: 3804160 | consumed tokens: 7790919680 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.234924E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.710 | TFLOPs: 31.94 | +7: iteration 14870/ 173500 | consumed samples: 3806720 | consumed tokens: 7796162560 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.238404E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.746 | TFLOPs: 31.94 | +7: iteration 14880/ 173500 | consumed samples: 3809280 | consumed tokens: 7801405440 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.228759E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.586 | TFLOPs: 31.93 | +7: iteration 14890/ 173500 | consumed samples: 3811840 | consumed tokens: 7806648320 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.228675E+00 | grad norm: 0.298 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.472 | TFLOPs: 31.93 | +7: iteration 14900/ 173500 | consumed samples: 3814400 | consumed tokens: 7811891200 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.251089E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.710 | TFLOPs: 31.94 | +7: iteration 14910/ 173500 | consumed samples: 3816960 | consumed tokens: 7817134080 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.244951E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.777 | TFLOPs: 31.94 | +7: iteration 14920/ 173500 | consumed samples: 3819520 | consumed tokens: 7822376960 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.233654E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.650 | TFLOPs: 31.93 | +7: iteration 14930/ 173500 | consumed samples: 3822080 | consumed tokens: 7827619840 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.236538E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.017 | TFLOPs: 31.95 | +7: iteration 14940/ 173500 | consumed samples: 3824640 | consumed tokens: 7832862720 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.234020E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.986 | TFLOPs: 31.95 | +7: iteration 14950/ 173500 | consumed samples: 3827200 | consumed tokens: 7838105600 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 
| global batch size: 256 | lm loss: 3.233076E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.334 | TFLOPs: 31.97 | +7: iteration 14960/ 173500 | consumed samples: 3829760 | consumed tokens: 7843348480 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.233221E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.994 | TFLOPs: 31.95 | +7: iteration 14970/ 173500 | consumed samples: 3832320 | consumed tokens: 7848591360 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.234639E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.226 | TFLOPs: 31.97 | +7: iteration 14980/ 173500 | consumed samples: 3834880 | consumed tokens: 7853834240 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.220347E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.191 | TFLOPs: 31.96 | +7: iteration 14990/ 173500 | consumed samples: 3837440 | consumed tokens: 7859077120 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.234233E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.364 | TFLOPs: 31.97 | +7: iteration 15000/ 173500 | consumed samples: 3840000 | consumed tokens: 7864320000 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.228431E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.447 | TFLOPs: 31.98 | +7: iteration 15010/ 173500 | consumed samples: 3842560 | 
consumed tokens: 7869562880 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.234782E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.397 | TFLOPs: 31.97 | +7: iteration 15020/ 173500 | consumed samples: 3845120 | consumed tokens: 7874805760 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.236044E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.109 | TFLOPs: 31.70 | +7: iteration 15030/ 173500 | consumed samples: 3847680 | consumed tokens: 7880048640 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.223360E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.489 | TFLOPs: 31.98 | +7: iteration 15040/ 173500 | consumed samples: 3850240 | consumed tokens: 7885291520 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.234225E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.502 | TFLOPs: 31.98 | +7: iteration 15050/ 173500 | consumed samples: 3852800 | consumed tokens: 7890534400 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.225213E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.460 | TFLOPs: 31.98 | +7: iteration 15060/ 173500 | consumed samples: 3855360 | consumed tokens: 7895777280 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.241504E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 609.743 | TFLOPs: 31.99 | +7: iteration 15070/ 173500 | consumed samples: 3857920 | consumed tokens: 7901020160 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.243555E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.202 | TFLOPs: 31.96 | +7: iteration 15080/ 173500 | consumed samples: 3860480 | consumed tokens: 7906263040 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.242559E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.235 | TFLOPs: 31.97 | +7: iteration 15090/ 173500 | consumed samples: 3863040 | consumed tokens: 7911505920 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.225247E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.529 | TFLOPs: 31.93 | +7: iteration 15100/ 173500 | consumed samples: 3865600 | consumed tokens: 7916748800 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.239298E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.941 | TFLOPs: 31.90 | +7: iteration 15110/ 173500 | consumed samples: 3868160 | consumed tokens: 7921991680 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.253536E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.049 | TFLOPs: 31.90 | +7: iteration 15120/ 173500 | consumed samples: 3870720 | consumed tokens: 7927234560 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.255552E+00 | grad norm: 
0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.953 | TFLOPs: 31.90 | +7: iteration 15130/ 173500 | consumed samples: 3873280 | consumed tokens: 7932477440 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.237546E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.390 | TFLOPs: 31.92 | +7: iteration 15140/ 173500 | consumed samples: 3875840 | consumed tokens: 7937720320 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.219420E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.181 | TFLOPs: 31.91 | +7: iteration 15150/ 173500 | consumed samples: 3878400 | consumed tokens: 7942963200 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.235293E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.021 | TFLOPs: 31.90 | +7: iteration 15160/ 173500 | consumed samples: 3880960 | consumed tokens: 7948206080 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.235839E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.407 | TFLOPs: 31.92 | +7: iteration 15170/ 173500 | consumed samples: 3883520 | consumed tokens: 7953448960 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.238548E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.292 | TFLOPs: 31.92 | +7: iteration 15180/ 173500 | consumed samples: 3886080 | consumed tokens: 7958691840 | elapsed time per iteration (s): 
0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.227925E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.530 | TFLOPs: 31.93 | +7: iteration 15190/ 173500 | consumed samples: 3888640 | consumed tokens: 7963934720 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.235017E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.490 | TFLOPs: 31.93 | +7: iteration 15200/ 173500 | consumed samples: 3891200 | consumed tokens: 7969177600 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.234230E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.859 | TFLOPs: 31.95 | +7: iteration 15210/ 173500 | consumed samples: 3893760 | consumed tokens: 7974420480 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.238811E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.101 | TFLOPs: 31.96 | +7: iteration 15220/ 173500 | consumed samples: 3896320 | consumed tokens: 7979663360 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.242408E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.657 | TFLOPs: 31.99 | +7: iteration 15230/ 173500 | consumed samples: 3898880 | consumed tokens: 7984906240 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.222604E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.462 | TFLOPs: 31.98 | +7: iteration 15240/ 
173500 | consumed samples: 3901440 | consumed tokens: 7990149120 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.225581E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.932 | TFLOPs: 31.95 | +7: iteration 15250/ 173500 | consumed samples: 3904000 | consumed tokens: 7995392000 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.242390E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.236 | TFLOPs: 31.97 | +7: iteration 15260/ 173500 | consumed samples: 3906560 | consumed tokens: 8000634880 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.234404E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.600 | TFLOPs: 31.98 | +7: iteration 15270/ 173500 | consumed samples: 3909120 | consumed tokens: 8005877760 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.228869E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.650 | TFLOPs: 31.99 | +7: iteration 15280/ 173500 | consumed samples: 3911680 | consumed tokens: 8011120640 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.216304E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.456 | TFLOPs: 31.98 | +7: iteration 15290/ 173500 | consumed samples: 3914240 | consumed tokens: 8016363520 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.224427E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.613 | TFLOPs: 31.99 | +7: iteration 15300/ 173500 | consumed samples: 3916800 | consumed tokens: 8021606400 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.216286E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.453 | TFLOPs: 31.98 | +7: iteration 15310/ 173500 | consumed samples: 3919360 | consumed tokens: 8026849280 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.232866E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.191 | TFLOPs: 31.96 | +7: iteration 15320/ 173500 | consumed samples: 3921920 | consumed tokens: 8032092160 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.226978E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.243 | TFLOPs: 31.97 | +7: iteration 15330/ 173500 | consumed samples: 3924480 | consumed tokens: 8037335040 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.220982E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.188 | TFLOPs: 31.91 | +7: iteration 15340/ 173500 | consumed samples: 3927040 | consumed tokens: 8042577920 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.222653E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.530 | TFLOPs: 31.98 | +7: iteration 15350/ 173500 | consumed samples: 3929600 | consumed tokens: 8047820800 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | 
lm loss: 3.232582E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.305 | TFLOPs: 31.97 | +7: iteration 15360/ 173500 | consumed samples: 3932160 | consumed tokens: 8053063680 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.221078E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.315 | TFLOPs: 31.97 | +7: iteration 15370/ 173500 | consumed samples: 3934720 | consumed tokens: 8058306560 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.228628E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.005 | TFLOPs: 31.85 | +7: iteration 15380/ 173500 | consumed samples: 3937280 | consumed tokens: 8063549440 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.223085E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.325 | TFLOPs: 31.97 | +7: iteration 15390/ 173500 | consumed samples: 3939840 | consumed tokens: 8068792320 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.213703E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.221 | TFLOPs: 31.96 | +7: iteration 15400/ 173500 | consumed samples: 3942400 | consumed tokens: 8074035200 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.228228E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.201 | TFLOPs: 31.96 | +7: iteration 15410/ 173500 | consumed samples: 3944960 | consumed tokens: 
8079278080 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.227549E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 15420/ 173500 | consumed samples: 3947520 | consumed tokens: 8084520960 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.229657E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.193 | TFLOPs: 31.96 | +7: iteration 15430/ 173500 | consumed samples: 3950080 | consumed tokens: 8089763840 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.215802E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.201 | TFLOPs: 31.96 | +7: iteration 15440/ 173500 | consumed samples: 3952640 | consumed tokens: 8095006720 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.232429E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | +7: iteration 15450/ 173500 | consumed samples: 3955200 | consumed tokens: 8100249600 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.226382E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.398 | TFLOPs: 31.97 | +7: iteration 15460/ 173500 | consumed samples: 3957760 | consumed tokens: 8105492480 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.219839E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
609.131 | TFLOPs: 31.96 | +7: iteration 15470/ 173500 | consumed samples: 3960320 | consumed tokens: 8110735360 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.233923E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.981 | TFLOPs: 31.95 | +7: iteration 15480/ 173500 | consumed samples: 3962880 | consumed tokens: 8115978240 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.230676E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.301 | TFLOPs: 31.97 | +7: iteration 15490/ 173500 | consumed samples: 3965440 | consumed tokens: 8121221120 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.234772E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | +7: iteration 15500/ 173500 | consumed samples: 3968000 | consumed tokens: 8126464000 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.222812E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.030 | TFLOPs: 31.95 | +7: iteration 15510/ 173500 | consumed samples: 3970560 | consumed tokens: 8131706880 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.236643E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.925 | TFLOPs: 31.95 | +7: iteration 15520/ 173500 | consumed samples: 3973120 | consumed tokens: 8136949760 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.234303E+00 | grad norm: 0.293 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.011 | TFLOPs: 31.95 | +7: iteration 15530/ 173500 | consumed samples: 3975680 | consumed tokens: 8142192640 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.219658E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.433 | TFLOPs: 31.98 | +7: iteration 15540/ 173500 | consumed samples: 3978240 | consumed tokens: 8147435520 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.223964E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.905 | TFLOPs: 31.95 | +7: iteration 15550/ 173500 | consumed samples: 3980800 | consumed tokens: 8152678400 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.236560E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.916 | TFLOPs: 31.95 | +7: iteration 15560/ 173500 | consumed samples: 3983360 | consumed tokens: 8157921280 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.223638E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.299 | TFLOPs: 31.97 | +7: iteration 15570/ 173500 | consumed samples: 3985920 | consumed tokens: 8163164160 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.219553E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.808 | TFLOPs: 31.94 | +7: iteration 15580/ 173500 | consumed samples: 3988480 | consumed tokens: 8168407040 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.237069E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.099 | TFLOPs: 31.96 | +7: iteration 15590/ 173500 | consumed samples: 3991040 | consumed tokens: 8173649920 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.233720E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.164 | TFLOPs: 31.96 | +7: iteration 15600/ 173500 | consumed samples: 3993600 | consumed tokens: 8178892800 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.213174E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.987 | TFLOPs: 31.95 | +7: iteration 15610/ 173500 | consumed samples: 3996160 | consumed tokens: 8184135680 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.213353E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.193 | TFLOPs: 31.96 | +7: iteration 15620/ 173500 | consumed samples: 3998720 | consumed tokens: 8189378560 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.204753E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.007 | TFLOPs: 31.95 | +7: iteration 15630/ 173500 | consumed samples: 4001280 | consumed tokens: 8194621440 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.221009E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.253 | TFLOPs: 31.97 | +7: iteration 15640/ 173500 | 
consumed samples: 4003840 | consumed tokens: 8199864320 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.226714E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.317 | TFLOPs: 31.97 | +7: iteration 15650/ 173500 | consumed samples: 4006400 | consumed tokens: 8205107200 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.209209E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.662 | TFLOPs: 31.94 | +7: iteration 15660/ 173500 | consumed samples: 4008960 | consumed tokens: 8210350080 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.213651E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.547 | TFLOPs: 31.93 | +7: iteration 15670/ 173500 | consumed samples: 4011520 | consumed tokens: 8215592960 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.220193E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.436 | TFLOPs: 31.92 | +7: iteration 15680/ 173500 | consumed samples: 4014080 | consumed tokens: 8220835840 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.227608E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.436 | TFLOPs: 31.92 | +7: iteration 15690/ 173500 | consumed samples: 4016640 | consumed tokens: 8226078720 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.208611E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 608.973 | TFLOPs: 31.95 | +7: iteration 15700/ 173500 | consumed samples: 4019200 | consumed tokens: 8231321600 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.234721E+00 | grad norm: 4.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.806 | TFLOPs: 31.89 | +7: iteration 15710/ 173500 | consumed samples: 4021760 | consumed tokens: 8236564480 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.461452E+00 | grad norm: 2.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.499 | TFLOPs: 31.82 | +7: iteration 15720/ 173500 | consumed samples: 4024320 | consumed tokens: 8241807360 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.369483E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.882 | TFLOPs: 31.84 | +7: iteration 15730/ 173500 | consumed samples: 4026880 | consumed tokens: 8247050240 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.285765E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.319 | TFLOPs: 31.92 | +7: iteration 15740/ 173500 | consumed samples: 4029440 | consumed tokens: 8252293120 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.254613E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.410 | TFLOPs: 31.92 | +7: iteration 15750/ 173500 | consumed samples: 4032000 | consumed tokens: 8257536000 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 
3.252234E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.257 | TFLOPs: 31.91 | +7: iteration 15760/ 173500 | consumed samples: 4034560 | consumed tokens: 8262778880 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.227260E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.219 | TFLOPs: 31.96 | +7: iteration 15770/ 173500 | consumed samples: 4037120 | consumed tokens: 8268021760 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.247825E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.027 | TFLOPs: 31.95 | +7: iteration 15780/ 173500 | consumed samples: 4039680 | consumed tokens: 8273264640 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.235120E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.922 | TFLOPs: 31.95 | +7: iteration 15790/ 173500 | consumed samples: 4042240 | consumed tokens: 8278507520 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.235457E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.060 | TFLOPs: 31.96 | +7: iteration 15800/ 173500 | consumed samples: 4044800 | consumed tokens: 8283750400 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.229287E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.427 | TFLOPs: 31.98 | +7: iteration 15810/ 173500 | consumed samples: 4047360 | consumed tokens: 8288993280 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.246553E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.341 | TFLOPs: 31.97 | +7: iteration 15820/ 173500 | consumed samples: 4049920 | consumed tokens: 8294236160 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.237667E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.397 | TFLOPs: 31.97 | +7: iteration 15830/ 173500 | consumed samples: 4052480 | consumed tokens: 8299479040 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.223775E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.582 | TFLOPs: 31.98 | +7: iteration 15840/ 173500 | consumed samples: 4055040 | consumed tokens: 8304721920 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.227127E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.543 | TFLOPs: 31.98 | +7: iteration 15850/ 173500 | consumed samples: 4057600 | consumed tokens: 8309964800 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.215048E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.201 | TFLOPs: 31.96 | +7: iteration 15860/ 173500 | consumed samples: 4060160 | consumed tokens: 8315207680 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.238063E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.167 | TFLOPs: 
31.96 | +7: iteration 15870/ 173500 | consumed samples: 4062720 | consumed tokens: 8320450560 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.242072E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.567 | TFLOPs: 31.93 | +7: iteration 15880/ 173500 | consumed samples: 4065280 | consumed tokens: 8325693440 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.232321E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.300 | TFLOPs: 31.92 | +7: iteration 15890/ 173500 | consumed samples: 4067840 | consumed tokens: 8330936320 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.206824E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.496 | TFLOPs: 31.93 | +7: iteration 15900/ 173500 | consumed samples: 4070400 | consumed tokens: 8336179200 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.235277E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.473 | TFLOPs: 31.93 | +7: iteration 15910/ 173500 | consumed samples: 4072960 | consumed tokens: 8341422080 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.226931E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.080 | TFLOPs: 31.96 | +7: iteration 15920/ 173500 | consumed samples: 4075520 | consumed tokens: 8346664960 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.211029E+00 | grad norm: 0.277 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.809 | TFLOPs: 31.89 | +7: iteration 15930/ 173500 | consumed samples: 4078080 | consumed tokens: 8351907840 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.221099E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.100 | TFLOPs: 31.96 | +7: iteration 15940/ 173500 | consumed samples: 4080640 | consumed tokens: 8357150720 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.213467E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.167 | TFLOPs: 31.96 | +7: iteration 15950/ 173500 | consumed samples: 4083200 | consumed tokens: 8362393600 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.219902E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.185 | TFLOPs: 31.96 | +7: iteration 15960/ 173500 | consumed samples: 4085760 | consumed tokens: 8367636480 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.210166E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.911 | TFLOPs: 31.95 | +7: iteration 15970/ 173500 | consumed samples: 4088320 | consumed tokens: 8372879360 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.218036E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.967 | TFLOPs: 31.95 | +7: iteration 15980/ 173500 | consumed samples: 4090880 | consumed tokens: 8378122240 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 
| global batch size: 256 | lm loss: 3.232990E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.484 | TFLOPs: 31.30 | +7: iteration 15990/ 173500 | consumed samples: 4093440 | consumed tokens: 8383365120 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.213823E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.154 | TFLOPs: 31.75 | +0: [2023-03-17 01:05:41,842] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=0, lr=[0.00019695408064628468, 0.00019695408064628468, 0.00019695408064628468], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 16000/ 173500 | consumed samples: 4096000 | consumed tokens: 8388608000 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.221842E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.425 | TFLOPs: 31.77 | +0: steps: 16000 loss: 3.2453 iter time (s): 0.419 samples/sec: 611.617 +7: iteration 16010/ 173500 | consumed samples: 4098560 | consumed tokens: 8393850880 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.227373E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.290 | TFLOPs: 31.34 | +7: iteration 16020/ 173500 | consumed samples: 4101120 | consumed tokens: 8399093760 | elapsed time per iteration (s): 0.44 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.227148E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.981 | TFLOPs: 30.85 | +7: iteration 16030/ 173500 | consumed samples: 4103680 | consumed tokens: 8404336640 | elapsed time per iteration 
(s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.225146E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.333 | TFLOPs: 31.29 | +7: iteration 16040/ 173500 | consumed samples: 4106240 | consumed tokens: 8409579520 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.218740E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.324 | TFLOPs: 30.92 | +7: iteration 16050/ 173500 | consumed samples: 4108800 | consumed tokens: 8414822400 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.218980E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.476 | TFLOPs: 31.35 | +7: iteration 16060/ 173500 | consumed samples: 4111360 | consumed tokens: 8420065280 | elapsed time per iteration (s): 0.44 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.221566E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.680 | TFLOPs: 30.73 | +7: iteration 16070/ 173500 | consumed samples: 4113920 | consumed tokens: 8425308160 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.219478E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.246 | TFLOPs: 31.39 | +7: iteration 16080/ 173500 | consumed samples: 4116480 | consumed tokens: 8430551040 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.229863E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.345 | TFLOPs: 31.55 | +7: iteration 16090/ 
173500 | consumed samples: 4119040 | consumed tokens: 8435793920 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.229782E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.080 | TFLOPs: 31.54 | +7: iteration 16100/ 173500 | consumed samples: 4121600 | consumed tokens: 8441036800 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.218472E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.671 | TFLOPs: 31.78 | +7: iteration 16110/ 173500 | consumed samples: 4124160 | consumed tokens: 8446279680 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.230313E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.213 | TFLOPs: 31.81 | +7: iteration 16120/ 173500 | consumed samples: 4126720 | consumed tokens: 8451522560 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.231386E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.970 | TFLOPs: 31.79 | +7: iteration 16130/ 173500 | consumed samples: 4129280 | consumed tokens: 8456765440 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.225919E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.875 | TFLOPs: 32.00 | +7: iteration 16140/ 173500 | consumed samples: 4131840 | consumed tokens: 8462008320 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.204848E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.350 | TFLOPs: 31.97 | +7: iteration 16150/ 173500 | consumed samples: 4134400 | consumed tokens: 8467251200 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.213845E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.544 | TFLOPs: 31.98 | +7: iteration 16160/ 173500 | consumed samples: 4136960 | consumed tokens: 8472494080 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.226847E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.617 | TFLOPs: 31.99 | +7: iteration 16170/ 173500 | consumed samples: 4139520 | consumed tokens: 8477736960 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.226995E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.099 | TFLOPs: 31.96 | +7: iteration 16180/ 173500 | consumed samples: 4142080 | consumed tokens: 8482979840 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.214874E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.267 | TFLOPs: 31.97 | +7: iteration 16190/ 173500 | consumed samples: 4144640 | consumed tokens: 8488222720 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.212774E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.033 | TFLOPs: 31.95 | +7: iteration 16200/ 173500 | consumed samples: 4147200 | consumed tokens: 8493465600 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | 
lm loss: 3.217963E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.829 | TFLOPs: 31.94 | +7: iteration 16210/ 173500 | consumed samples: 4149760 | consumed tokens: 8498708480 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.206310E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.253 | TFLOPs: 31.97 | +7: iteration 16220/ 173500 | consumed samples: 4152320 | consumed tokens: 8503951360 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.205613E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.292 | TFLOPs: 31.97 | +7: iteration 16230/ 173500 | consumed samples: 4154880 | consumed tokens: 8509194240 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.208007E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.114 | TFLOPs: 31.96 | +7: iteration 16240/ 173500 | consumed samples: 4157440 | consumed tokens: 8514437120 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.223167E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.185 | TFLOPs: 31.96 | +7: iteration 16250/ 173500 | consumed samples: 4160000 | consumed tokens: 8519680000 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.206609E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.051 | TFLOPs: 31.96 | +7: iteration 16260/ 173500 | consumed samples: 4162560 | consumed tokens: 
8524922880 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.218960E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.911 | TFLOPs: 31.95 | +7: iteration 16270/ 173500 | consumed samples: 4165120 | consumed tokens: 8530165760 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.225106E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.358 | TFLOPs: 31.97 | +7: iteration 16280/ 173500 | consumed samples: 4167680 | consumed tokens: 8535408640 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.212035E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.834 | TFLOPs: 31.94 | +7: iteration 16290/ 173500 | consumed samples: 4170240 | consumed tokens: 8540651520 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.209679E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.682 | TFLOPs: 31.94 | +7: iteration 16300/ 173500 | consumed samples: 4172800 | consumed tokens: 8545894400 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.215735E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.561 | TFLOPs: 31.93 | +7: iteration 16310/ 173500 | consumed samples: 4175360 | consumed tokens: 8551137280 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.214969E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.729 | TFLOPs: 31.94 | +7: iteration 16320/ 173500 | consumed samples: 4177920 | consumed tokens: 8556380160 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.205554E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.723 | TFLOPs: 31.94 | +7: iteration 16330/ 173500 | consumed samples: 4180480 | consumed tokens: 8561623040 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.197112E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.707 | TFLOPs: 31.94 | +7: iteration 16340/ 173500 | consumed samples: 4183040 | consumed tokens: 8566865920 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.206189E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.555 | TFLOPs: 31.93 | +7: iteration 16350/ 173500 | consumed samples: 4185600 | consumed tokens: 8572108800 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.198699E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.826 | TFLOPs: 31.94 | +7: iteration 16360/ 173500 | consumed samples: 4188160 | consumed tokens: 8577351680 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.203547E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.724 | TFLOPs: 31.94 | +7: iteration 16370/ 173500 | consumed samples: 4190720 | consumed tokens: 8582594560 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.209245E+00 | grad norm: 0.275 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.651 | TFLOPs: 31.93 | +7: iteration 16380/ 173500 | consumed samples: 4193280 | consumed tokens: 8587837440 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.214742E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.854 | TFLOPs: 31.95 | +7: iteration 16390/ 173500 | consumed samples: 4195840 | consumed tokens: 8593080320 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.216557E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.518 | TFLOPs: 31.93 | +7: iteration 16400/ 173500 | consumed samples: 4198400 | consumed tokens: 8598323200 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.207257E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.263 | TFLOPs: 31.91 | +7: iteration 16410/ 173500 | consumed samples: 4200960 | consumed tokens: 8603566080 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.206335E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.981 | TFLOPs: 31.90 | +7: iteration 16420/ 173500 | consumed samples: 4203520 | consumed tokens: 8608808960 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.217495E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.375 | TFLOPs: 31.92 | +7: iteration 16430/ 173500 | consumed samples: 4206080 | consumed tokens: 8614051840 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.200287E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.303 | TFLOPs: 31.92 | +7: iteration 16440/ 173500 | consumed samples: 4208640 | consumed tokens: 8619294720 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.219281E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.124 | TFLOPs: 31.91 | +7: iteration 16450/ 173500 | consumed samples: 4211200 | consumed tokens: 8624537600 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.198035E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.914 | TFLOPs: 31.90 | +7: iteration 16460/ 173500 | consumed samples: 4213760 | consumed tokens: 8629780480 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.212359E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.830 | TFLOPs: 31.89 | +7: iteration 16470/ 173500 | consumed samples: 4216320 | consumed tokens: 8635023360 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.232506E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.920 | TFLOPs: 31.90 | +7: iteration 16480/ 173500 | consumed samples: 4218880 | consumed tokens: 8640266240 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.221759E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.107 | TFLOPs: 31.91 | +7: iteration 16490/ 173500 | 
consumed samples: 4221440 | consumed tokens: 8645509120 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.227111E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.001 | TFLOPs: 31.90 | +7: iteration 16500/ 173500 | consumed samples: 4224000 | consumed tokens: 8650752000 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.209418E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.153 | TFLOPs: 31.91 | +7: iteration 16510/ 173500 | consumed samples: 4226560 | consumed tokens: 8655994880 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.212374E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.169 | TFLOPs: 31.91 | +7: iteration 16520/ 173500 | consumed samples: 4229120 | consumed tokens: 8661237760 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.207064E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.642 | TFLOPs: 31.88 | +7: iteration 16530/ 173500 | consumed samples: 4231680 | consumed tokens: 8666480640 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.212804E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.239 | TFLOPs: 31.91 | +7: iteration 16540/ 173500 | consumed samples: 4234240 | consumed tokens: 8671723520 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.206069E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 16550/ 173500 | consumed samples: 4236800 | consumed tokens: 8676966400 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.218232E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.397 | TFLOPs: 31.92 | +7: iteration 16560/ 173500 | consumed samples: 4239360 | consumed tokens: 8682209280 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.215387E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.343 | TFLOPs: 31.92 | +7: iteration 16570/ 173500 | consumed samples: 4241920 | consumed tokens: 8687452160 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.225419E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.548 | TFLOPs: 31.93 | +7: iteration 16580/ 173500 | consumed samples: 4244480 | consumed tokens: 8692695040 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.220583E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.915 | TFLOPs: 31.90 | +7: iteration 16590/ 173500 | consumed samples: 4247040 | consumed tokens: 8697937920 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.231574E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.504 | TFLOPs: 31.93 | +7: iteration 16600/ 173500 | consumed samples: 4249600 | consumed tokens: 8703180800 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 
3.206665E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.325 | TFLOPs: 31.92 | +7: iteration 16610/ 173500 | consumed samples: 4252160 | consumed tokens: 8708423680 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.202504E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.102 | TFLOPs: 31.91 | +7: iteration 16620/ 173500 | consumed samples: 4254720 | consumed tokens: 8713666560 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.208759E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.322 | TFLOPs: 31.92 | +7: iteration 16630/ 173500 | consumed samples: 4257280 | consumed tokens: 8718909440 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.215276E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.162 | TFLOPs: 31.91 | +7: iteration 16640/ 173500 | consumed samples: 4259840 | consumed tokens: 8724152320 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.203685E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.636 | TFLOPs: 31.88 | +7: iteration 16650/ 173500 | consumed samples: 4262400 | consumed tokens: 8729395200 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.211083E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.919 | TFLOPs: 31.90 | +7: iteration 16660/ 173500 | consumed samples: 4264960 | consumed tokens: 8734638080 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.217866E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.126 | TFLOPs: 31.91 | +7: iteration 16670/ 173500 | consumed samples: 4267520 | consumed tokens: 8739880960 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.202464E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.962 | TFLOPs: 31.90 | +7: iteration 16680/ 173500 | consumed samples: 4270080 | consumed tokens: 8745123840 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.206588E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.118 | TFLOPs: 31.91 | +7: iteration 16690/ 173500 | consumed samples: 4272640 | consumed tokens: 8750366720 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.203990E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.035 | TFLOPs: 31.90 | +7: iteration 16700/ 173500 | consumed samples: 4275200 | consumed tokens: 8755609600 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.203793E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.390 | TFLOPs: 31.92 | +7: iteration 16710/ 173500 | consumed samples: 4277760 | consumed tokens: 8760852480 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.206493E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.401 | TFLOPs: 
31.92 | +7: iteration 16720/ 173500 | consumed samples: 4280320 | consumed tokens: 8766095360 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.235347E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.319 | TFLOPs: 31.92 | +7: iteration 16730/ 173500 | consumed samples: 4282880 | consumed tokens: 8771338240 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.214325E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.563 | TFLOPs: 31.93 | +7: iteration 16740/ 173500 | consumed samples: 4285440 | consumed tokens: 8776581120 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.213672E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.434 | TFLOPs: 31.92 | +7: iteration 16750/ 173500 | consumed samples: 4288000 | consumed tokens: 8781824000 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.210471E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.374 | TFLOPs: 31.92 | +7: iteration 16760/ 173500 | consumed samples: 4290560 | consumed tokens: 8787066880 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.200991E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.140 | TFLOPs: 31.91 | +7: iteration 16770/ 173500 | consumed samples: 4293120 | consumed tokens: 8792309760 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.207231E+00 | grad norm: 0.292 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.362 | TFLOPs: 31.92 | +7: iteration 16780/ 173500 | consumed samples: 4295680 | consumed tokens: 8797552640 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.223720E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.133 | TFLOPs: 31.91 | +7: iteration 16790/ 173500 | consumed samples: 4298240 | consumed tokens: 8802795520 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.192927E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.572 | TFLOPs: 31.88 | +7: iteration 16800/ 173500 | consumed samples: 4300800 | consumed tokens: 8808038400 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.224628E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.488 | TFLOPs: 31.87 | +7: iteration 16810/ 173500 | consumed samples: 4303360 | consumed tokens: 8813281280 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.199231E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.715 | TFLOPs: 31.89 | +7: iteration 16820/ 173500 | consumed samples: 4305920 | consumed tokens: 8818524160 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.207751E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.959 | TFLOPs: 31.90 | +7: iteration 16830/ 173500 | consumed samples: 4308480 | consumed tokens: 8823767040 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 
| global batch size: 256 | lm loss: 3.205721E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.451 | TFLOPs: 31.87 | +7: iteration 16840/ 173500 | consumed samples: 4311040 | consumed tokens: 8829009920 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.208950E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.892 | TFLOPs: 31.90 | +7: iteration 16850/ 173500 | consumed samples: 4313600 | consumed tokens: 8834252800 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.217048E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.895 | TFLOPs: 31.90 | +7: iteration 16860/ 173500 | consumed samples: 4316160 | consumed tokens: 8839495680 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.207009E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.911 | TFLOPs: 31.90 | +7: iteration 16870/ 173500 | consumed samples: 4318720 | consumed tokens: 8844738560 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.200627E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.933 | TFLOPs: 31.90 | +7: iteration 16880/ 173500 | consumed samples: 4321280 | consumed tokens: 8849981440 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.206064E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.624 | TFLOPs: 31.88 | +7: iteration 16890/ 173500 | consumed samples: 4323840 | 
consumed tokens: 8855224320 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.209906E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.232 | TFLOPs: 31.91 | +7: iteration 16900/ 173500 | consumed samples: 4326400 | consumed tokens: 8860467200 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.218502E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.887 | TFLOPs: 31.89 | +7: iteration 16910/ 173500 | consumed samples: 4328960 | consumed tokens: 8865710080 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.212076E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.701 | TFLOPs: 31.89 | +7: iteration 16920/ 173500 | consumed samples: 4331520 | consumed tokens: 8870952960 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.202990E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.683 | TFLOPs: 31.88 | +7: iteration 16930/ 173500 | consumed samples: 4334080 | consumed tokens: 8876195840 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.207528E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.900 | TFLOPs: 31.90 | +7: iteration 16940/ 173500 | consumed samples: 4336640 | consumed tokens: 8881438720 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.188095E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 607.638 | TFLOPs: 31.88 | +7: iteration 16950/ 173500 | consumed samples: 4339200 | consumed tokens: 8886681600 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.203751E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.987 | TFLOPs: 31.90 | +7: iteration 16960/ 173500 | consumed samples: 4341760 | consumed tokens: 8891924480 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.209807E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.411 | TFLOPs: 31.87 | +7: iteration 16970/ 173500 | consumed samples: 4344320 | consumed tokens: 8897167360 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.219471E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.578 | TFLOPs: 31.88 | +7: iteration 16980/ 173500 | consumed samples: 4346880 | consumed tokens: 8902410240 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.195027E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.154 | TFLOPs: 31.91 | +7: iteration 16990/ 173500 | consumed samples: 4349440 | consumed tokens: 8907653120 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.207186E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.980 | TFLOPs: 31.90 | +7: iteration 17000/ 173500 | consumed samples: 4352000 | consumed tokens: 8912896000 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.204131E+00 | grad norm: 
0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.322 | TFLOPs: 31.92 | +7: iteration 17010/ 173500 | consumed samples: 4354560 | consumed tokens: 8918138880 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.202980E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.351 | TFLOPs: 31.92 | +7: iteration 17020/ 173500 | consumed samples: 4357120 | consumed tokens: 8923381760 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.198568E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.737 | TFLOPs: 31.89 | +7: iteration 17030/ 173500 | consumed samples: 4359680 | consumed tokens: 8928624640 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.191061E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.876 | TFLOPs: 31.89 | +7: iteration 17040/ 173500 | consumed samples: 4362240 | consumed tokens: 8933867520 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.216814E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.911 | TFLOPs: 31.90 | +7: iteration 17050/ 173500 | consumed samples: 4364800 | consumed tokens: 8939110400 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.211650E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.466 | TFLOPs: 31.87 | +7: iteration 17060/ 173500 | consumed samples: 4367360 | consumed tokens: 8944353280 | elapsed time per iteration (s): 
0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.196968E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.391 | TFLOPs: 31.87 | +7: iteration 17070/ 173500 | consumed samples: 4369920 | consumed tokens: 8949596160 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.220344E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.883 | TFLOPs: 31.89 | +7: iteration 17080/ 173500 | consumed samples: 4372480 | consumed tokens: 8954839040 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.219755E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.259 | TFLOPs: 31.91 | +7: iteration 17090/ 173500 | consumed samples: 4375040 | consumed tokens: 8960081920 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.205177E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.617 | TFLOPs: 31.88 | +7: iteration 17100/ 173500 | consumed samples: 4377600 | consumed tokens: 8965324800 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.213885E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.819 | TFLOPs: 31.89 | +7: iteration 17110/ 173500 | consumed samples: 4380160 | consumed tokens: 8970567680 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.205985E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.502 | TFLOPs: 31.87 | +7: iteration 17120/ 
173500 | consumed samples: 4382720 | consumed tokens: 8975810560 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.208987E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.817 | TFLOPs: 31.89 | +7: iteration 17130/ 173500 | consumed samples: 4385280 | consumed tokens: 8981053440 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.203983E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.845 | TFLOPs: 31.89 | +7: iteration 17140/ 173500 | consumed samples: 4387840 | consumed tokens: 8986296320 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.213317E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.766 | TFLOPs: 31.89 | +7: iteration 17150/ 173500 | consumed samples: 4390400 | consumed tokens: 8991539200 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.180403E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.809 | TFLOPs: 31.94 | +7: iteration 17160/ 173500 | consumed samples: 4392960 | consumed tokens: 8996782080 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.208143E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.468 | TFLOPs: 31.93 | +7: iteration 17170/ 173500 | consumed samples: 4395520 | consumed tokens: 9002024960 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.191646E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.912 | TFLOPs: 31.95 | +7: iteration 17180/ 173500 | consumed samples: 4398080 | consumed tokens: 9007267840 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.217003E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.589 | TFLOPs: 31.93 | +7: iteration 17190/ 173500 | consumed samples: 4400640 | consumed tokens: 9012510720 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.187379E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.928 | TFLOPs: 31.95 | +7: iteration 17200/ 173500 | consumed samples: 4403200 | consumed tokens: 9017753600 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.213255E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.471 | TFLOPs: 31.93 | +7: iteration 17210/ 173500 | consumed samples: 4405760 | consumed tokens: 9022996480 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.211450E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 17220/ 173500 | consumed samples: 4408320 | consumed tokens: 9028239360 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.207739E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.808 | TFLOPs: 31.94 | +7: iteration 17230/ 173500 | consumed samples: 4410880 | consumed tokens: 9033482240 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | 
lm loss: 3.204037E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.371 | TFLOPs: 31.92 | +7: iteration 17240/ 173500 | consumed samples: 4413440 | consumed tokens: 9038725120 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.214349E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.040 | TFLOPs: 31.90 | +7: iteration 17250/ 173500 | consumed samples: 4416000 | consumed tokens: 9043968000 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.201502E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.327 | TFLOPs: 31.92 | +7: iteration 17260/ 173500 | consumed samples: 4418560 | consumed tokens: 9049210880 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.186488E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.073 | TFLOPs: 31.85 | +7: iteration 17270/ 173500 | consumed samples: 4421120 | consumed tokens: 9054453760 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.216908E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.768 | TFLOPs: 31.94 | +7: iteration 17280/ 173500 | consumed samples: 4423680 | consumed tokens: 9059696640 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.198789E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.000 | TFLOPs: 31.95 | +7: iteration 17290/ 173500 | consumed samples: 4426240 | consumed tokens: 
9064939520 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.201298E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.662 | TFLOPs: 31.94 | +7: iteration 17300/ 173500 | consumed samples: 4428800 | consumed tokens: 9070182400 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.196447E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.545 | TFLOPs: 31.93 | +7: iteration 17310/ 173500 | consumed samples: 4431360 | consumed tokens: 9075425280 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.204120E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.302 | TFLOPs: 31.92 | +7: iteration 17320/ 173500 | consumed samples: 4433920 | consumed tokens: 9080668160 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.198689E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.230 | TFLOPs: 31.91 | +7: iteration 17330/ 173500 | consumed samples: 4436480 | consumed tokens: 9085911040 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.208009E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.490 | TFLOPs: 31.93 | +7: iteration 17340/ 173500 | consumed samples: 4439040 | consumed tokens: 9091153920 | elapsed time per iteration (s): 0.43 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.212240E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
601.825 | TFLOPs: 31.58 | +7: iteration 17350/ 173500 | consumed samples: 4441600 | consumed tokens: 9096396800 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.196152E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.875 | TFLOPs: 31.89 | +7: iteration 17360/ 173500 | consumed samples: 4444160 | consumed tokens: 9101639680 | elapsed time per iteration (s): 0.43 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.205921E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.673 | TFLOPs: 31.25 | +7: iteration 17370/ 173500 | consumed samples: 4446720 | consumed tokens: 9106882560 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.186108E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.748 | TFLOPs: 31.84 | +7: iteration 17380/ 173500 | consumed samples: 4449280 | consumed tokens: 9112125440 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.194704E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.925 | TFLOPs: 31.95 | +7: iteration 17390/ 173500 | consumed samples: 4451840 | consumed tokens: 9117368320 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.183841E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.604 | TFLOPs: 31.93 | +7: iteration 17400/ 173500 | consumed samples: 4454400 | consumed tokens: 9122611200 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.200968E+00 | grad norm: 0.284 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.639 | TFLOPs: 31.83 | +7: iteration 17410/ 173500 | consumed samples: 4456960 | consumed tokens: 9127854080 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.200213E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.780 | TFLOPs: 31.94 | +7: iteration 17420/ 173500 | consumed samples: 4459520 | consumed tokens: 9133096960 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.185547E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.689 | TFLOPs: 31.94 | +7: iteration 17430/ 173500 | consumed samples: 4462080 | consumed tokens: 9138339840 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.189500E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.091 | TFLOPs: 31.96 | +7: iteration 17440/ 173500 | consumed samples: 4464640 | consumed tokens: 9143582720 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.201791E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.698 | TFLOPs: 31.94 | +7: iteration 17450/ 173500 | consumed samples: 4467200 | consumed tokens: 9148825600 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.198559E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.322 | TFLOPs: 31.92 | +7: iteration 17460/ 173500 | consumed samples: 4469760 | consumed tokens: 9154068480 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.193102E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.483 | TFLOPs: 31.93 | +7: iteration 17470/ 173500 | consumed samples: 4472320 | consumed tokens: 9159311360 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.181866E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.421 | TFLOPs: 31.92 | +7: iteration 17480/ 173500 | consumed samples: 4474880 | consumed tokens: 9164554240 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.208066E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.510 | TFLOPs: 31.93 | +7: iteration 17490/ 173500 | consumed samples: 4477440 | consumed tokens: 9169797120 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.193079E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.421 | TFLOPs: 31.92 | +7: iteration 17500/ 173500 | consumed samples: 4480000 | consumed tokens: 9175040000 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.190846E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.701 | TFLOPs: 31.94 | +7: iteration 17510/ 173500 | consumed samples: 4482560 | consumed tokens: 9180282880 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.182973E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.758 | TFLOPs: 31.94 | +7: iteration 17520/ 173500 | 
consumed samples: 4485120 | consumed tokens: 9185525760 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.195230E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.650 | TFLOPs: 31.93 | +7: iteration 17530/ 173500 | consumed samples: 4487680 | consumed tokens: 9190768640 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.195258E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.529 | TFLOPs: 31.93 | +7: iteration 17540/ 173500 | consumed samples: 4490240 | consumed tokens: 9196011520 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.193149E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.564 | TFLOPs: 31.93 | +7: iteration 17550/ 173500 | consumed samples: 4492800 | consumed tokens: 9201254400 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.197431E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.488 | TFLOPs: 31.93 | +7: iteration 17560/ 173500 | consumed samples: 4495360 | consumed tokens: 9206497280 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.196793E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.733 | TFLOPs: 31.94 | +7: iteration 17570/ 173500 | consumed samples: 4497920 | consumed tokens: 9211740160 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.199410E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 608.521 | TFLOPs: 31.93 | +7: iteration 17580/ 173500 | consumed samples: 4500480 | consumed tokens: 9216983040 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.207133E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.724 | TFLOPs: 31.94 | +7: iteration 17590/ 173500 | consumed samples: 4503040 | consumed tokens: 9222225920 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.192534E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.311 | TFLOPs: 31.92 | +7: iteration 17600/ 173500 | consumed samples: 4505600 | consumed tokens: 9227468800 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.195061E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.965 | TFLOPs: 31.90 | +7: iteration 17610/ 173500 | consumed samples: 4508160 | consumed tokens: 9232711680 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.185137E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.550 | TFLOPs: 31.93 | +7: iteration 17620/ 173500 | consumed samples: 4510720 | consumed tokens: 9237954560 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.199412E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.179 | TFLOPs: 31.91 | +7: iteration 17630/ 173500 | consumed samples: 4513280 | consumed tokens: 9243197440 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 
3.191610E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.476 | TFLOPs: 31.93 | +7: iteration 17640/ 173500 | consumed samples: 4515840 | consumed tokens: 9248440320 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.205165E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.349 | TFLOPs: 31.92 | +7: iteration 17650/ 173500 | consumed samples: 4518400 | consumed tokens: 9253683200 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.200051E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.477 | TFLOPs: 31.93 | +7: iteration 17660/ 173500 | consumed samples: 4520960 | consumed tokens: 9258926080 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.194731E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.369 | TFLOPs: 31.92 | +7: iteration 17670/ 173500 | consumed samples: 4523520 | consumed tokens: 9264168960 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.195394E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.251 | TFLOPs: 31.91 | +7: iteration 17680/ 173500 | consumed samples: 4526080 | consumed tokens: 9269411840 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.192080E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.361 | TFLOPs: 31.92 | +7: iteration 17690/ 173500 | consumed samples: 4528640 | consumed tokens: 9274654720 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.184924E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.723 | TFLOPs: 31.89 | +7: iteration 17700/ 173500 | consumed samples: 4531200 | consumed tokens: 9279897600 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.200455E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.025 | TFLOPs: 31.90 | +7: iteration 17710/ 173500 | consumed samples: 4533760 | consumed tokens: 9285140480 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.204513E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.966 | TFLOPs: 31.90 | +7: iteration 17720/ 173500 | consumed samples: 4536320 | consumed tokens: 9290383360 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.196768E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.759 | TFLOPs: 31.89 | +7: iteration 17730/ 173500 | consumed samples: 4538880 | consumed tokens: 9295626240 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.195500E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.907 | TFLOPs: 31.90 | +7: iteration 17740/ 173500 | consumed samples: 4541440 | consumed tokens: 9300869120 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.196147E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.171 | TFLOPs: 
31.91 | +7: iteration 17750/ 173500 | consumed samples: 4544000 | consumed tokens: 9306112000 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.199597E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.032 | TFLOPs: 31.90 | +7: iteration 17760/ 173500 | consumed samples: 4546560 | consumed tokens: 9311354880 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.175385E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.036 | TFLOPs: 31.90 | +7: iteration 17770/ 173500 | consumed samples: 4549120 | consumed tokens: 9316597760 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.201178E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.071 | TFLOPs: 31.90 | +7: iteration 17780/ 173500 | consumed samples: 4551680 | consumed tokens: 9321840640 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.190150E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.839 | TFLOPs: 31.89 | +7: iteration 17790/ 173500 | consumed samples: 4554240 | consumed tokens: 9327083520 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.202889E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.095 | TFLOPs: 31.91 | +7: iteration 17800/ 173500 | consumed samples: 4556800 | consumed tokens: 9332326400 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.210627E+00 | grad norm: 0.276 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.924 | TFLOPs: 31.90 | +7: iteration 17810/ 173500 | consumed samples: 4559360 | consumed tokens: 9337569280 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.196839E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.920 | TFLOPs: 31.90 | +7: iteration 17820/ 173500 | consumed samples: 4561920 | consumed tokens: 9342812160 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.192441E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.401 | TFLOPs: 31.92 | +7: iteration 17830/ 173500 | consumed samples: 4564480 | consumed tokens: 9348055040 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.201185E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.996 | TFLOPs: 31.90 | +7: iteration 17840/ 173500 | consumed samples: 4567040 | consumed tokens: 9353297920 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.193469E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.750 | TFLOPs: 31.89 | +7: iteration 17850/ 173500 | consumed samples: 4569600 | consumed tokens: 9358540800 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.205149E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.070 | TFLOPs: 31.90 | +7: iteration 17860/ 173500 | consumed samples: 4572160 | consumed tokens: 9363783680 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 
| global batch size: 256 | lm loss: 3.186625E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.801 | TFLOPs: 31.89 | +7: iteration 17870/ 173500 | consumed samples: 4574720 | consumed tokens: 9369026560 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.196687E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.028 | TFLOPs: 31.90 | +7: iteration 17880/ 173500 | consumed samples: 4577280 | consumed tokens: 9374269440 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.198690E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.700 | TFLOPs: 31.89 | +7: iteration 17890/ 173500 | consumed samples: 4579840 | consumed tokens: 9379512320 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.198349E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.834 | TFLOPs: 31.89 | +7: iteration 17900/ 173500 | consumed samples: 4582400 | consumed tokens: 9384755200 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.180743E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.799 | TFLOPs: 31.89 | +7: iteration 17910/ 173500 | consumed samples: 4584960 | consumed tokens: 9389998080 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.199596E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.622 | TFLOPs: 31.93 | +7: iteration 17920/ 173500 | consumed samples: 4587520 | 
consumed tokens: 9395240960 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.180782E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.100 | TFLOPs: 31.91 | +7: iteration 17930/ 173500 | consumed samples: 4590080 | consumed tokens: 9400483840 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.203331E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.700 | TFLOPs: 31.83 | +7: iteration 17940/ 173500 | consumed samples: 4592640 | consumed tokens: 9405726720 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.204991E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.720 | TFLOPs: 31.94 | +7: iteration 17950/ 173500 | consumed samples: 4595200 | consumed tokens: 9410969600 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.199079E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.897 | TFLOPs: 31.95 | +7: iteration 17960/ 173500 | consumed samples: 4597760 | consumed tokens: 9416212480 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.181458E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.490 | TFLOPs: 31.93 | +7: iteration 17970/ 173500 | consumed samples: 4600320 | consumed tokens: 9421455360 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.192064E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 608.879 | TFLOPs: 31.95 | +7: iteration 17980/ 173500 | consumed samples: 4602880 | consumed tokens: 9426698240 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.191336E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.658 | TFLOPs: 31.83 | +7: iteration 17990/ 173500 | consumed samples: 4605440 | consumed tokens: 9431941120 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.190516E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.486 | TFLOPs: 31.77 | +0: [2023-03-17 01:19:44,642] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.00019604685446348677, 0.00019604685446348677, 0.00019604685446348677], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 18000/ 173500 | consumed samples: 4608000 | consumed tokens: 9437184000 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.188368E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.807 | TFLOPs: 31.79 | +0: steps: 18000 loss: 3.1730 iter time (s): 0.419 samples/sec: 611.042 +7: iteration 18010/ 173500 | consumed samples: 4610560 | consumed tokens: 9442426880 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.184868E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.523 | TFLOPs: 31.77 | +7: iteration 18020/ 173500 | consumed samples: 4613120 | consumed tokens: 9447669760 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.176310E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.993 | TFLOPs: 31.74 | +7: iteration 18030/ 173500 | consumed samples: 4615680 | consumed tokens: 9452912640 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.196106E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.415 | TFLOPs: 31.77 | +7: iteration 18040/ 173500 | consumed samples: 4618240 | consumed tokens: 9458155520 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.184389E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.678 | TFLOPs: 31.83 | +7: iteration 18050/ 173500 | consumed samples: 4620800 | consumed tokens: 9463398400 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.206272E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.115 | TFLOPs: 31.64 | +7: iteration 18060/ 173500 | consumed samples: 4623360 | consumed tokens: 9468641280 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.189995E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.204 | TFLOPs: 31.91 | +7: iteration 18070/ 173500 | consumed samples: 4625920 | consumed tokens: 9473884160 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.190669E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.516 | TFLOPs: 31.82 | +7: iteration 18080/ 173500 | consumed samples: 4628480 | consumed tokens: 9479127040 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | 
lm loss: 3.201880E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.560 | TFLOPs: 31.88 | +7: iteration 18090/ 173500 | consumed samples: 4631040 | consumed tokens: 9484369920 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.195135E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.487 | TFLOPs: 31.87 | +7: iteration 18100/ 173500 | consumed samples: 4633600 | consumed tokens: 9489612800 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.202663E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.951 | TFLOPs: 31.74 | +7: iteration 18110/ 173500 | consumed samples: 4636160 | consumed tokens: 9494855680 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.176800E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.376 | TFLOPs: 31.19 | +7: iteration 18120/ 173500 | consumed samples: 4638720 | consumed tokens: 9500098560 | elapsed time per iteration (s): 0.45 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.183952E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.642 | TFLOPs: 29.99 | +7: iteration 18130/ 173500 | consumed samples: 4641280 | consumed tokens: 9505341440 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.195033E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.994 | TFLOPs: 31.80 | +7: iteration 18140/ 173500 | consumed samples: 4643840 | consumed tokens: 
9510584320 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.186129E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.421 | TFLOPs: 30.93 | +7: iteration 18150/ 173500 | consumed samples: 4646400 | consumed tokens: 9515827200 | elapsed time per iteration (s): 0.45 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.193831E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.061 | TFLOPs: 30.07 | +7: iteration 18160/ 173500 | consumed samples: 4648960 | consumed tokens: 9521070080 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.185790E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.049 | TFLOPs: 30.96 | +7: iteration 18170/ 173500 | consumed samples: 4651520 | consumed tokens: 9526312960 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.178765E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.235 | TFLOPs: 31.60 | +7: iteration 18180/ 173500 | consumed samples: 4654080 | consumed tokens: 9531555840 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.197552E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.296 | TFLOPs: 31.97 | +7: iteration 18190/ 173500 | consumed samples: 4656640 | consumed tokens: 9536798720 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.176708E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
604.598 | TFLOPs: 31.72 | +7: iteration 18200/ 173500 | consumed samples: 4659200 | consumed tokens: 9542041600 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.187313E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.041 | TFLOPs: 30.64 | +7: iteration 18210/ 173500 | consumed samples: 4661760 | consumed tokens: 9547284480 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.201863E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.149 | TFLOPs: 30.44 | +7: iteration 18220/ 173500 | consumed samples: 4664320 | consumed tokens: 9552527360 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.178413E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.313 | TFLOPs: 30.19 | +7: iteration 18230/ 173500 | consumed samples: 4666880 | consumed tokens: 9557770240 | elapsed time per iteration (s): 0.46 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.186876E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.243 | TFLOPs: 29.29 | +7: iteration 18240/ 173500 | consumed samples: 4669440 | consumed tokens: 9563013120 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.183899E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.568 | TFLOPs: 30.25 | +7: iteration 18250/ 173500 | consumed samples: 4672000 | consumed tokens: 9568256000 | elapsed time per iteration (s): 0.45 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.184647E+00 | grad norm: 0.280 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.735 | TFLOPs: 29.53 | +7: iteration 18260/ 173500 | consumed samples: 4674560 | consumed tokens: 9573498880 | elapsed time per iteration (s): 0.45 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.186012E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.195 | TFLOPs: 30.02 | +7: iteration 18270/ 173500 | consumed samples: 4677120 | consumed tokens: 9578741760 | elapsed time per iteration (s): 0.45 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.199384E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.868 | TFLOPs: 29.90 | +7: iteration 18280/ 173500 | consumed samples: 4679680 | consumed tokens: 9583984640 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.204086E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.203 | TFLOPs: 30.65 | +7: iteration 18290/ 173500 | consumed samples: 4682240 | consumed tokens: 9589227520 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.206201E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.491 | TFLOPs: 31.03 | +7: iteration 18300/ 173500 | consumed samples: 4684800 | consumed tokens: 9594470400 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.186311E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.452 | TFLOPs: 30.82 | +7: iteration 18310/ 173500 | consumed samples: 4687360 | consumed tokens: 9599713280 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.191909E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.446 | TFLOPs: 30.46 | +7: iteration 18320/ 173500 | consumed samples: 4689920 | consumed tokens: 9604956160 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.170531E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.099 | TFLOPs: 30.70 | +7: iteration 18330/ 173500 | consumed samples: 4692480 | consumed tokens: 9610199040 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.178608E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.534 | TFLOPs: 30.46 | +7: iteration 18340/ 173500 | consumed samples: 4695040 | consumed tokens: 9615441920 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.194114E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.682 | TFLOPs: 31.25 | +7: iteration 18350/ 173500 | consumed samples: 4697600 | consumed tokens: 9620684800 | elapsed time per iteration (s): 0.45 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.184221E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.997 | TFLOPs: 30.01 | +7: iteration 18360/ 173500 | consumed samples: 4700160 | consumed tokens: 9625927680 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.193322E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.741 | TFLOPs: 31.00 | +7: iteration 18370/ 173500 | 
consumed samples: 4702720 | consumed tokens: 9631170560 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.192219E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.493 | TFLOPs: 30.35 | +7: iteration 18380/ 173500 | consumed samples: 4705280 | consumed tokens: 9636413440 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.205426E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.041 | TFLOPs: 31.06 | +7: iteration 18390/ 173500 | consumed samples: 4707840 | consumed tokens: 9641656320 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.179093E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.242 | TFLOPs: 30.71 | +7: iteration 18400/ 173500 | consumed samples: 4710400 | consumed tokens: 9646899200 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.182360E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.660 | TFLOPs: 31.52 | +7: iteration 18410/ 173500 | consumed samples: 4712960 | consumed tokens: 9652142080 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.200857E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.593 | TFLOPs: 31.20 | +7: iteration 18420/ 173500 | consumed samples: 4715520 | consumed tokens: 9657384960 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.207786E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 591.148 | TFLOPs: 31.02 | +7: iteration 18430/ 173500 | consumed samples: 4718080 | consumed tokens: 9662627840 | elapsed time per iteration (s): 0.44 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.180987E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.358 | TFLOPs: 30.29 | +7: iteration 18440/ 173500 | consumed samples: 4720640 | consumed tokens: 9667870720 | elapsed time per iteration (s): 0.44 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.181470E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.841 | TFLOPs: 30.79 | +7: iteration 18450/ 173500 | consumed samples: 4723200 | consumed tokens: 9673113600 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.178170E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.604 | TFLOPs: 31.62 | +7: iteration 18460/ 173500 | consumed samples: 4725760 | consumed tokens: 9678356480 | elapsed time per iteration (s): 0.44 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.186881E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.615 | TFLOPs: 30.52 | +7: iteration 18470/ 173500 | consumed samples: 4728320 | consumed tokens: 9683599360 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.188117E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.741 | TFLOPs: 31.15 | +7: iteration 18480/ 173500 | consumed samples: 4730880 | consumed tokens: 9688842240 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 
3.177380E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.108 | TFLOPs: 31.07 | +7: iteration 18490/ 173500 | consumed samples: 4733440 | consumed tokens: 9694085120 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.190726E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.299 | TFLOPs: 31.13 | +7: iteration 18500/ 173500 | consumed samples: 4736000 | consumed tokens: 9699328000 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.175689E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.513 | TFLOPs: 31.51 | +7: iteration 18510/ 173500 | consumed samples: 4738560 | consumed tokens: 9704570880 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.184284E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.053 | TFLOPs: 31.17 | +7: iteration 18520/ 173500 | consumed samples: 4741120 | consumed tokens: 9709813760 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.185011E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.812 | TFLOPs: 31.26 | +7: iteration 18530/ 173500 | consumed samples: 4743680 | consumed tokens: 9715056640 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.191375E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.865 | TFLOPs: 31.11 | +7: iteration 18540/ 173500 | consumed samples: 4746240 | consumed tokens: 9720299520 | 
elapsed time per iteration (s): 0.44 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.170299E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.181 | TFLOPs: 30.23 | +7: iteration 18550/ 173500 | consumed samples: 4748800 | consumed tokens: 9725542400 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.178466E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.322 | TFLOPs: 30.92 | +7: iteration 18560/ 173500 | consumed samples: 4751360 | consumed tokens: 9730785280 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.189197E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.126 | TFLOPs: 31.70 | +7: iteration 18570/ 173500 | consumed samples: 4753920 | consumed tokens: 9736028160 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.179006E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.005 | TFLOPs: 31.22 | +7: iteration 18580/ 173500 | consumed samples: 4756480 | consumed tokens: 9741271040 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.177403E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.858 | TFLOPs: 31.21 | +7: iteration 18590/ 173500 | consumed samples: 4759040 | consumed tokens: 9746513920 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.182112E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.256 | TFLOPs: 
31.55 | +7: iteration 18600/ 173500 | consumed samples: 4761600 | consumed tokens: 9751756800 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.194962E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.791 | TFLOPs: 31.47 | +7: iteration 18610/ 173500 | consumed samples: 4764160 | consumed tokens: 9756999680 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.178752E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.745 | TFLOPs: 30.58 | +7: iteration 18620/ 173500 | consumed samples: 4766720 | consumed tokens: 9762242560 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.182308E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.350 | TFLOPs: 31.45 | +7: iteration 18630/ 173500 | consumed samples: 4769280 | consumed tokens: 9767485440 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.174903E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.095 | TFLOPs: 31.43 | +7: iteration 18640/ 173500 | consumed samples: 4771840 | consumed tokens: 9772728320 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.184062E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.993 | TFLOPs: 30.80 | +7: iteration 18650/ 173500 | consumed samples: 4774400 | consumed tokens: 9777971200 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.194078E+00 | grad norm: 0.295 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.987 | TFLOPs: 31.48 | +7: iteration 18660/ 173500 | consumed samples: 4776960 | consumed tokens: 9783214080 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.176776E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.345 | TFLOPs: 31.13 | +7: iteration 18670/ 173500 | consumed samples: 4779520 | consumed tokens: 9788456960 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.191295E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.654 | TFLOPs: 31.10 | +7: iteration 18680/ 173500 | consumed samples: 4782080 | consumed tokens: 9793699840 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.195665E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.303 | TFLOPs: 31.29 | +7: iteration 18690/ 173500 | consumed samples: 4784640 | consumed tokens: 9798942720 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.179617E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.574 | TFLOPs: 31.30 | +7: iteration 18700/ 173500 | consumed samples: 4787200 | consumed tokens: 9804185600 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.182176E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.694 | TFLOPs: 31.41 | +7: iteration 18710/ 173500 | consumed samples: 4789760 | consumed tokens: 9809428480 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 
| global batch size: 256 | lm loss: 3.188398E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.130 | TFLOPs: 31.02 | +7: iteration 18720/ 173500 | consumed samples: 4792320 | consumed tokens: 9814671360 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.182050E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.290 | TFLOPs: 31.55 | +7: iteration 18730/ 173500 | consumed samples: 4794880 | consumed tokens: 9819914240 | elapsed time per iteration (s): 0.45 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.173343E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.501 | TFLOPs: 30.14 | +7: iteration 18740/ 173500 | consumed samples: 4797440 | consumed tokens: 9825157120 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.184237E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.488 | TFLOPs: 30.98 | +7: iteration 18750/ 173500 | consumed samples: 4800000 | consumed tokens: 9830400000 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.188477E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.189 | TFLOPs: 31.70 | +7: iteration 18760/ 173500 | consumed samples: 4802560 | consumed tokens: 9835642880 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.183188E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.712 | TFLOPs: 31.52 | +7: iteration 18770/ 173500 | consumed samples: 4805120 | 
consumed tokens: 9840885760 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.173201E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.450 | TFLOPs: 31.45 | +7: iteration 18780/ 173500 | consumed samples: 4807680 | consumed tokens: 9846128640 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.168912E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.725 | TFLOPs: 31.31 | +7: iteration 18790/ 173500 | consumed samples: 4810240 | consumed tokens: 9851371520 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.184508E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.769 | TFLOPs: 31.10 | +7: iteration 18800/ 173500 | consumed samples: 4812800 | consumed tokens: 9856614400 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.180151E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.655 | TFLOPs: 30.99 | +7: iteration 18810/ 173500 | consumed samples: 4815360 | consumed tokens: 9861857280 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.196161E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.134 | TFLOPs: 31.80 | +7: iteration 18820/ 173500 | consumed samples: 4817920 | consumed tokens: 9867100160 | elapsed time per iteration (s): 0.45 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.178836E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 570.038 | TFLOPs: 29.91 | +7: iteration 18830/ 173500 | consumed samples: 4820480 | consumed tokens: 9872343040 | elapsed time per iteration (s): 0.44 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.172961E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.016 | TFLOPs: 30.64 | +7: iteration 18840/ 173500 | consumed samples: 4823040 | consumed tokens: 9877585920 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.176035E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.118 | TFLOPs: 30.96 | +7: iteration 18850/ 173500 | consumed samples: 4825600 | consumed tokens: 9882828800 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.188646E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.686 | TFLOPs: 31.15 | +7: iteration 18860/ 173500 | consumed samples: 4828160 | consumed tokens: 9888071680 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.185901E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.340 | TFLOPs: 31.34 | +7: iteration 18870/ 173500 | consumed samples: 4830720 | consumed tokens: 9893314560 | elapsed time per iteration (s): 0.44 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.165176E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.957 | TFLOPs: 30.38 | +7: iteration 18880/ 173500 | consumed samples: 4833280 | consumed tokens: 9898557440 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.187067E+00 | grad norm: 
0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.772 | TFLOPs: 30.94 | +7: iteration 18890/ 173500 | consumed samples: 4835840 | consumed tokens: 9903800320 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.178555E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.186 | TFLOPs: 31.02 | +7: iteration 18900/ 173500 | consumed samples: 4838400 | consumed tokens: 9909043200 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.186736E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.812 | TFLOPs: 31.84 | +7: iteration 18910/ 173500 | consumed samples: 4840960 | consumed tokens: 9914286080 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.178418E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.880 | TFLOPs: 31.53 | +7: iteration 18920/ 173500 | consumed samples: 4843520 | consumed tokens: 9919528960 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.181109E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.483 | TFLOPs: 31.09 | +7: iteration 18930/ 173500 | consumed samples: 4846080 | consumed tokens: 9924771840 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.176217E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.585 | TFLOPs: 31.04 | +7: iteration 18940/ 173500 | consumed samples: 4848640 | consumed tokens: 9930014720 | elapsed time per iteration (s): 
0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.186630E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.765 | TFLOPs: 31.26 | +7: iteration 18950/ 173500 | consumed samples: 4851200 | consumed tokens: 9935257600 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.194251E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.619 | TFLOPs: 31.15 | +7: iteration 18960/ 173500 | consumed samples: 4853760 | consumed tokens: 9940500480 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.173000E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.606 | TFLOPs: 31.20 | +7: iteration 18970/ 173500 | consumed samples: 4856320 | consumed tokens: 9945743360 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.182559E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.960 | TFLOPs: 30.95 | +7: iteration 18980/ 173500 | consumed samples: 4858880 | consumed tokens: 9950986240 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.171892E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.003 | TFLOPs: 31.22 | +7: iteration 18990/ 173500 | consumed samples: 4861440 | consumed tokens: 9956229120 | elapsed time per iteration (s): 0.44 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.176267E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.502 | TFLOPs: 30.46 | +7: iteration 19000/ 
173500 | consumed samples: 4864000 | consumed tokens: 9961472000 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.181046E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.190 | TFLOPs: 31.86 | +7: iteration 19010/ 173500 | consumed samples: 4866560 | consumed tokens: 9966714880 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.180443E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.586 | TFLOPs: 31.04 | +7: iteration 19020/ 173500 | consumed samples: 4869120 | consumed tokens: 9971957760 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.187538E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.653 | TFLOPs: 31.36 | +7: iteration 19030/ 173500 | consumed samples: 4871680 | consumed tokens: 9977200640 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.182716E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.790 | TFLOPs: 31.68 | +7: iteration 19040/ 173500 | consumed samples: 4874240 | consumed tokens: 9982443520 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.185160E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.932 | TFLOPs: 31.48 | +7: iteration 19050/ 173500 | consumed samples: 4876800 | consumed tokens: 9987686400 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.173152E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.694 | TFLOPs: 31.88 | +7: iteration 19060/ 173500 | consumed samples: 4879360 | consumed tokens: 9992929280 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.172841E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.378 | TFLOPs: 31.29 | +7: iteration 19070/ 173500 | consumed samples: 4881920 | consumed tokens: 9998172160 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.173750E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.727 | TFLOPs: 30.89 | +7: iteration 19080/ 173500 | consumed samples: 4884480 | consumed tokens: 10003415040 | elapsed time per iteration (s): 0.44 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.171548E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.996 | TFLOPs: 30.64 | +7: iteration 19090/ 173500 | consumed samples: 4887040 | consumed tokens: 10008657920 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.176714E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.725 | TFLOPs: 31.05 | +7: iteration 19100/ 173500 | consumed samples: 4889600 | consumed tokens: 10013900800 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.174608E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.426 | TFLOPs: 31.35 | +7: iteration 19110/ 173500 | consumed samples: 4892160 | consumed tokens: 10019143680 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 
256 | lm loss: 3.166165E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.830 | TFLOPs: 31.42 | +7: iteration 19120/ 173500 | consumed samples: 4894720 | consumed tokens: 10024386560 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.194026E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.563 | TFLOPs: 31.25 | +7: iteration 19130/ 173500 | consumed samples: 4897280 | consumed tokens: 10029629440 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.196115E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.994 | TFLOPs: 30.96 | +7: iteration 19140/ 173500 | consumed samples: 4899840 | consumed tokens: 10034872320 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.175517E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.234 | TFLOPs: 31.49 | +7: iteration 19150/ 173500 | consumed samples: 4902400 | consumed tokens: 10040115200 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.176231E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.814 | TFLOPs: 31.47 | +7: iteration 19160/ 173500 | consumed samples: 4904960 | consumed tokens: 10045358080 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.164598E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.606 | TFLOPs: 31.88 | +7: iteration 19170/ 173500 | consumed samples: 4907520 | consumed 
tokens: 10050600960 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.170857E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.680 | TFLOPs: 31.25 | +7: iteration 19180/ 173500 | consumed samples: 4910080 | consumed tokens: 10055843840 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.174949E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.683 | TFLOPs: 31.41 | +7: iteration 19190/ 173500 | consumed samples: 4912640 | consumed tokens: 10061086720 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.176337E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | +7: iteration 19200/ 173500 | consumed samples: 4915200 | consumed tokens: 10066329600 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.184833E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.879 | TFLOPs: 31.37 | +7: iteration 19210/ 173500 | consumed samples: 4917760 | consumed tokens: 10071572480 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.184747E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.376 | TFLOPs: 31.55 | +7: iteration 19220/ 173500 | consumed samples: 4920320 | consumed tokens: 10076815360 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.169072E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.771 | TFLOPs: 31.89 | +7: iteration 19230/ 173500 | consumed samples: 4922880 | consumed tokens: 10082058240 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.186710E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.622 | TFLOPs: 31.99 | +7: iteration 19240/ 173500 | consumed samples: 4925440 | consumed tokens: 10087301120 | elapsed time per iteration (s): 0.44 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.174709E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.690 | TFLOPs: 30.21 | +7: iteration 19250/ 173500 | consumed samples: 4928000 | consumed tokens: 10092544000 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.190237E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.319 | TFLOPs: 31.03 | +7: iteration 19260/ 173500 | consumed samples: 4930560 | consumed tokens: 10097786880 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.180443E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.189 | TFLOPs: 31.23 | +7: iteration 19270/ 173500 | consumed samples: 4933120 | consumed tokens: 10103029760 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.173610E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.777 | TFLOPs: 31.52 | +7: iteration 19280/ 173500 | consumed samples: 4935680 | consumed tokens: 10108272640 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.166663E+00 | grad norm: 
0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.594 | TFLOPs: 31.67 | +7: iteration 19290/ 173500 | consumed samples: 4938240 | consumed tokens: 10113515520 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.184124E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.637 | TFLOPs: 31.72 | +7: iteration 19300/ 173500 | consumed samples: 4940800 | consumed tokens: 10118758400 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.169384E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.409 | TFLOPs: 31.66 | +7: iteration 19310/ 173500 | consumed samples: 4943360 | consumed tokens: 10124001280 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.176857E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.911 | TFLOPs: 31.63 | +7: iteration 19320/ 173500 | consumed samples: 4945920 | consumed tokens: 10129244160 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.179849E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.827 | TFLOPs: 31.10 | +7: iteration 19330/ 173500 | consumed samples: 4948480 | consumed tokens: 10134487040 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.189299E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.145 | TFLOPs: 31.75 | +7: iteration 19340/ 173500 | consumed samples: 4951040 | consumed tokens: 10139729920 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.183406E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.710 | TFLOPs: 31.26 | +7: iteration 19350/ 173500 | consumed samples: 4953600 | consumed tokens: 10144972800 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.180575E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.041 | TFLOPs: 31.59 | +7: iteration 19360/ 173500 | consumed samples: 4956160 | consumed tokens: 10150215680 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.181759E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.218 | TFLOPs: 31.81 | +7: iteration 19370/ 173500 | consumed samples: 4958720 | consumed tokens: 10155458560 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.179921E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.584 | TFLOPs: 31.51 | +7: iteration 19380/ 173500 | consumed samples: 4961280 | consumed tokens: 10160701440 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.171838E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.967 | TFLOPs: 31.79 | +7: iteration 19390/ 173500 | consumed samples: 4963840 | consumed tokens: 10165944320 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.175158E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.923 | TFLOPs: 31.58 | +7: 
iteration 19400/ 173500 | consumed samples: 4966400 | consumed tokens: 10171187200 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.184319E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.920 | TFLOPs: 31.42 | +7: iteration 19410/ 173500 | consumed samples: 4968960 | consumed tokens: 10176430080 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.177519E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.363 | TFLOPs: 31.45 | +7: iteration 19420/ 173500 | consumed samples: 4971520 | consumed tokens: 10181672960 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.185178E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.046 | TFLOPs: 31.96 | +7: iteration 19430/ 173500 | consumed samples: 4974080 | consumed tokens: 10186915840 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.171917E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.383 | TFLOPs: 31.92 | +7: iteration 19440/ 173500 | consumed samples: 4976640 | consumed tokens: 10192158720 | elapsed time per iteration (s): 0.44 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.165340E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.769 | TFLOPs: 30.58 | +7: iteration 19450/ 173500 | consumed samples: 4979200 | consumed tokens: 10197401600 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.183316E+00 | grad norm: 0.274 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.841 | TFLOPs: 31.68 | +7: iteration 19460/ 173500 | consumed samples: 4981760 | consumed tokens: 10202644480 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.188160E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.454 | TFLOPs: 31.61 | +7: iteration 19470/ 173500 | consumed samples: 4984320 | consumed tokens: 10207887360 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.180293E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.039 | TFLOPs: 31.75 | +7: iteration 19480/ 173500 | consumed samples: 4986880 | consumed tokens: 10213130240 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.179822E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.892 | TFLOPs: 31.53 | +7: iteration 19490/ 173500 | consumed samples: 4989440 | consumed tokens: 10218373120 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.190970E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.646 | TFLOPs: 31.04 | +7: iteration 19500/ 173500 | consumed samples: 4992000 | consumed tokens: 10223616000 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.166595E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.086 | TFLOPs: 31.70 | +7: iteration 19510/ 173500 | consumed samples: 4994560 | consumed tokens: 10228858880 | elapsed time per iteration (s): 0.43 | learning rate: 
1.953E-04 | global batch size: 256 | lm loss: 3.189621E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.023 | TFLOPs: 31.48 | +7: iteration 19520/ 173500 | consumed samples: 4997120 | consumed tokens: 10234101760 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.182084E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.576 | TFLOPs: 31.25 | +7: iteration 19530/ 173500 | consumed samples: 4999680 | consumed tokens: 10239344640 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.170516E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.704 | TFLOPs: 31.10 | +7: iteration 19540/ 173500 | consumed samples: 5002240 | consumed tokens: 10244587520 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.180452E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.567 | TFLOPs: 31.09 | +7: iteration 19550/ 173500 | consumed samples: 5004800 | consumed tokens: 10249830400 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.174430E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.376 | TFLOPs: 31.45 | +7: iteration 19560/ 173500 | consumed samples: 5007360 | consumed tokens: 10255073280 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.167198E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.364 | TFLOPs: 31.45 | +7: iteration 19570/ 173500 | consumed 
samples: 5009920 | consumed tokens: 10260316160 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.175706E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.139 | TFLOPs: 31.59 | +7: iteration 19580/ 173500 | consumed samples: 5012480 | consumed tokens: 10265559040 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.174364E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.904 | TFLOPs: 31.53 | +7: iteration 19590/ 173500 | consumed samples: 5015040 | consumed tokens: 10270801920 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.170816E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.085 | TFLOPs: 31.01 | +7: iteration 19600/ 173500 | consumed samples: 5017600 | consumed tokens: 10276044800 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.184795E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.327 | TFLOPs: 31.55 | +7: iteration 19610/ 173500 | consumed samples: 5020160 | consumed tokens: 10281287680 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.180052E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.804 | TFLOPs: 31.79 | +7: iteration 19620/ 173500 | consumed samples: 5022720 | consumed tokens: 10286530560 | elapsed time per iteration (s): 0.44 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.177404E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 585.276 | TFLOPs: 30.71 | +7: iteration 19630/ 173500 | consumed samples: 5025280 | consumed tokens: 10291773440 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.184582E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.885 | TFLOPs: 30.90 | +7: iteration 19640/ 173500 | consumed samples: 5027840 | consumed tokens: 10297016320 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.171888E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.942 | TFLOPs: 31.74 | +7: iteration 19650/ 173500 | consumed samples: 5030400 | consumed tokens: 10302259200 | elapsed time per iteration (s): 0.44 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.168731E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.021 | TFLOPs: 30.49 | +7: iteration 19660/ 173500 | consumed samples: 5032960 | consumed tokens: 10307502080 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.159975E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.872 | TFLOPs: 31.68 | +7: iteration 19670/ 173500 | consumed samples: 5035520 | consumed tokens: 10312744960 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.177471E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.829 | TFLOPs: 31.10 | +7: iteration 19680/ 173500 | consumed samples: 5038080 | consumed tokens: 10317987840 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm 
loss: 3.164178E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.317 | TFLOPs: 31.81 | +7: iteration 19690/ 173500 | consumed samples: 5040640 | consumed tokens: 10323230720 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.160467E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.264 | TFLOPs: 31.39 | +7: iteration 19700/ 173500 | consumed samples: 5043200 | consumed tokens: 10328473600 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.160010E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.312 | TFLOPs: 31.55 | +7: iteration 19710/ 173500 | consumed samples: 5045760 | consumed tokens: 10333716480 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.183177E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.212 | TFLOPs: 31.60 | +7: iteration 19720/ 173500 | consumed samples: 5048320 | consumed tokens: 10338959360 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.172864E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.873 | TFLOPs: 31.79 | +7: iteration 19730/ 173500 | consumed samples: 5050880 | consumed tokens: 10344202240 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.165310E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.083 | TFLOPs: 31.70 | +7: iteration 19740/ 173500 | consumed samples: 5053440 | consumed tokens: 
10349445120 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.189165E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.431 | TFLOPs: 31.61 | +7: iteration 19750/ 173500 | consumed samples: 5056000 | consumed tokens: 10354688000 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.173564E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.432 | TFLOPs: 31.45 | +7: iteration 19760/ 173500 | consumed samples: 5058560 | consumed tokens: 10359930880 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.173605E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.784 | TFLOPs: 31.36 | +7: iteration 19770/ 173500 | consumed samples: 5061120 | consumed tokens: 10365173760 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.177337E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.618 | TFLOPs: 31.78 | +7: iteration 19780/ 173500 | consumed samples: 5063680 | consumed tokens: 10370416640 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.167256E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.101 | TFLOPs: 31.17 | +7: iteration 19790/ 173500 | consumed samples: 5066240 | consumed tokens: 10375659520 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.172244E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
598.972 | TFLOPs: 31.43 | +7: iteration 19800/ 173500 | consumed samples: 5068800 | consumed tokens: 10380902400 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.185942E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.146 | TFLOPs: 31.54 | +7: iteration 19810/ 173500 | consumed samples: 5071360 | consumed tokens: 10386145280 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.161925E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | +7: iteration 19820/ 173500 | consumed samples: 5073920 | consumed tokens: 10391388160 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.173185E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.486 | TFLOPs: 31.77 | +7: iteration 19830/ 173500 | consumed samples: 5076480 | consumed tokens: 10396631040 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.180120E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.172 | TFLOPs: 31.44 | +7: iteration 19840/ 173500 | consumed samples: 5079040 | consumed tokens: 10401873920 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.177447E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.774 | TFLOPs: 31.52 | +7: iteration 19850/ 173500 | consumed samples: 5081600 | consumed tokens: 10407116800 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.183715E+00 | grad norm: 0.293 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.430 | TFLOPs: 31.77 | +7: iteration 19860/ 173500 | consumed samples: 5084160 | consumed tokens: 10412359680 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.179955E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.380 | TFLOPs: 31.03 | +7: iteration 19870/ 173500 | consumed samples: 5086720 | consumed tokens: 10417602560 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.162986E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.412 | TFLOPs: 31.03 | +7: iteration 19880/ 173500 | consumed samples: 5089280 | consumed tokens: 10422845440 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.172223E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.971 | TFLOPs: 31.69 | +7: iteration 19890/ 173500 | consumed samples: 5091840 | consumed tokens: 10428088320 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.175187E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.780 | TFLOPs: 31.63 | +7: iteration 19900/ 173500 | consumed samples: 5094400 | consumed tokens: 10433331200 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.167287E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.467 | TFLOPs: 31.40 | +7: iteration 19910/ 173500 | consumed samples: 5096960 | consumed tokens: 10438574080 | elapsed time per iteration (s): 
0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.183677E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.379 | TFLOPs: 31.19 | +7: iteration 19920/ 173500 | consumed samples: 5099520 | consumed tokens: 10443816960 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.177933E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.306 | TFLOPs: 31.13 | +7: iteration 19930/ 173500 | consumed samples: 5102080 | consumed tokens: 10449059840 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.187252E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.510 | TFLOPs: 31.93 | +7: iteration 19940/ 173500 | consumed samples: 5104640 | consumed tokens: 10454302720 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.161967E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.660 | TFLOPs: 31.25 | +7: iteration 19950/ 173500 | consumed samples: 5107200 | consumed tokens: 10459545600 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.177349E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.873 | TFLOPs: 31.42 | +7: iteration 19960/ 173500 | consumed samples: 5109760 | consumed tokens: 10464788480 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.171072E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.416 | TFLOPs: 31.92 | +7: iteration 19970/ 
173500 | consumed samples: 5112320 | consumed tokens: 10470031360 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.170689E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.132 | TFLOPs: 31.23 | +7: iteration 19980/ 173500 | consumed samples: 5114880 | consumed tokens: 10475274240 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.184142E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.535 | TFLOPs: 31.88 | +7: iteration 19990/ 173500 | consumed samples: 5117440 | consumed tokens: 10480517120 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.182536E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.095 | TFLOPs: 31.22 | +0: [2023-03-17 01:34:04,666] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=0, lr=[0.00019502450208460265, 0.00019502450208460265, 0.00019502450208460265], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 20000/ 173500 | consumed samples: 5120000 | consumed tokens: 10485760000 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.164417E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.018 | TFLOPs: 31.59 | +0: steps: 20000 loss: 3.1779 iter time (s): 0.428 samples/sec: 598.542 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 20000 | lm loss value: 3.319685E+00 | lm loss PPL: 2.765164E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at 
iteration 20000 to checkpoints_221m91b400m +0: [2023-03-17 01:34:04,827] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is begin to save! +0: [2023-03-17 01:34:04,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_01-model_00-model_states.pt... +0: [2023-03-17 01:34:04,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_01-model_00-model_states.pt. +0: [2023-03-17 01:34:04,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_03-model_00-model_states.pt... +0: [2023-03-17 01:34:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_03-model_00-model_states.pt. +0: [2023-03-17 01:34:04,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_04-model_00-model_states.pt... +0: [2023-03-17 01:34:04,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_04-model_00-model_states.pt. +0: [2023-03-17 01:34:04,996] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_05-model_00-model_states.pt... +0: [2023-03-17 01:34:05,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_05-model_00-model_states.pt. +0: [2023-03-17 01:34:05,019] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_06-model_00-model_states.pt... +0: [2023-03-17 01:34:05,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_06-model_00-model_states.pt. +0: [2023-03-17 01:34:05,043] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_07-model_00-model_states.pt... 
+0: [2023-03-17 01:34:05,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_07-model_00-model_states.pt. +0: [2023-03-17 01:34:05,066] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_08-model_00-model_states.pt... +0: [2023-03-17 01:34:05,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_08-model_00-model_states.pt. +0: [2023-03-17 01:34:05,090] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_09-model_00-model_states.pt... +0: [2023-03-17 01:34:05,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_09-model_00-model_states.pt. +0: [2023-03-17 01:34:05,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_10-model_00-model_states.pt... +0: [2023-03-17 01:34:05,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_10-model_00-model_states.pt. +0: [2023-03-17 01:34:05,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_11-model_00-model_states.pt... +0: [2023-03-17 01:34:05,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_11-model_00-model_states.pt. +0: [2023-03-17 01:34:05,163] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_12-model_00-model_states.pt... +0: [2023-03-17 01:34:05,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_12-model_00-model_states.pt. +0: [2023-03-17 01:34:05,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_13-model_00-model_states.pt... 
+0: [2023-03-17 01:34:05,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_13-model_00-model_states.pt. +0: [2023-03-17 01:34:05,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_14-model_00-model_states.pt... +0: [2023-03-17 01:34:05,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_14-model_00-model_states.pt. +0: [2023-03-17 01:34:05,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_15-model_00-model_states.pt... +0: [2023-03-17 01:34:05,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_15-model_00-model_states.pt. +0: [2023-03-17 01:34:05,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_16-model_00-model_states.pt... +0: [2023-03-17 01:34:05,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_16-model_00-model_states.pt. +0: [2023-03-17 01:34:05,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_17-model_00-model_states.pt... +0: [2023-03-17 01:34:05,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_17-model_00-model_states.pt. +0: [2023-03-17 01:34:05,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_18-model_00-model_states.pt... +0: [2023-03-17 01:34:05,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_18-model_00-model_states.pt. +0: [2023-03-17 01:34:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_19-model_00-model_states.pt... 
+0: [2023-03-17 01:34:05,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_19-model_00-model_states.pt. +0: [2023-03-17 01:34:05,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_20-model_00-model_states.pt... +0: [2023-03-17 01:34:05,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_20-model_00-model_states.pt. +0: [2023-03-17 01:34:05,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/layer_22-model_00-model_states.pt... +0: [2023-03-17 01:34:05,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/layer_22-model_00-model_states.pt. +0: [2023-03-17 01:34:05,385] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step20000/mp_rank_00_model_states.pt +0: [2023-03-17 01:34:05,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/mp_rank_00_model_states.pt... +0: [2023-03-17 01:34:05,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/mp_rank_00_model_states.pt. +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +6: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 01:34:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +0: [2023-03-17 01:34:05,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,460] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,460] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,460] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,460] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+5: [2023-03-17 01:34:05,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 01:34:05,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+0: [2023-03-17 01:34:05,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+6: [2023-03-17 01:34:05,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +6: [2023-03-17 01:34:05,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 01:34:05,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 01:34:05,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved 
checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +3: [2023-03-17 01:34:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 01:34:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 01:34:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 01:34:05,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +4: [2023-03-17 01:34:05,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 01:34:05,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 01:34:05,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: [2023-03-17 01:34:05,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 01:34:05,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +2: [2023-03-17 01:34:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
+7: [2023-03-17 01:34:05,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +7: [2023-03-17 01:34:05,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! +0: successfully saved checkpoint at iteration 20000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 720.96 +7: iteration 20010/ 173500 | consumed samples: 5122560 | consumed tokens: 10491002880 | elapsed time per iteration (s): 0.51 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.177847E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 500.463 | TFLOPs: 26.26 | +7: iteration 20020/ 173500 | consumed samples: 5125120 | consumed tokens: 10496245760 | elapsed time per iteration (s): 0.44 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.183256E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.260 | TFLOPs: 30.81 | +7: iteration 20030/ 173500 | consumed samples: 5127680 | consumed tokens: 10501488640 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.169353E+00 | grad norm: 0.287 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.419 | TFLOPs: 31.87 | +7: iteration 20040/ 173500 | consumed samples: 5130240 | consumed tokens: 10506731520 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.177487E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.400 | TFLOPs: 31.61 | +7: iteration 20050/ 173500 | consumed samples: 5132800 | consumed tokens: 10511974400 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.178102E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.140 | TFLOPs: 31.12 | +7: iteration 20060/ 173500 | consumed samples: 5135360 | consumed tokens: 10517217280 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.168347E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.953 | TFLOPs: 31.22 | +7: iteration 20070/ 173500 | consumed samples: 5137920 | consumed tokens: 10522460160 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.165268E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.692 | TFLOPs: 31.36 | +7: iteration 20080/ 173500 | consumed samples: 5140480 | consumed tokens: 10527703040 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.160152E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.016 | TFLOPs: 31.17 | +7: iteration 20090/ 173500 | consumed samples: 5143040 | consumed tokens: 10532945920 | elapsed time per iteration (s): 0.42 
| learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.172885E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.050 | TFLOPs: 31.80 | +7: iteration 20100/ 173500 | consumed samples: 5145600 | consumed tokens: 10538188800 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.164187E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.009 | TFLOPs: 30.90 | +7: iteration 20110/ 173500 | consumed samples: 5148160 | consumed tokens: 10543431680 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.167878E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.813 | TFLOPs: 31.21 | +7: iteration 20120/ 173500 | consumed samples: 5150720 | consumed tokens: 10548674560 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.177804E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.811 | TFLOPs: 31.31 | +7: iteration 20130/ 173500 | consumed samples: 5153280 | consumed tokens: 10553917440 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.181790E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.963 | TFLOPs: 31.43 | +7: iteration 20140/ 173500 | consumed samples: 5155840 | consumed tokens: 10559160320 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.159204E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.275 | TFLOPs: 31.76 | +7: iteration 20150/ 
173500 | consumed samples: 5158400 | consumed tokens: 10564403200 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.164969E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.831 | TFLOPs: 31.37 | +7: iteration 20160/ 173500 | consumed samples: 5160960 | consumed tokens: 10569646080 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.165223E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.412 | TFLOPs: 31.56 | +7: iteration 20170/ 173500 | consumed samples: 5163520 | consumed tokens: 10574888960 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.175214E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.625 | TFLOPs: 31.36 | +7: iteration 20180/ 173500 | consumed samples: 5166080 | consumed tokens: 10580131840 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.167606E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.494 | TFLOPs: 31.19 | +7: iteration 20190/ 173500 | consumed samples: 5168640 | consumed tokens: 10585374720 | elapsed time per iteration (s): 0.44 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.163942E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.684 | TFLOPs: 30.78 | +7: iteration 20200/ 173500 | consumed samples: 5171200 | consumed tokens: 10590617600 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.175063E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 590.953 | TFLOPs: 31.01 | +7: iteration 20210/ 173500 | consumed samples: 5173760 | consumed tokens: 10595860480 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.169842E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.310 | TFLOPs: 31.08 | +7: iteration 20220/ 173500 | consumed samples: 5176320 | consumed tokens: 10601103360 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.156048E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.253 | TFLOPs: 31.39 | +7: iteration 20230/ 173500 | consumed samples: 5178880 | consumed tokens: 10606346240 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.166296E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.318 | TFLOPs: 31.18 | +7: iteration 20240/ 173500 | consumed samples: 5181440 | consumed tokens: 10611589120 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.176842E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.605 | TFLOPs: 31.41 | +7: iteration 20250/ 173500 | consumed samples: 5184000 | consumed tokens: 10616832000 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.155507E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.999 | TFLOPs: 31.11 | +7: iteration 20260/ 173500 | consumed samples: 5186560 | consumed tokens: 10622074880 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch 
size: 256 | lm loss: 3.173439E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.242 | TFLOPs: 31.28 | +7: iteration 20270/ 173500 | consumed samples: 5189120 | consumed tokens: 10627317760 | elapsed time per iteration (s): 0.44 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.163989E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.161 | TFLOPs: 30.55 | +7: iteration 20280/ 173500 | consumed samples: 5191680 | consumed tokens: 10632560640 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.165853E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.044 | TFLOPs: 31.33 | +7: iteration 20290/ 173500 | consumed samples: 5194240 | consumed tokens: 10637803520 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.172329E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.245 | TFLOPs: 31.44 | +7: iteration 20300/ 173500 | consumed samples: 5196800 | consumed tokens: 10643046400 | elapsed time per iteration (s): 0.44 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.170231E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.686 | TFLOPs: 30.63 | +7: iteration 20310/ 173500 | consumed samples: 5199360 | consumed tokens: 10648289280 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.177449E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.410 | TFLOPs: 31.19 | +7: iteration 20320/ 173500 | consumed samples: 5201920 | consumed 
tokens: 10653532160 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.167709E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.309 | TFLOPs: 31.60 | +7: iteration 20330/ 173500 | consumed samples: 5204480 | consumed tokens: 10658775040 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.166117E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.413 | TFLOPs: 30.82 | +7: iteration 20340/ 173500 | consumed samples: 5207040 | consumed tokens: 10664017920 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.169816E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.474 | TFLOPs: 31.56 | +7: iteration 20350/ 173500 | consumed samples: 5209600 | consumed tokens: 10669260800 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.160723E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.410 | TFLOPs: 30.66 | +7: iteration 20360/ 173500 | consumed samples: 5212160 | consumed tokens: 10674503680 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.165161E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.690 | TFLOPs: 31.05 | +7: iteration 20370/ 173500 | consumed samples: 5214720 | consumed tokens: 10679746560 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.177536E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.599 | TFLOPs: 31.15 | +7: iteration 20380/ 173500 | consumed samples: 5217280 | consumed tokens: 10684989440 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.154995E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.170 | TFLOPs: 31.12 | +7: iteration 20390/ 173500 | consumed samples: 5219840 | consumed tokens: 10690232320 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.161887E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.512 | TFLOPs: 30.83 | +7: iteration 20400/ 173500 | consumed samples: 5222400 | consumed tokens: 10695475200 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.170065E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.449 | TFLOPs: 30.72 | +7: iteration 20410/ 173500 | consumed samples: 5224960 | consumed tokens: 10700718080 | elapsed time per iteration (s): 0.47 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.164489E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.638 | TFLOPs: 28.84 | +7: iteration 20420/ 173500 | consumed samples: 5227520 | consumed tokens: 10705960960 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.169388E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.733 | TFLOPs: 30.47 | +7: iteration 20430/ 173500 | consumed samples: 5230080 | consumed tokens: 10711203840 | elapsed time per iteration (s): 0.45 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.178195E+00 | grad norm: 
0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.590 | TFLOPs: 30.15 | +7: iteration 20440/ 173500 | consumed samples: 5232640 | consumed tokens: 10716446720 | elapsed time per iteration (s): 0.45 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.168058E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.661 | TFLOPs: 29.57 | +7: iteration 20450/ 173500 | consumed samples: 5235200 | consumed tokens: 10721689600 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.166033E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.984 | TFLOPs: 30.54 | +7: iteration 20460/ 173500 | consumed samples: 5237760 | consumed tokens: 10726932480 | elapsed time per iteration (s): 0.46 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.163997E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.039 | TFLOPs: 29.33 | +7: iteration 20470/ 173500 | consumed samples: 5240320 | consumed tokens: 10732175360 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.155978E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.763 | TFLOPs: 30.52 | +7: iteration 20480/ 173500 | consumed samples: 5242880 | consumed tokens: 10737418240 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.177722E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.581 | TFLOPs: 31.04 | +7: iteration 20490/ 173500 | consumed samples: 5245440 | consumed tokens: 10742661120 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.163773E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.531 | TFLOPs: 29.93 | +7: iteration 20500/ 173500 | consumed samples: 5248000 | consumed tokens: 10747904000 | elapsed time per iteration (s): 0.44 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.170872E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.839 | TFLOPs: 30.74 | +7: iteration 20510/ 173500 | consumed samples: 5250560 | consumed tokens: 10753146880 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.162647E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.271 | TFLOPs: 31.29 | +7: iteration 20520/ 173500 | consumed samples: 5253120 | consumed tokens: 10758389760 | elapsed time per iteration (s): 0.44 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.177144E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.771 | TFLOPs: 30.68 | +7: iteration 20530/ 173500 | consumed samples: 5255680 | consumed tokens: 10763632640 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.171162E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.460 | TFLOPs: 31.45 | +7: iteration 20540/ 173500 | consumed samples: 5258240 | consumed tokens: 10768875520 | elapsed time per iteration (s): 0.44 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.176801E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.993 | TFLOPs: 30.85 | +7: 
iteration 20550/ 173500 | consumed samples: 5260800 | consumed tokens: 10774118400 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.170565E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.832 | TFLOPs: 31.58 | +7: iteration 20560/ 173500 | consumed samples: 5263360 | consumed tokens: 10779361280 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.162535E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.977 | TFLOPs: 31.16 | +7: iteration 20570/ 173500 | consumed samples: 5265920 | consumed tokens: 10784604160 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.157198E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.747 | TFLOPs: 31.47 | +7: iteration 20580/ 173500 | consumed samples: 5268480 | consumed tokens: 10789847040 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.156844E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.475 | TFLOPs: 31.35 | +7: iteration 20590/ 173500 | consumed samples: 5271040 | consumed tokens: 10795089920 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.170008E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.519 | TFLOPs: 31.14 | +7: iteration 20600/ 173500 | consumed samples: 5273600 | consumed tokens: 10800332800 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.166755E+00 | grad norm: 0.298 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.347 | TFLOPs: 31.39 | +7: iteration 20610/ 173500 | consumed samples: 5276160 | consumed tokens: 10805575680 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.158707E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.865 | TFLOPs: 30.95 | +7: iteration 20620/ 173500 | consumed samples: 5278720 | consumed tokens: 10810818560 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.165229E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.551 | TFLOPs: 30.93 | +7: iteration 20630/ 173500 | consumed samples: 5281280 | consumed tokens: 10816061440 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.163792E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.999 | TFLOPs: 31.74 | +7: iteration 20640/ 173500 | consumed samples: 5283840 | consumed tokens: 10821304320 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.161562E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.339 | TFLOPs: 31.18 | +7: iteration 20650/ 173500 | consumed samples: 5286400 | consumed tokens: 10826547200 | elapsed time per iteration (s): 0.44 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.155445E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.856 | TFLOPs: 30.69 | +7: iteration 20660/ 173500 | consumed samples: 5288960 | consumed tokens: 10831790080 | elapsed time per iteration (s): 0.42 | learning rate: 
1.947E-04 | global batch size: 256 | lm loss: 3.185397E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.110 | TFLOPs: 31.80 | +7: iteration 20670/ 173500 | consumed samples: 5291520 | consumed tokens: 10837032960 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.160706E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.750 | TFLOPs: 31.57 | +7: iteration 20680/ 173500 | consumed samples: 5294080 | consumed tokens: 10842275840 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.159461E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.339 | TFLOPs: 31.55 | +7: iteration 20690/ 173500 | consumed samples: 5296640 | consumed tokens: 10847518720 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.160847E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.330 | TFLOPs: 31.39 | +7: iteration 20700/ 173500 | consumed samples: 5299200 | consumed tokens: 10852761600 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.149973E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.501 | TFLOPs: 30.98 | +7: iteration 20710/ 173500 | consumed samples: 5301760 | consumed tokens: 10858004480 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.154008E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.642 | TFLOPs: 31.67 | +7: iteration 20720/ 173500 | consumed 
samples: 5304320 | consumed tokens: 10863247360 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.178269E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.654 | TFLOPs: 31.62 | +7: iteration 20730/ 173500 | consumed samples: 5306880 | consumed tokens: 10868490240 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.164416E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.260 | TFLOPs: 31.34 | +7: iteration 20740/ 173500 | consumed samples: 5309440 | consumed tokens: 10873733120 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.171607E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.998 | TFLOPs: 31.06 | +7: iteration 20750/ 173500 | consumed samples: 5312000 | consumed tokens: 10878976000 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.149848E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.198 | TFLOPs: 31.07 | +7: iteration 20760/ 173500 | consumed samples: 5314560 | consumed tokens: 10884218880 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.171944E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.109 | TFLOPs: 31.28 | +7: iteration 20770/ 173500 | consumed samples: 5317120 | consumed tokens: 10889461760 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.153129E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 597.635 | TFLOPs: 31.36 | +7: iteration 20780/ 173500 | consumed samples: 5319680 | consumed tokens: 10894704640 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.161557E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.045 | TFLOPs: 31.33 | +7: iteration 20790/ 173500 | consumed samples: 5322240 | consumed tokens: 10899947520 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.167595E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.274 | TFLOPs: 31.44 | +7: iteration 20800/ 173500 | consumed samples: 5324800 | consumed tokens: 10905190400 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.158571E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.075 | TFLOPs: 31.22 | +7: iteration 20810/ 173500 | consumed samples: 5327360 | consumed tokens: 10910433280 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.157777E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.735 | TFLOPs: 31.47 | +7: iteration 20820/ 173500 | consumed samples: 5329920 | consumed tokens: 10915676160 | elapsed time per iteration (s): 0.44 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.172150E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.230 | TFLOPs: 30.76 | +7: iteration 20830/ 173500 | consumed samples: 5332480 | consumed tokens: 10920919040 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm 
loss: 3.165138E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.329 | TFLOPs: 30.92 | +7: iteration 20840/ 173500 | consumed samples: 5335040 | consumed tokens: 10926161920 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.165823E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.066 | TFLOPs: 31.54 | +7: iteration 20850/ 173500 | consumed samples: 5337600 | consumed tokens: 10931404800 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.171482E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.533 | TFLOPs: 31.40 | +7: iteration 20860/ 173500 | consumed samples: 5340160 | consumed tokens: 10936647680 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.163966E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.908 | TFLOPs: 31.32 | +7: iteration 20870/ 173500 | consumed samples: 5342720 | consumed tokens: 10941890560 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.170256E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.682 | TFLOPs: 31.25 | +7: iteration 20880/ 173500 | consumed samples: 5345280 | consumed tokens: 10947133440 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.156310E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.874 | TFLOPs: 31.00 | +7: iteration 20890/ 173500 | consumed samples: 5347840 | consumed tokens: 
10952376320 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.166064E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.137 | TFLOPs: 31.02 | +7: iteration 20900/ 173500 | consumed samples: 5350400 | consumed tokens: 10957619200 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.154410E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.676 | TFLOPs: 31.41 | +7: iteration 20910/ 173500 | consumed samples: 5352960 | consumed tokens: 10962862080 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.160789E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.668 | TFLOPs: 31.20 | +7: iteration 20920/ 173500 | consumed samples: 5355520 | consumed tokens: 10968104960 | elapsed time per iteration (s): 0.44 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.149733E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.305 | TFLOPs: 30.81 | +7: iteration 20930/ 173500 | consumed samples: 5358080 | consumed tokens: 10973347840 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.160367E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.516 | TFLOPs: 31.25 | +7: iteration 20940/ 173500 | consumed samples: 5360640 | consumed tokens: 10978590720 | elapsed time per iteration (s): 0.44 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.157197E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
584.740 | TFLOPs: 30.68 | +7: iteration 20950/ 173500 | consumed samples: 5363200 | consumed tokens: 10983833600 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.172650E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.171 | TFLOPs: 31.12 | +7: iteration 20960/ 173500 | consumed samples: 5365760 | consumed tokens: 10989076480 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.170981E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.006 | TFLOPs: 31.32 | +7: iteration 20970/ 173500 | consumed samples: 5368320 | consumed tokens: 10994319360 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.171337E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.602 | TFLOPs: 31.09 | +7: iteration 20980/ 173500 | consumed samples: 5370880 | consumed tokens: 10999562240 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.176946E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.411 | TFLOPs: 31.56 | +7: iteration 20990/ 173500 | consumed samples: 5373440 | consumed tokens: 11004805120 | elapsed time per iteration (s): 0.44 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.146914E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.876 | TFLOPs: 30.32 | +7: iteration 21000/ 173500 | consumed samples: 5376000 | consumed tokens: 11010048000 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.159358E+00 | grad norm: 0.288 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.266 | TFLOPs: 31.02 | +7: iteration 21010/ 173500 | consumed samples: 5378560 | consumed tokens: 11015290880 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.150420E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.313 | TFLOPs: 31.81 | +7: iteration 21020/ 173500 | consumed samples: 5381120 | consumed tokens: 11020533760 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.163461E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.166 | TFLOPs: 31.33 | +7: iteration 21030/ 173500 | consumed samples: 5383680 | consumed tokens: 11025776640 | elapsed time per iteration (s): 0.44 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.168158E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.627 | TFLOPs: 30.25 | +7: iteration 21040/ 173500 | consumed samples: 5386240 | consumed tokens: 11031019520 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.158466E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.203 | TFLOPs: 30.97 | +7: iteration 21050/ 173500 | consumed samples: 5388800 | consumed tokens: 11036262400 | elapsed time per iteration (s): 0.44 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.158006E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.032 | TFLOPs: 30.85 | +7: iteration 21060/ 173500 | consumed samples: 5391360 | consumed tokens: 11041505280 | elapsed time per iteration (s): 
0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.147457E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.382 | TFLOPs: 31.71 | +7: iteration 21070/ 173500 | consumed samples: 5393920 | consumed tokens: 11046748160 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.162317E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.259 | TFLOPs: 31.13 | +7: iteration 21080/ 173500 | consumed samples: 5396480 | consumed tokens: 11051991040 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.174988E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.846 | TFLOPs: 31.26 | +7: iteration 21090/ 173500 | consumed samples: 5399040 | consumed tokens: 11057233920 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.160017E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.010 | TFLOPs: 31.53 | +7: iteration 21100/ 173500 | consumed samples: 5401600 | consumed tokens: 11062476800 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.147225E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.166 | TFLOPs: 31.12 | +7: iteration 21110/ 173500 | consumed samples: 5404160 | consumed tokens: 11067719680 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.143274E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.559 | TFLOPs: 31.56 | +7: iteration 21120/ 
173500 | consumed samples: 5406720 | consumed tokens: 11072962560 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.153045E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.234 | TFLOPs: 31.13 | +7: iteration 21130/ 173500 | consumed samples: 5409280 | consumed tokens: 11078205440 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.152587E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.592 | TFLOPs: 31.04 | +7: iteration 21140/ 173500 | consumed samples: 5411840 | consumed tokens: 11083448320 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.164177E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.597 | TFLOPs: 30.94 | +7: iteration 21150/ 173500 | consumed samples: 5414400 | consumed tokens: 11088691200 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.161293E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.731 | TFLOPs: 31.10 | +7: iteration 21160/ 173500 | consumed samples: 5416960 | consumed tokens: 11093934080 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.158397E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.942 | TFLOPs: 31.79 | +7: iteration 21170/ 173500 | consumed samples: 5419520 | consumed tokens: 11099176960 | elapsed time per iteration (s): 0.46 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.156473E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 562.300 | TFLOPs: 29.50 | +7: iteration 21180/ 173500 | consumed samples: 5422080 | consumed tokens: 11104419840 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.153123E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.328 | TFLOPs: 31.55 | +7: iteration 21190/ 173500 | consumed samples: 5424640 | consumed tokens: 11109662720 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.169953E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.580 | TFLOPs: 31.35 | +7: iteration 21200/ 173500 | consumed samples: 5427200 | consumed tokens: 11114905600 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.155861E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.775 | TFLOPs: 31.94 | +7: iteration 21210/ 173500 | consumed samples: 5429760 | consumed tokens: 11120148480 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.166158E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.097 | TFLOPs: 31.80 | +7: iteration 21220/ 173500 | consumed samples: 5432320 | consumed tokens: 11125391360 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.157956E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.319 | TFLOPs: 31.66 | +7: iteration 21230/ 173500 | consumed samples: 5434880 | consumed tokens: 11130634240 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch 
size: 256 | lm loss: 3.170167E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.913 | TFLOPs: 31.79 | +7: iteration 21240/ 173500 | consumed samples: 5437440 | consumed tokens: 11135877120 | elapsed time per iteration (s): 0.45 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.162585E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.060 | TFLOPs: 30.12 | +7: iteration 21250/ 173500 | consumed samples: 5440000 | consumed tokens: 11141120000 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.156933E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.089 | TFLOPs: 31.43 | +7: iteration 21260/ 173500 | consumed samples: 5442560 | consumed tokens: 11146362880 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.144678E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.949 | TFLOPs: 31.58 | +7: iteration 21270/ 173500 | consumed samples: 5445120 | consumed tokens: 11151605760 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.160901E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.677 | TFLOPs: 31.31 | +7: iteration 21280/ 173500 | consumed samples: 5447680 | consumed tokens: 11156848640 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.147016E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.978 | TFLOPs: 31.17 | +7: iteration 21290/ 173500 | consumed samples: 5450240 | consumed 
tokens: 11162091520 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.160445E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.240 | TFLOPs: 31.39 | +7: iteration 21300/ 173500 | consumed samples: 5452800 | consumed tokens: 11167334400 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.151617E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.511 | TFLOPs: 31.61 | +7: iteration 21310/ 173500 | consumed samples: 5455360 | consumed tokens: 11172577280 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.154834E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.288 | TFLOPs: 31.71 | +7: iteration 21320/ 173500 | consumed samples: 5457920 | consumed tokens: 11177820160 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.159350E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.490 | TFLOPs: 31.09 | +7: iteration 21330/ 173500 | consumed samples: 5460480 | consumed tokens: 11183063040 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.150577E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.260 | TFLOPs: 31.39 | +7: iteration 21340/ 173500 | consumed samples: 5463040 | consumed tokens: 11188305920 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.141898E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 601.898 | TFLOPs: 31.58 | +7: iteration 21350/ 173500 | consumed samples: 5465600 | consumed tokens: 11193548800 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.159194E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.429 | TFLOPs: 31.35 | +7: iteration 21360/ 173500 | consumed samples: 5468160 | consumed tokens: 11198791680 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.171711E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.726 | TFLOPs: 31.62 | +7: iteration 21370/ 173500 | consumed samples: 5470720 | consumed tokens: 11204034560 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.161413E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.989 | TFLOPs: 31.48 | +7: iteration 21380/ 173500 | consumed samples: 5473280 | consumed tokens: 11209277440 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.156589E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.600 | TFLOPs: 31.67 | +7: iteration 21390/ 173500 | consumed samples: 5475840 | consumed tokens: 11214520320 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.156081E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.261 | TFLOPs: 31.39 | +7: iteration 21400/ 173500 | consumed samples: 5478400 | consumed tokens: 11219763200 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.155730E+00 | grad norm: 
0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.358 | TFLOPs: 31.29 | +7: iteration 21410/ 173500 | consumed samples: 5480960 | consumed tokens: 11225006080 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.156264E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.254 | TFLOPs: 31.55 | +7: iteration 21420/ 173500 | consumed samples: 5483520 | consumed tokens: 11230248960 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.149706E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.470 | TFLOPs: 31.61 | +7: iteration 21430/ 173500 | consumed samples: 5486080 | consumed tokens: 11235491840 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.165253E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.636 | TFLOPs: 31.62 | +7: iteration 21440/ 173500 | consumed samples: 5488640 | consumed tokens: 11240734720 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.170061E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.029 | TFLOPs: 31.54 | +7: iteration 21450/ 173500 | consumed samples: 5491200 | consumed tokens: 11245977600 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.156427E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.525 | TFLOPs: 31.72 | +7: iteration 21460/ 173500 | consumed samples: 5493760 | consumed tokens: 11251220480 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.152952E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.666 | TFLOPs: 31.73 | +7: iteration 21470/ 173500 | consumed samples: 5496320 | consumed tokens: 11256463360 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.159799E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.058 | TFLOPs: 31.85 | +7: iteration 21480/ 173500 | consumed samples: 5498880 | consumed tokens: 11261706240 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.175371E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.836 | TFLOPs: 31.58 | +7: iteration 21490/ 173500 | consumed samples: 5501440 | consumed tokens: 11266949120 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.158047E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.706 | TFLOPs: 30.89 | +7: iteration 21500/ 173500 | consumed samples: 5504000 | consumed tokens: 11272192000 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.150863E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.342 | TFLOPs: 31.50 | +7: iteration 21510/ 173500 | consumed samples: 5506560 | consumed tokens: 11277434880 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.144713E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.436 | TFLOPs: 31.40 | +7: 
iteration 21520/ 173500 | consumed samples: 5509120 | consumed tokens: 11282677760 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.149859E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.861 | TFLOPs: 31.32 | +7: iteration 21530/ 173500 | consumed samples: 5511680 | consumed tokens: 11287920640 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.144940E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.234 | TFLOPs: 31.28 | +7: iteration 21540/ 173500 | consumed samples: 5514240 | consumed tokens: 11293163520 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.160572E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.192 | TFLOPs: 31.07 | +7: iteration 21550/ 173500 | consumed samples: 5516800 | consumed tokens: 11298406400 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.146080E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.860 | TFLOPs: 31.42 | +7: iteration 21560/ 173500 | consumed samples: 5519360 | consumed tokens: 11303649280 | elapsed time per iteration (s): 0.44 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.150502E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.537 | TFLOPs: 30.56 | +7: iteration 21570/ 173500 | consumed samples: 5521920 | consumed tokens: 11308892160 | elapsed time per iteration (s): 0.44 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.159013E+00 | grad norm: 0.280 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.143 | TFLOPs: 30.75 | +7: iteration 21580/ 173500 | consumed samples: 5524480 | consumed tokens: 11314135040 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.161948E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.358 | TFLOPs: 31.34 | +7: iteration 21590/ 173500 | consumed samples: 5527040 | consumed tokens: 11319377920 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.159525E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.761 | TFLOPs: 31.31 | +7: iteration 21600/ 173500 | consumed samples: 5529600 | consumed tokens: 11324620800 | elapsed time per iteration (s): 0.44 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.146000E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.842 | TFLOPs: 30.84 | +7: iteration 21610/ 173500 | consumed samples: 5532160 | consumed tokens: 11329863680 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.150143E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.308 | TFLOPs: 31.34 | +7: iteration 21620/ 173500 | consumed samples: 5534720 | consumed tokens: 11335106560 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.144306E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.030 | TFLOPs: 31.48 | +7: iteration 21630/ 173500 | consumed samples: 5537280 | consumed tokens: 11340349440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.941E-04 | global batch size: 256 | lm loss: 3.169927E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.041 | TFLOPs: 31.38 | +7: iteration 21640/ 173500 | consumed samples: 5539840 | consumed tokens: 11345592320 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.146508E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.063 | TFLOPs: 31.64 | +7: iteration 21650/ 173500 | consumed samples: 5542400 | consumed tokens: 11350835200 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.142233E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.489 | TFLOPs: 31.45 | +7: iteration 21660/ 173500 | consumed samples: 5544960 | consumed tokens: 11356078080 | elapsed time per iteration (s): 0.44 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.169148E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.327 | TFLOPs: 30.61 | +7: iteration 21670/ 173500 | consumed samples: 5547520 | consumed tokens: 11361320960 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.153096E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.121 | TFLOPs: 30.96 | +7: iteration 21680/ 173500 | consumed samples: 5550080 | consumed tokens: 11366563840 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.173591E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.460 | TFLOPs: 31.14 | +7: iteration 21690/ 173500 | consumed 
samples: 5552640 | consumed tokens: 11371806720 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.154113E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.705 | TFLOPs: 31.41 | +7: iteration 21700/ 173500 | consumed samples: 5555200 | consumed tokens: 11377049600 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.145065E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.568 | TFLOPs: 31.35 | +7: iteration 21710/ 173500 | consumed samples: 5557760 | consumed tokens: 11382292480 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.135236E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.740 | TFLOPs: 31.94 | +7: iteration 21720/ 173500 | consumed samples: 5560320 | consumed tokens: 11387535360 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.152185E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.964 | TFLOPs: 31.43 | +7: iteration 21730/ 173500 | consumed samples: 5562880 | consumed tokens: 11392778240 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.148692E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.731 | TFLOPs: 31.68 | +7: iteration 21740/ 173500 | consumed samples: 5565440 | consumed tokens: 11398021120 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.145634E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.737 | TFLOPs: 31.20 | +7: iteration 21750/ 173500 | consumed samples: 5568000 | consumed tokens: 11403264000 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.153278E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.895 | TFLOPs: 31.00 | +7: iteration 21760/ 173500 | consumed samples: 5570560 | consumed tokens: 11408506880 | elapsed time per iteration (s): 0.44 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.157276E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.004 | TFLOPs: 30.69 | +7: iteration 21770/ 173500 | consumed samples: 5573120 | consumed tokens: 11413749760 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.151934E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.856 | TFLOPs: 31.68 | +7: iteration 21780/ 173500 | consumed samples: 5575680 | consumed tokens: 11418992640 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.157201E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.065 | TFLOPs: 31.80 | +7: iteration 21790/ 173500 | consumed samples: 5578240 | consumed tokens: 11424235520 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.147568E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.300 | TFLOPs: 31.18 | +7: iteration 21800/ 173500 | consumed samples: 5580800 | consumed tokens: 11429478400 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm 
loss: 3.155059E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.628 | TFLOPs: 31.67 | +7: iteration 21810/ 173500 | consumed samples: 5583360 | consumed tokens: 11434721280 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.163175E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.473 | TFLOPs: 31.40 | +7: iteration 21820/ 173500 | consumed samples: 5585920 | consumed tokens: 11439964160 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.146320E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.893 | TFLOPs: 31.69 | +7: iteration 21830/ 173500 | consumed samples: 5588480 | consumed tokens: 11445207040 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.155175E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.730 | TFLOPs: 30.89 | +7: iteration 21840/ 173500 | consumed samples: 5591040 | consumed tokens: 11450449920 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.159962E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.697 | TFLOPs: 31.31 | +7: iteration 21850/ 173500 | consumed samples: 5593600 | consumed tokens: 11455692800 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.166689E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.369 | TFLOPs: 31.71 | +7: iteration 21860/ 173500 | consumed samples: 5596160 | consumed tokens: 
11460935680 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.158877E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.467 | TFLOPs: 31.66 | +7: iteration 21870/ 173500 | consumed samples: 5598720 | consumed tokens: 11466178560 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.142399E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.417 | TFLOPs: 31.45 | +7: iteration 21880/ 173500 | consumed samples: 5601280 | consumed tokens: 11471421440 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.154288E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.923 | TFLOPs: 31.48 | +7: iteration 21890/ 173500 | consumed samples: 5603840 | consumed tokens: 11476664320 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.154826E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.583 | TFLOPs: 31.46 | +7: iteration 21900/ 173500 | consumed samples: 5606400 | consumed tokens: 11481907200 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.145728E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.543 | TFLOPs: 31.72 | +7: iteration 21910/ 173500 | consumed samples: 5608960 | consumed tokens: 11487150080 | elapsed time per iteration (s): 0.44 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.142315E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
580.641 | TFLOPs: 30.47 | +7: iteration 21920/ 173500 | consumed samples: 5611520 | consumed tokens: 11492392960 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.153323E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.943 | TFLOPs: 31.79 | +7: iteration 21930/ 173500 | consumed samples: 5614080 | consumed tokens: 11497635840 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.155415E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.531 | TFLOPs: 31.61 | +7: iteration 21940/ 173500 | consumed samples: 5616640 | consumed tokens: 11502878720 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.144928E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.979 | TFLOPs: 31.17 | +7: iteration 21950/ 173500 | consumed samples: 5619200 | consumed tokens: 11508121600 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.150521E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 21960/ 173500 | consumed samples: 5621760 | consumed tokens: 11513364480 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.146487E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.790 | TFLOPs: 31.26 | +7: iteration 21970/ 173500 | consumed samples: 5624320 | consumed tokens: 11518607360 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.158835E+00 | grad norm: 0.276 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.082 | TFLOPs: 31.91 | +7: iteration 21980/ 173500 | consumed samples: 5626880 | consumed tokens: 11523850240 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.153433E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.404 | TFLOPs: 31.61 | +7: iteration 21990/ 173500 | consumed samples: 5629440 | consumed tokens: 11529093120 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.156900E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.500 | TFLOPs: 30.93 | +0: [2023-03-17 01:48:25,977] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=0, lr=[0.00019388839136370641, 0.00019388839136370641, 0.00019388839136370641], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 22000/ 173500 | consumed samples: 5632000 | consumed tokens: 11534336000 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.155657E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.063 | TFLOPs: 31.17 | +0: steps: 22000 loss: 3.2041 iter time (s): 0.428 samples/sec: 597.857 +7: iteration 22010/ 173500 | consumed samples: 5634560 | consumed tokens: 11539578880 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.153324E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.997 | TFLOPs: 31.22 | +7: iteration 22020/ 173500 | consumed samples: 5637120 | consumed tokens: 11544821760 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm 
loss: 3.161064E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.139 | TFLOPs: 30.96 | +7: iteration 22030/ 173500 | consumed samples: 5639680 | consumed tokens: 11550064640 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.161333E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.547 | TFLOPs: 31.72 | +7: iteration 22040/ 173500 | consumed samples: 5642240 | consumed tokens: 11555307520 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.126395E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.178 | TFLOPs: 31.39 | +7: iteration 22050/ 173500 | consumed samples: 5644800 | consumed tokens: 11560550400 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.148674E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.620 | TFLOPs: 31.20 | +7: iteration 22060/ 173500 | consumed samples: 5647360 | consumed tokens: 11565793280 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.140768E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.107 | TFLOPs: 31.75 | +7: iteration 22070/ 173500 | consumed samples: 5649920 | consumed tokens: 11571036160 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.162486E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.609 | TFLOPs: 31.93 | +7: iteration 22080/ 173500 | consumed samples: 5652480 | consumed tokens: 
11576279040 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.150305E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.338 | TFLOPs: 31.92 | +7: iteration 22090/ 173500 | consumed samples: 5655040 | consumed tokens: 11581521920 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.142316E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.391 | TFLOPs: 31.66 | +7: iteration 22100/ 173500 | consumed samples: 5657600 | consumed tokens: 11586764800 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.156897E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.938 | TFLOPs: 31.06 | +7: iteration 22110/ 173500 | consumed samples: 5660160 | consumed tokens: 11592007680 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.160446E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.466 | TFLOPs: 31.61 | +7: iteration 22120/ 173500 | consumed samples: 5662720 | consumed tokens: 11597250560 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.163635E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.684 | TFLOPs: 31.25 | +7: iteration 22130/ 173500 | consumed samples: 5665280 | consumed tokens: 11602493440 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.156391E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
603.789 | TFLOPs: 31.68 | +7: iteration 22140/ 173500 | consumed samples: 5667840 | consumed tokens: 11607736320 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.153446E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.640 | TFLOPs: 31.15 | +7: iteration 22150/ 173500 | consumed samples: 5670400 | consumed tokens: 11612979200 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.160988E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.681 | TFLOPs: 31.36 | +7: iteration 22160/ 173500 | consumed samples: 5672960 | consumed tokens: 11618222080 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.141231E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.967 | TFLOPs: 31.48 | +7: iteration 22170/ 173500 | consumed samples: 5675520 | consumed tokens: 11623464960 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.165389E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.400 | TFLOPs: 31.50 | +7: iteration 22180/ 173500 | consumed samples: 5678080 | consumed tokens: 11628707840 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.147403E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.563 | TFLOPs: 31.88 | +7: iteration 22190/ 173500 | consumed samples: 5680640 | consumed tokens: 11633950720 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.150197E+00 | grad norm: 0.288 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.998 | TFLOPs: 31.85 | +7: iteration 22200/ 173500 | consumed samples: 5683200 | consumed tokens: 11639193600 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.144990E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.421 | TFLOPs: 31.66 | +7: iteration 22210/ 173500 | consumed samples: 5685760 | consumed tokens: 11644436480 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.151118E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.300 | TFLOPs: 31.23 | +7: iteration 22220/ 173500 | consumed samples: 5688320 | consumed tokens: 11649679360 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.140526E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.284 | TFLOPs: 31.39 | +7: iteration 22230/ 173500 | consumed samples: 5690880 | consumed tokens: 11654922240 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.148498E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.639 | TFLOPs: 31.57 | +7: iteration 22240/ 173500 | consumed samples: 5693440 | consumed tokens: 11660165120 | elapsed time per iteration (s): 0.44 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.145012E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.355 | TFLOPs: 30.61 | +7: iteration 22250/ 173500 | consumed samples: 5696000 | consumed tokens: 11665408000 | elapsed time per iteration (s): 
0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.133809E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.383 | TFLOPs: 31.40 | +7: iteration 22260/ 173500 | consumed samples: 5698560 | consumed tokens: 11670650880 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.152901E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.274 | TFLOPs: 31.23 | +7: iteration 22270/ 173500 | consumed samples: 5701120 | consumed tokens: 11675893760 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.149045E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.478 | TFLOPs: 31.24 | +7: iteration 22280/ 173500 | consumed samples: 5703680 | consumed tokens: 11681136640 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.145868E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.428 | TFLOPs: 31.50 | +7: iteration 22290/ 173500 | consumed samples: 5706240 | consumed tokens: 11686379520 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.127348E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.711 | TFLOPs: 31.41 | +7: iteration 22300/ 173500 | consumed samples: 5708800 | consumed tokens: 11691622400 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.155655E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.854 | TFLOPs: 31.74 | +7: iteration 22310/ 
173500 | consumed samples: 5711360 | consumed tokens: 11696865280 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.146670E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.189 | TFLOPs: 31.44 | +7: iteration 22320/ 173500 | consumed samples: 5713920 | consumed tokens: 11702108160 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.153862E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.686 | TFLOPs: 31.67 | +7: iteration 22330/ 173500 | consumed samples: 5716480 | consumed tokens: 11707351040 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.154923E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.673 | TFLOPs: 31.67 | +7: iteration 22340/ 173500 | consumed samples: 5719040 | consumed tokens: 11712593920 | elapsed time per iteration (s): 0.44 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.131222E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.567 | TFLOPs: 30.78 | +7: iteration 22350/ 173500 | consumed samples: 5721600 | consumed tokens: 11717836800 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.131013E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.848 | TFLOPs: 31.42 | +7: iteration 22360/ 173500 | consumed samples: 5724160 | consumed tokens: 11723079680 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.145229E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 596.932 | TFLOPs: 31.32 | +7: iteration 22370/ 173500 | consumed samples: 5726720 | consumed tokens: 11728322560 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.150021E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.506 | TFLOPs: 30.93 | +7: iteration 22380/ 173500 | consumed samples: 5729280 | consumed tokens: 11733565440 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.148580E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.124 | TFLOPs: 31.28 | +7: iteration 22390/ 173500 | consumed samples: 5731840 | consumed tokens: 11738808320 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.146721E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.132 | TFLOPs: 31.44 | +7: iteration 22400/ 173500 | consumed samples: 5734400 | consumed tokens: 11744051200 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.150698E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.801 | TFLOPs: 31.79 | +7: iteration 22410/ 173500 | consumed samples: 5736960 | consumed tokens: 11749294080 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.141156E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.070 | TFLOPs: 31.90 | +7: iteration 22420/ 173500 | consumed samples: 5739520 | consumed tokens: 11754536960 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch 
size: 256 | lm loss: 3.145042E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.707 | TFLOPs: 31.41 | +7: iteration 22430/ 173500 | consumed samples: 5742080 | consumed tokens: 11759779840 | elapsed time per iteration (s): 0.44 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.153929E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.290 | TFLOPs: 30.55 | +7: iteration 22440/ 173500 | consumed samples: 5744640 | consumed tokens: 11765022720 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.161529E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.577 | TFLOPs: 31.77 | +7: iteration 22450/ 173500 | consumed samples: 5747200 | consumed tokens: 11770265600 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.228624E+00 | grad norm: 0.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.153 | TFLOPs: 31.70 | +7: iteration 22460/ 173500 | consumed samples: 5749760 | consumed tokens: 11775508480 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.187276E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.452 | TFLOPs: 31.50 | +7: iteration 22470/ 173500 | consumed samples: 5752320 | consumed tokens: 11780751360 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.188914E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.534 | TFLOPs: 31.72 | +7: iteration 22480/ 173500 | consumed samples: 5754880 | consumed 
tokens: 11785994240 | elapsed time per iteration (s): 0.44 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.174330E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.065 | TFLOPs: 30.85 | +7: iteration 22490/ 173500 | consumed samples: 5757440 | consumed tokens: 11791237120 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.173503E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.463 | TFLOPs: 31.56 | +7: iteration 22500/ 173500 | consumed samples: 5760000 | consumed tokens: 11796480000 | elapsed time per iteration (s): 0.46 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.157352E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.269 | TFLOPs: 29.50 | +7: iteration 22510/ 173500 | consumed samples: 5762560 | consumed tokens: 11801722880 | elapsed time per iteration (s): 0.45 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.141663E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.648 | TFLOPs: 29.78 | +7: iteration 22520/ 173500 | consumed samples: 5765120 | consumed tokens: 11806965760 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.151686E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.967 | TFLOPs: 31.53 | +7: iteration 22530/ 173500 | consumed samples: 5767680 | consumed tokens: 11812208640 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.145486E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.755 | TFLOPs: 31.94 | +7: iteration 22540/ 173500 | consumed samples: 5770240 | consumed tokens: 11817451520 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.156322E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.271 | TFLOPs: 31.39 | +7: iteration 22550/ 173500 | consumed samples: 5772800 | consumed tokens: 11822694400 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.149065E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.512 | TFLOPs: 31.72 | +7: iteration 22560/ 173500 | consumed samples: 5775360 | consumed tokens: 11827937280 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.163619E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.589 | TFLOPs: 31.41 | +7: iteration 22570/ 173500 | consumed samples: 5777920 | consumed tokens: 11833180160 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.135750E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.611 | TFLOPs: 31.57 | +7: iteration 22580/ 173500 | consumed samples: 5780480 | consumed tokens: 11838423040 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.143034E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.023 | TFLOPs: 31.69 | +7: iteration 22590/ 173500 | consumed samples: 5783040 | consumed tokens: 11843665920 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.149386E+00 | grad norm: 
0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.498 | TFLOPs: 31.51 | +7: iteration 22600/ 173500 | consumed samples: 5785600 | consumed tokens: 11848908800 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.170257E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.125 | TFLOPs: 31.75 | +7: iteration 22610/ 173500 | consumed samples: 5788160 | consumed tokens: 11854151680 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.137926E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.351 | TFLOPs: 31.39 | +7: iteration 22620/ 173500 | consumed samples: 5790720 | consumed tokens: 11859394560 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.153713E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.720 | TFLOPs: 31.26 | +7: iteration 22630/ 173500 | consumed samples: 5793280 | consumed tokens: 11864637440 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.140456E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.950 | TFLOPs: 31.48 | +7: iteration 22640/ 173500 | consumed samples: 5795840 | consumed tokens: 11869880320 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.138746E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | +7: iteration 22650/ 173500 | consumed samples: 5798400 | consumed tokens: 11875123200 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.139773E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.551 | TFLOPs: 31.51 | +7: iteration 22660/ 173500 | consumed samples: 5800960 | consumed tokens: 11880366080 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.154349E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.381 | TFLOPs: 31.61 | +7: iteration 22670/ 173500 | consumed samples: 5803520 | consumed tokens: 11885608960 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.154373E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.953 | TFLOPs: 31.53 | +7: iteration 22680/ 173500 | consumed samples: 5806080 | consumed tokens: 11890851840 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.150196E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.698 | TFLOPs: 31.88 | +7: iteration 22690/ 173500 | consumed samples: 5808640 | consumed tokens: 11896094720 | elapsed time per iteration (s): 0.44 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.147133E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.370 | TFLOPs: 30.71 | +7: iteration 22700/ 173500 | consumed samples: 5811200 | consumed tokens: 11901337600 | elapsed time per iteration (s): 0.44 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.161354E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.696 | TFLOPs: 30.73 | +7: 
iteration 22710/ 173500 | consumed samples: 5813760 | consumed tokens: 11906580480 | elapsed time per iteration (s): 0.45 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.137810E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.056 | TFLOPs: 30.07 | +7: iteration 22720/ 173500 | consumed samples: 5816320 | consumed tokens: 11911823360 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.162043E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.194 | TFLOPs: 31.49 | +7: iteration 22730/ 173500 | consumed samples: 5818880 | consumed tokens: 11917066240 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.151432E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.009 | TFLOPs: 30.90 | +7: iteration 22740/ 173500 | consumed samples: 5821440 | consumed tokens: 11922309120 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.141866E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.418 | TFLOPs: 30.40 | +7: iteration 22750/ 173500 | consumed samples: 5824000 | consumed tokens: 11927552000 | elapsed time per iteration (s): 0.45 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.145585E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.617 | TFLOPs: 30.10 | +7: iteration 22760/ 173500 | consumed samples: 5826560 | consumed tokens: 11932794880 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.154757E+00 | grad norm: 0.285 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.257 | TFLOPs: 30.24 | +7: iteration 22770/ 173500 | consumed samples: 5829120 | consumed tokens: 11938037760 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.155450E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.793 | TFLOPs: 30.58 | +7: iteration 22780/ 173500 | consumed samples: 5831680 | consumed tokens: 11943280640 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.151366E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.805 | TFLOPs: 31.16 | +7: iteration 22790/ 173500 | consumed samples: 5834240 | consumed tokens: 11948523520 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.134413E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.072 | TFLOPs: 30.23 | +7: iteration 22800/ 173500 | consumed samples: 5836800 | consumed tokens: 11953766400 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.136189E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.938 | TFLOPs: 30.53 | +7: iteration 22810/ 173500 | consumed samples: 5839360 | consumed tokens: 11959009280 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.146144E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.918 | TFLOPs: 30.74 | +7: iteration 22820/ 173500 | consumed samples: 5841920 | consumed tokens: 11964252160 | elapsed time per iteration (s): 0.43 | learning rate: 
1.934E-04 | global batch size: 256 | lm loss: 3.143533E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.707 | TFLOPs: 31.57 | +7: iteration 22830/ 173500 | consumed samples: 5844480 | consumed tokens: 11969495040 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.142607E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.877 | TFLOPs: 31.26 | +7: iteration 22840/ 173500 | consumed samples: 5847040 | consumed tokens: 11974737920 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.149063E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.325 | TFLOPs: 31.81 | +7: iteration 22850/ 173500 | consumed samples: 5849600 | consumed tokens: 11979980800 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.146784E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.586 | TFLOPs: 31.56 | +7: iteration 22860/ 173500 | consumed samples: 5852160 | consumed tokens: 11985223680 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.139522E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.728 | TFLOPs: 31.62 | +7: iteration 22870/ 173500 | consumed samples: 5854720 | consumed tokens: 11990466560 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.132972E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.718 | TFLOPs: 31.68 | +7: iteration 22880/ 173500 | consumed 
samples: 5857280 | consumed tokens: 11995709440 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.154273E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.471 | TFLOPs: 31.61 | +7: iteration 22890/ 173500 | consumed samples: 5859840 | consumed tokens: 12000952320 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.148425E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.391 | TFLOPs: 31.19 | +7: iteration 22900/ 173500 | consumed samples: 5862400 | consumed tokens: 12006195200 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.136219E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.231 | TFLOPs: 31.86 | +7: iteration 22910/ 173500 | consumed samples: 5864960 | consumed tokens: 12011438080 | elapsed time per iteration (s): 0.44 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.160552E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.030 | TFLOPs: 30.64 | +7: iteration 22920/ 173500 | consumed samples: 5867520 | consumed tokens: 12016680960 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.148209E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.190 | TFLOPs: 31.49 | +7: iteration 22930/ 173500 | consumed samples: 5870080 | consumed tokens: 12021923840 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.141042E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.703 | TFLOPs: 31.73 | +7: iteration 22940/ 173500 | consumed samples: 5872640 | consumed tokens: 12027166720 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.140334E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.066 | TFLOPs: 31.64 | +7: iteration 22950/ 173500 | consumed samples: 5875200 | consumed tokens: 12032409600 | elapsed time per iteration (s): 0.45 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.122418E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.264 | TFLOPs: 29.97 | +7: iteration 22960/ 173500 | consumed samples: 5877760 | consumed tokens: 12037652480 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.135020E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.444 | TFLOPs: 31.77 | +7: iteration 22970/ 173500 | consumed samples: 5880320 | consumed tokens: 12042895360 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.137650E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.066 | TFLOPs: 31.80 | +7: iteration 22980/ 173500 | consumed samples: 5882880 | consumed tokens: 12048138240 | elapsed time per iteration (s): 0.44 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.133737E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.103 | TFLOPs: 30.65 | +7: iteration 22990/ 173500 | consumed samples: 5885440 | consumed tokens: 12053381120 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm 
loss: 3.138870E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.785 | TFLOPs: 31.68 | +7: iteration 23000/ 173500 | consumed samples: 5888000 | consumed tokens: 12058624000 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.128095E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.091 | TFLOPs: 31.38 | +7: iteration 23010/ 173500 | consumed samples: 5890560 | consumed tokens: 12063866880 | elapsed time per iteration (s): 0.44 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.132736E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.995 | TFLOPs: 30.33 | +7: iteration 23020/ 173500 | consumed samples: 5893120 | consumed tokens: 12069109760 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.149723E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.805 | TFLOPs: 31.37 | +7: iteration 23030/ 173500 | consumed samples: 5895680 | consumed tokens: 12074352640 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.146285E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.802 | TFLOPs: 31.31 | +7: iteration 23040/ 173500 | consumed samples: 5898240 | consumed tokens: 12079595520 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.148879E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.053 | TFLOPs: 31.38 | +7: iteration 23050/ 173500 | consumed samples: 5900800 | consumed tokens: 
12084838400 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.135770E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.744 | TFLOPs: 31.83 | +7: iteration 23060/ 173500 | consumed samples: 5903360 | consumed tokens: 12090081280 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.128028E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.508 | TFLOPs: 31.56 | +7: iteration 23070/ 173500 | consumed samples: 5905920 | consumed tokens: 12095324160 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.143693E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.660 | TFLOPs: 31.67 | +7: iteration 23080/ 173500 | consumed samples: 5908480 | consumed tokens: 12100567040 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.151653E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.011 | TFLOPs: 31.90 | +7: iteration 23090/ 173500 | consumed samples: 5911040 | consumed tokens: 12105809920 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.127872E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.525 | TFLOPs: 31.67 | +7: iteration 23100/ 173500 | consumed samples: 5913600 | consumed tokens: 12111052800 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.143851E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
603.475 | TFLOPs: 31.66 | +7: iteration 23110/ 173500 | consumed samples: 5916160 | consumed tokens: 12116295680 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.140573E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.807 | TFLOPs: 31.68 | +7: iteration 23120/ 173500 | consumed samples: 5918720 | consumed tokens: 12121538560 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.141203E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.431 | TFLOPs: 31.35 | +7: iteration 23130/ 173500 | consumed samples: 5921280 | consumed tokens: 12126781440 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.148420E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.679 | TFLOPs: 31.78 | +7: iteration 23140/ 173500 | consumed samples: 5923840 | consumed tokens: 12132024320 | elapsed time per iteration (s): 0.44 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.119153E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.026 | TFLOPs: 30.64 | +7: iteration 23150/ 173500 | consumed samples: 5926400 | consumed tokens: 12137267200 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.129178E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.888 | TFLOPs: 31.00 | +7: iteration 23160/ 173500 | consumed samples: 5928960 | consumed tokens: 12142510080 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.137058E+00 | grad norm: 0.289 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.018 | TFLOPs: 31.90 | +7: iteration 23170/ 173500 | consumed samples: 5931520 | consumed tokens: 12147752960 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.152755E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.062 | TFLOPs: 31.75 | +7: iteration 23180/ 173500 | consumed samples: 5934080 | consumed tokens: 12152995840 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.152206E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.089 | TFLOPs: 31.85 | +7: iteration 23190/ 173500 | consumed samples: 5936640 | consumed tokens: 12158238720 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.136365E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.521 | TFLOPs: 31.88 | +7: iteration 23200/ 173500 | consumed samples: 5939200 | consumed tokens: 12163481600 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.144380E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.154 | TFLOPs: 31.86 | +7: iteration 23210/ 173500 | consumed samples: 5941760 | consumed tokens: 12168724480 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.156408E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.460 | TFLOPs: 31.56 | +7: iteration 23220/ 173500 | consumed samples: 5944320 | consumed tokens: 12173967360 | elapsed time per iteration (s): 
0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.146863E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.108 | TFLOPs: 31.43 | +7: iteration 23230/ 173500 | consumed samples: 5946880 | consumed tokens: 12179210240 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.135661E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.686 | TFLOPs: 31.57 | +7: iteration 23240/ 173500 | consumed samples: 5949440 | consumed tokens: 12184453120 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.137327E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.668 | TFLOPs: 31.20 | +7: iteration 23250/ 173500 | consumed samples: 5952000 | consumed tokens: 12189696000 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.138835E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.524 | TFLOPs: 31.56 | +7: iteration 23260/ 173500 | consumed samples: 5954560 | consumed tokens: 12194938880 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.137041E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.329 | TFLOPs: 31.87 | +7: iteration 23270/ 173500 | consumed samples: 5957120 | consumed tokens: 12200181760 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.126481E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.094 | TFLOPs: 31.64 | +7: iteration 23280/ 
173500 | consumed samples: 5959680 | consumed tokens: 12205424640 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.148425E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.283 | TFLOPs: 31.65 | +7: iteration 23290/ 173500 | consumed samples: 5962240 | consumed tokens: 12210667520 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.136630E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.404 | TFLOPs: 31.45 | +7: iteration 23300/ 173500 | consumed samples: 5964800 | consumed tokens: 12215910400 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.130982E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.331 | TFLOPs: 31.87 | +7: iteration 23310/ 173500 | consumed samples: 5967360 | consumed tokens: 12221153280 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.144056E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.006 | TFLOPs: 31.85 | +7: iteration 23320/ 173500 | consumed samples: 5969920 | consumed tokens: 12226396160 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.146807E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.978 | TFLOPs: 31.69 | +7: iteration 23330/ 173500 | consumed samples: 5972480 | consumed tokens: 12231639040 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.147862E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 604.757 | TFLOPs: 31.73 | +7: iteration 23340/ 173500 | consumed samples: 5975040 | consumed tokens: 12236881920 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.127596E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.273 | TFLOPs: 31.86 | +7: iteration 23350/ 173500 | consumed samples: 5977600 | consumed tokens: 12242124800 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.143993E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.951 | TFLOPs: 31.37 | +7: iteration 23360/ 173500 | consumed samples: 5980160 | consumed tokens: 12247367680 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.146244E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.868 | TFLOPs: 31.37 | +7: iteration 23370/ 173500 | consumed samples: 5982720 | consumed tokens: 12252610560 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.129967E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.895 | TFLOPs: 31.21 | +7: iteration 23380/ 173500 | consumed samples: 5985280 | consumed tokens: 12257853440 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.145210E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.047 | TFLOPs: 31.85 | +7: iteration 23390/ 173500 | consumed samples: 5987840 | consumed tokens: 12263096320 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch 
size: 256 | lm loss: 3.142931E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.010 | TFLOPs: 31.85 | +7: iteration 23400/ 173500 | consumed samples: 5990400 | consumed tokens: 12268339200 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.141752E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.359 | TFLOPs: 31.39 | +7: iteration 23410/ 173500 | consumed samples: 5992960 | consumed tokens: 12273582080 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.136055E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.499 | TFLOPs: 31.82 | +7: iteration 23420/ 173500 | consumed samples: 5995520 | consumed tokens: 12278824960 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.150511E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.476 | TFLOPs: 31.51 | +7: iteration 23430/ 173500 | consumed samples: 5998080 | consumed tokens: 12284067840 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.115881E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.409 | TFLOPs: 31.08 | +7: iteration 23440/ 173500 | consumed samples: 6000640 | consumed tokens: 12289310720 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.148416E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.598 | TFLOPs: 31.88 | +7: iteration 23450/ 173500 | consumed samples: 6003200 | consumed 
tokens: 12294553600 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.141506E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.648 | TFLOPs: 31.72 | +7: iteration 23460/ 173500 | consumed samples: 6005760 | consumed tokens: 12299796480 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.135619E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.282 | TFLOPs: 31.86 | +7: iteration 23470/ 173500 | consumed samples: 6008320 | consumed tokens: 12305039360 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.137036E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.005 | TFLOPs: 31.53 | +7: iteration 23480/ 173500 | consumed samples: 6010880 | consumed tokens: 12310282240 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.133215E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.895 | TFLOPs: 31.74 | +7: iteration 23490/ 173500 | consumed samples: 6013440 | consumed tokens: 12315525120 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.135450E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.918 | TFLOPs: 31.84 | +7: iteration 23500/ 173500 | consumed samples: 6016000 | consumed tokens: 12320768000 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.138747E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.241 | TFLOPs: 31.70 | +7: iteration 23510/ 173500 | consumed samples: 6018560 | consumed tokens: 12326010880 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.128917E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.292 | TFLOPs: 31.60 | +7: iteration 23520/ 173500 | consumed samples: 6021120 | consumed tokens: 12331253760 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.153842E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.334 | TFLOPs: 31.50 | +7: iteration 23530/ 173500 | consumed samples: 6023680 | consumed tokens: 12336496640 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.146884E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.245 | TFLOPs: 31.76 | +7: iteration 23540/ 173500 | consumed samples: 6026240 | consumed tokens: 12341739520 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.131767E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.317 | TFLOPs: 31.86 | +7: iteration 23550/ 173500 | consumed samples: 6028800 | consumed tokens: 12346982400 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.134548E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.521 | TFLOPs: 31.30 | +7: iteration 23560/ 173500 | consumed samples: 6031360 | consumed tokens: 12352225280 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.132179E+00 | grad norm: 
0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.485 | TFLOPs: 31.77 | +7: iteration 23570/ 173500 | consumed samples: 6033920 | consumed tokens: 12357468160 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.120913E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.401 | TFLOPs: 31.50 | +7: iteration 23580/ 173500 | consumed samples: 6036480 | consumed tokens: 12362711040 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.157323E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.761 | TFLOPs: 31.63 | +7: iteration 23590/ 173500 | consumed samples: 6039040 | consumed tokens: 12367953920 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.137032E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.061 | TFLOPs: 31.85 | +7: iteration 23600/ 173500 | consumed samples: 6041600 | consumed tokens: 12373196800 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.141516E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.675 | TFLOPs: 31.73 | +7: iteration 23610/ 173500 | consumed samples: 6044160 | consumed tokens: 12378439680 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.139385E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.656 | TFLOPs: 31.73 | +7: iteration 23620/ 173500 | consumed samples: 6046720 | consumed tokens: 12383682560 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.137188E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.422 | TFLOPs: 31.71 | +7: iteration 23630/ 173500 | consumed samples: 6049280 | consumed tokens: 12388925440 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.141728E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.965 | TFLOPs: 31.43 | +7: iteration 23640/ 173500 | consumed samples: 6051840 | consumed tokens: 12394168320 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.131504E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.068 | TFLOPs: 31.69 | +7: iteration 23650/ 173500 | consumed samples: 6054400 | consumed tokens: 12399411200 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.127217E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.602 | TFLOPs: 31.88 | +7: iteration 23660/ 173500 | consumed samples: 6056960 | consumed tokens: 12404654080 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.119530E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.492 | TFLOPs: 31.51 | +7: iteration 23670/ 173500 | consumed samples: 6059520 | consumed tokens: 12409896960 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.136851E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.460 | TFLOPs: 31.87 | +7: 
iteration 23680/ 173500 | consumed samples: 6062080 | consumed tokens: 12415139840 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.129056E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.644 | TFLOPs: 31.72 | +7: iteration 23690/ 173500 | consumed samples: 6064640 | consumed tokens: 12420382720 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.138458E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.952 | TFLOPs: 31.58 | +7: iteration 23700/ 173500 | consumed samples: 6067200 | consumed tokens: 12425625600 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.138008E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.061 | TFLOPs: 31.43 | +7: iteration 23710/ 173500 | consumed samples: 6069760 | consumed tokens: 12430868480 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.133840E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.812 | TFLOPs: 31.68 | +7: iteration 23720/ 173500 | consumed samples: 6072320 | consumed tokens: 12436111360 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.126508E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.152 | TFLOPs: 31.86 | +7: iteration 23730/ 173500 | consumed samples: 6074880 | consumed tokens: 12441354240 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.139861E+00 | grad norm: 0.284 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.524 | TFLOPs: 31.88 | +7: iteration 23740/ 173500 | consumed samples: 6077440 | consumed tokens: 12446597120 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.125970E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.250 | TFLOPs: 31.49 | +7: iteration 23750/ 173500 | consumed samples: 6080000 | consumed tokens: 12451840000 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.144876E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.099 | TFLOPs: 31.07 | +7: iteration 23760/ 173500 | consumed samples: 6082560 | consumed tokens: 12457082880 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.138382E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.304 | TFLOPs: 31.86 | +7: iteration 23770/ 173500 | consumed samples: 6085120 | consumed tokens: 12462325760 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.133332E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.881 | TFLOPs: 31.68 | +7: iteration 23780/ 173500 | consumed samples: 6087680 | consumed tokens: 12467568640 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.136997E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.937 | TFLOPs: 31.84 | +7: iteration 23790/ 173500 | consumed samples: 6090240 | consumed tokens: 12472811520 | elapsed time per iteration (s): 0.43 | learning rate: 
1.928E-04 | global batch size: 256 | lm loss: 3.134152E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.695 | TFLOPs: 31.41 | +7: iteration 23800/ 173500 | consumed samples: 6092800 | consumed tokens: 12478054400 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.132196E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.680 | TFLOPs: 31.73 | +7: iteration 23810/ 173500 | consumed samples: 6095360 | consumed tokens: 12483297280 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.131414E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.000 | TFLOPs: 31.85 | +7: iteration 23820/ 173500 | consumed samples: 6097920 | consumed tokens: 12488540160 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.134295E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.977 | TFLOPs: 31.53 | +7: iteration 23830/ 173500 | consumed samples: 6100480 | consumed tokens: 12493783040 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.136745E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.571 | TFLOPs: 31.67 | +7: iteration 23840/ 173500 | consumed samples: 6103040 | consumed tokens: 12499025920 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.137184E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.072 | TFLOPs: 31.22 | +7: iteration 23850/ 173500 | consumed 
samples: 6105600 | consumed tokens: 12504268800 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.138440E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.963 | TFLOPs: 31.85 | +7: iteration 23860/ 173500 | consumed samples: 6108160 | consumed tokens: 12509511680 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.135141E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.799 | TFLOPs: 31.26 | +7: iteration 23870/ 173500 | consumed samples: 6110720 | consumed tokens: 12514754560 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.160482E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.315 | TFLOPs: 31.65 | +7: iteration 23880/ 173500 | consumed samples: 6113280 | consumed tokens: 12519997440 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.129219E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.084 | TFLOPs: 31.59 | +7: iteration 23890/ 173500 | consumed samples: 6115840 | consumed tokens: 12525240320 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.127331E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.423 | TFLOPs: 31.61 | +7: iteration 23900/ 173500 | consumed samples: 6118400 | consumed tokens: 12530483200 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.132311E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.748 | TFLOPs: 31.89 | +7: iteration 23910/ 173500 | consumed samples: 6120960 | consumed tokens: 12535726080 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.140424E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.378 | TFLOPs: 31.24 | +7: iteration 23920/ 173500 | consumed samples: 6123520 | consumed tokens: 12540968960 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.140007E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.565 | TFLOPs: 31.88 | +7: iteration 23930/ 173500 | consumed samples: 6126080 | consumed tokens: 12546211840 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.135367E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.255 | TFLOPs: 31.60 | +7: iteration 23940/ 173500 | consumed samples: 6128640 | consumed tokens: 12551454720 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.144712E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.822 | TFLOPs: 31.37 | +7: iteration 23950/ 173500 | consumed samples: 6131200 | consumed tokens: 12556697600 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.120128E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.300 | TFLOPs: 31.44 | +7: iteration 23960/ 173500 | consumed samples: 6133760 | consumed tokens: 12561940480 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm 
loss: 3.136736E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.036 | TFLOPs: 31.17 | +7: iteration 23970/ 173500 | consumed samples: 6136320 | consumed tokens: 12567183360 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.126429E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.551 | TFLOPs: 31.67 | +7: iteration 23980/ 173500 | consumed samples: 6138880 | consumed tokens: 12572426240 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.129164E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.681 | TFLOPs: 31.62 | +7: iteration 23990/ 173500 | consumed samples: 6141440 | consumed tokens: 12577669120 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.130844E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.512 | TFLOPs: 31.72 | +0: [2023-03-17 02:02:40,155] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=0, lr=[0.00019264004235759096, 0.00019264004235759096, 0.00019264004235759096], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 24000/ 173500 | consumed samples: 6144000 | consumed tokens: 12582912000 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.127372E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.839 | TFLOPs: 31.84 | +0: steps: 24000 loss: 3.1211 iter time (s): 0.425 samples/sec: 602.746 +7: iteration 24010/ 173500 | consumed samples: 6146560 | consumed tokens: 12588154880 | elapsed time per iteration (s): 0.43 | learning rate: 
1.926E-04 | global batch size: 256 | lm loss: 3.136149E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.159 | TFLOPs: 31.54 | +7: iteration 24020/ 173500 | consumed samples: 6149120 | consumed tokens: 12593397760 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.134276E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.782 | TFLOPs: 31.84 | +7: iteration 24030/ 173500 | consumed samples: 6151680 | consumed tokens: 12598640640 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.129732E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.807 | TFLOPs: 31.37 | +7: iteration 24040/ 173500 | consumed samples: 6154240 | consumed tokens: 12603883520 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.114272E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.827 | TFLOPs: 31.42 | +7: iteration 24050/ 173500 | consumed samples: 6156800 | consumed tokens: 12609126400 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.135672E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.500 | TFLOPs: 31.82 | +7: iteration 24060/ 173500 | consumed samples: 6159360 | consumed tokens: 12614369280 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.129571E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.642 | TFLOPs: 31.62 | +7: iteration 24070/ 173500 | consumed 
samples: 6161920 | consumed tokens: 12619612160 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.138525E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.271 | TFLOPs: 31.34 | +7: iteration 24080/ 173500 | consumed samples: 6164480 | consumed tokens: 12624855040 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.131178E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.242 | TFLOPs: 31.55 | +7: iteration 24090/ 173500 | consumed samples: 6167040 | consumed tokens: 12630097920 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.132539E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.435 | TFLOPs: 31.56 | +7: iteration 24100/ 173500 | consumed samples: 6169600 | consumed tokens: 12635340800 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.129411E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.818 | TFLOPs: 31.58 | +7: iteration 24110/ 173500 | consumed samples: 6172160 | consumed tokens: 12640583680 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.127733E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.594 | TFLOPs: 31.83 | +7: iteration 24120/ 173500 | consumed samples: 6174720 | consumed tokens: 12645826560 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.140735E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 600.135 | TFLOPs: 31.49 | +7: iteration 24130/ 173500 | consumed samples: 6177280 | consumed tokens: 12651069440 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.132178E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.255 | TFLOPs: 31.55 | +7: iteration 24140/ 173500 | consumed samples: 6179840 | consumed tokens: 12656312320 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.125336E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.750 | TFLOPs: 31.84 | +7: iteration 24150/ 173500 | consumed samples: 6182400 | consumed tokens: 12661555200 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.114100E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.612 | TFLOPs: 31.83 | +7: iteration 24160/ 173500 | consumed samples: 6184960 | consumed tokens: 12666798080 | elapsed time per iteration (s): 0.43 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.128781E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.123 | TFLOPs: 31.33 | +7: iteration 24170/ 173500 | consumed samples: 6187520 | consumed tokens: 12672040960 | elapsed time per iteration (s): 0.43 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.141548E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.974 | TFLOPs: 31.53 | +7: iteration 24180/ 173500 | consumed samples: 6190080 | consumed tokens: 12677283840 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm 
loss: 3.141664E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.783 | TFLOPs: 31.84 | +7: iteration 24190/ 173500 | consumed samples: 6192640 | consumed tokens: 12682526720 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.133055E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.724 | TFLOPs: 31.83 | +7: iteration 24200/ 173500 | consumed samples: 6195200 | consumed tokens: 12687769600 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.135984E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.936 | TFLOPs: 31.84 | +7: iteration 24210/ 173500 | consumed samples: 6197760 | consumed tokens: 12693012480 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.130803E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.752 | TFLOPs: 31.84 | +7: iteration 24220/ 173500 | consumed samples: 6200320 | consumed tokens: 12698255360 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.120435E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.607 | TFLOPs: 31.78 | +7: iteration 24230/ 173500 | consumed samples: 6202880 | consumed tokens: 12703498240 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.134839E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.839 | TFLOPs: 31.63 | +7: iteration 24240/ 173500 | consumed samples: 6205440 | consumed tokens: 
12708741120 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.115162E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.740 | TFLOPs: 31.83 | +7: iteration 24250/ 173500 | consumed samples: 6208000 | consumed tokens: 12713984000 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.124946E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.410 | TFLOPs: 31.71 | +7: iteration 24260/ 173500 | consumed samples: 6210560 | consumed tokens: 12719226880 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.141596E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.535 | TFLOPs: 31.77 | +7: iteration 24270/ 173500 | consumed samples: 6213120 | consumed tokens: 12724469760 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.129572E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.187 | TFLOPs: 31.86 | +7: iteration 24280/ 173500 | consumed samples: 6215680 | consumed tokens: 12729712640 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.123082E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.698 | TFLOPs: 31.83 | +7: iteration 24290/ 173500 | consumed samples: 6218240 | consumed tokens: 12734955520 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.124503E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
606.911 | TFLOPs: 31.84 | +7: iteration 24300/ 173500 | consumed samples: 6220800 | consumed tokens: 12740198400 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.116989E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.654 | TFLOPs: 31.83 | +7: iteration 24310/ 173500 | consumed samples: 6223360 | consumed tokens: 12745441280 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.125585E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.922 | TFLOPs: 31.84 | +7: iteration 24320/ 173500 | consumed samples: 6225920 | consumed tokens: 12750684160 | elapsed time per iteration (s): 0.43 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.113142E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.085 | TFLOPs: 31.43 | +7: iteration 24330/ 173500 | consumed samples: 6228480 | consumed tokens: 12755927040 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.130314E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.317 | TFLOPs: 31.71 | +7: iteration 24340/ 173500 | consumed samples: 6231040 | consumed tokens: 12761169920 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.136621E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.072 | TFLOPs: 31.85 | +7: iteration 24350/ 173500 | consumed samples: 6233600 | consumed tokens: 12766412800 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.123442E+00 | grad norm: 0.285 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.005 | TFLOPs: 31.85 | +7: iteration 24360/ 173500 | consumed samples: 6236160 | consumed tokens: 12771655680 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.143452E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.038 | TFLOPs: 31.85 | +7: iteration 24370/ 173500 | consumed samples: 6238720 | consumed tokens: 12776898560 | elapsed time per iteration (s): 0.43 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.124585E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.209 | TFLOPs: 31.28 | +7: iteration 24380/ 173500 | consumed samples: 6241280 | consumed tokens: 12782141440 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.132296E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.165 | TFLOPs: 31.86 | +7: iteration 24390/ 173500 | consumed samples: 6243840 | consumed tokens: 12787384320 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.125466E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.012 | TFLOPs: 31.85 | +7: iteration 24400/ 173500 | consumed samples: 6246400 | consumed tokens: 12792627200 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.125094E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.240 | TFLOPs: 31.76 | +7: iteration 24410/ 173500 | consumed samples: 6248960 | consumed tokens: 12797870080 | elapsed time per iteration (s): 
0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.128781E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.940 | TFLOPs: 31.64 | +7: iteration 24420/ 173500 | consumed samples: 6251520 | consumed tokens: 12803112960 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.118118E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.046 | TFLOPs: 31.85 | +7: iteration 24430/ 173500 | consumed samples: 6254080 | consumed tokens: 12808355840 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.116394E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.599 | TFLOPs: 31.77 | +7: iteration 24440/ 173500 | consumed samples: 6256640 | consumed tokens: 12813598720 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.142042E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.716 | TFLOPs: 31.78 | +7: iteration 24450/ 173500 | consumed samples: 6259200 | consumed tokens: 12818841600 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.139253E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.457 | TFLOPs: 31.82 | +7: iteration 24460/ 173500 | consumed samples: 6261760 | consumed tokens: 12824084480 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.136047E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.888 | TFLOPs: 31.63 | +7: iteration 24470/ 
173500 | consumed samples: 6264320 | consumed tokens: 12829327360 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.125684E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.498 | TFLOPs: 31.82 | +7: iteration 24480/ 173500 | consumed samples: 6266880 | consumed tokens: 12834570240 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.133136E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.886 | TFLOPs: 31.84 | +7: iteration 24490/ 173500 | consumed samples: 6269440 | consumed tokens: 12839813120 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.130397E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.637 | TFLOPs: 31.83 | +7: iteration 24500/ 173500 | consumed samples: 6272000 | consumed tokens: 12845056000 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.125092E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.436 | TFLOPs: 31.82 | +7: iteration 24510/ 173500 | consumed samples: 6274560 | consumed tokens: 12850298880 | elapsed time per iteration (s): 0.43 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.132806E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.540 | TFLOPs: 31.56 | +7: iteration 24520/ 173500 | consumed samples: 6277120 | consumed tokens: 12855541760 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.130388E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 604.164 | TFLOPs: 31.70 | +7: iteration 24530/ 173500 | consumed samples: 6279680 | consumed tokens: 12860784640 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.131265E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.226 | TFLOPs: 31.81 | +7: iteration 24540/ 173500 | consumed samples: 6282240 | consumed tokens: 12866027520 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.120170E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.757 | TFLOPs: 31.63 | +7: iteration 24550/ 173500 | consumed samples: 6284800 | consumed tokens: 12871270400 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.147650E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.043 | TFLOPs: 31.69 | +7: iteration 24560/ 173500 | consumed samples: 6287360 | consumed tokens: 12876513280 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.129883E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.922 | TFLOPs: 31.84 | +7: iteration 24570/ 173500 | consumed samples: 6289920 | consumed tokens: 12881756160 | elapsed time per iteration (s): 0.43 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.139516E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.333 | TFLOPs: 31.55 | +7: iteration 24580/ 173500 | consumed samples: 6292480 | consumed tokens: 12886999040 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch 
size: 256 | lm loss: 3.141920E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.640 | TFLOPs: 31.83 | +7: iteration 24590/ 173500 | consumed samples: 6295040 | consumed tokens: 12892241920 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.130761E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.505 | TFLOPs: 31.82 | +7: iteration 24600/ 173500 | consumed samples: 6297600 | consumed tokens: 12897484800 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.135872E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.242 | TFLOPs: 31.81 | +7: iteration 24610/ 173500 | consumed samples: 6300160 | consumed tokens: 12902727680 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.116300E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.495 | TFLOPs: 31.66 | +7: iteration 24620/ 173500 | consumed samples: 6302720 | consumed tokens: 12907970560 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.126657E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.578 | TFLOPs: 31.83 | +7: iteration 24630/ 173500 | consumed samples: 6305280 | consumed tokens: 12913213440 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.125152E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.542 | TFLOPs: 31.82 | +7: iteration 24640/ 173500 | consumed samples: 6307840 | consumed 
tokens: 12918456320 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.129910E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.839 | TFLOPs: 31.63 | +7: iteration 24650/ 173500 | consumed samples: 6310400 | consumed tokens: 12923699200 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.121423E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.588 | TFLOPs: 31.83 | +7: iteration 24660/ 173500 | consumed samples: 6312960 | consumed tokens: 12928942080 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.118567E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.614 | TFLOPs: 31.83 | +7: iteration 24670/ 173500 | consumed samples: 6315520 | consumed tokens: 12934184960 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.133447E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.465 | TFLOPs: 31.61 | +7: iteration 24680/ 173500 | consumed samples: 6318080 | consumed tokens: 12939427840 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.138355E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.580 | TFLOPs: 31.83 | +7: iteration 24690/ 173500 | consumed samples: 6320640 | consumed tokens: 12944670720 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.135820E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.905 | TFLOPs: 31.48 | +7: iteration 24700/ 173500 | consumed samples: 6323200 | consumed tokens: 12949913600 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.122228E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.557 | TFLOPs: 31.20 | +7: iteration 24710/ 173500 | consumed samples: 6325760 | consumed tokens: 12955156480 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.108181E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.390 | TFLOPs: 31.82 | +7: iteration 24720/ 173500 | consumed samples: 6328320 | consumed tokens: 12960399360 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.127436E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.761 | TFLOPs: 31.84 | +7: iteration 24730/ 173500 | consumed samples: 6330880 | consumed tokens: 12965642240 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.138998E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.488 | TFLOPs: 31.82 | +7: iteration 24740/ 173500 | consumed samples: 6333440 | consumed tokens: 12970885120 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.125169E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.153 | TFLOPs: 31.80 | +7: iteration 24750/ 173500 | consumed samples: 6336000 | consumed tokens: 12976128000 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.137516E+00 | grad norm: 
0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.455 | TFLOPs: 31.82 | +7: iteration 24760/ 173500 | consumed samples: 6338560 | consumed tokens: 12981370880 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.132637E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.022 | TFLOPs: 31.80 | +7: iteration 24770/ 173500 | consumed samples: 6341120 | consumed tokens: 12986613760 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.124538E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.126 | TFLOPs: 31.23 | +7: iteration 24780/ 173500 | consumed samples: 6343680 | consumed tokens: 12991856640 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.121817E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.427 | TFLOPs: 31.82 | +7: iteration 24790/ 173500 | consumed samples: 6346240 | consumed tokens: 12997099520 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.134476E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.762 | TFLOPs: 31.57 | +7: iteration 24800/ 173500 | consumed samples: 6348800 | consumed tokens: 13002342400 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.136308E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.979 | TFLOPs: 31.79 | +7: iteration 24810/ 173500 | consumed samples: 6351360 | consumed tokens: 13007585280 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.121586E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.696 | TFLOPs: 31.83 | +7: iteration 24820/ 173500 | consumed samples: 6353920 | consumed tokens: 13012828160 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.129495E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.783 | TFLOPs: 31.84 | +7: iteration 24830/ 173500 | consumed samples: 6356480 | consumed tokens: 13018071040 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.120835E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.600 | TFLOPs: 31.51 | +7: iteration 24840/ 173500 | consumed samples: 6359040 | consumed tokens: 13023313920 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.135706E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.685 | TFLOPs: 31.57 | +7: iteration 24850/ 173500 | consumed samples: 6361600 | consumed tokens: 13028556800 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.135550E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.964 | TFLOPs: 31.85 | +7: iteration 24860/ 173500 | consumed samples: 6364160 | consumed tokens: 13033799680 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.125673E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.258 | TFLOPs: 31.55 | +7: 
iteration 24870/ 173500 | consumed samples: 6366720 | consumed tokens: 13039042560 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.121297E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.977 | TFLOPs: 31.69 | +7: iteration 24880/ 173500 | consumed samples: 6369280 | consumed tokens: 13044285440 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.131787E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.787 | TFLOPs: 31.84 | +7: iteration 24890/ 173500 | consumed samples: 6371840 | consumed tokens: 13049528320 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.114706E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.528 | TFLOPs: 31.82 | +7: iteration 24900/ 173500 | consumed samples: 6374400 | consumed tokens: 13054771200 | elapsed time per iteration (s): 0.43 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.122291E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.698 | TFLOPs: 31.26 | +7: iteration 24910/ 173500 | consumed samples: 6376960 | consumed tokens: 13060014080 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.133948E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.705 | TFLOPs: 31.83 | +7: iteration 24920/ 173500 | consumed samples: 6379520 | consumed tokens: 13065256960 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.140188E+00 | grad norm: 0.289 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.411 | TFLOPs: 31.76 | +7: iteration 24930/ 173500 | consumed samples: 6382080 | consumed tokens: 13070499840 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.114776E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.664 | TFLOPs: 31.78 | +7: iteration 24940/ 173500 | consumed samples: 6384640 | consumed tokens: 13075742720 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.118792E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.683 | TFLOPs: 31.83 | +7: iteration 24950/ 173500 | consumed samples: 6387200 | consumed tokens: 13080985600 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.133689E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.862 | TFLOPs: 31.84 | +7: iteration 24960/ 173500 | consumed samples: 6389760 | consumed tokens: 13086228480 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.120153E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.092 | TFLOPs: 31.85 | +7: iteration 24970/ 173500 | consumed samples: 6392320 | consumed tokens: 13091471360 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.115824E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.187 | TFLOPs: 31.86 | +7: iteration 24980/ 173500 | consumed samples: 6394880 | consumed tokens: 13096714240 | elapsed time per iteration (s): 0.42 | learning rate: 
1.920E-04 | global batch size: 256 | lm loss: 3.124307E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.023 | TFLOPs: 31.85 | +7: iteration 24990/ 173500 | consumed samples: 6397440 | consumed tokens: 13101957120 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.120844E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.334 | TFLOPs: 31.87 | +7: iteration 25000/ 173500 | consumed samples: 6400000 | consumed tokens: 13107200000 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.120721E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.698 | TFLOPs: 31.83 | +7: iteration 25010/ 173500 | consumed samples: 6402560 | consumed tokens: 13112442880 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.109309E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.757 | TFLOPs: 31.84 | +7: iteration 25020/ 173500 | consumed samples: 6405120 | consumed tokens: 13117685760 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.123486E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.061 | TFLOPs: 31.85 | +7: iteration 25030/ 173500 | consumed samples: 6407680 | consumed tokens: 13122928640 | elapsed time per iteration (s): 0.44 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.127691E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.617 | TFLOPs: 30.78 | +7: iteration 25040/ 173500 | consumed 
samples: 6410240 | consumed tokens: 13128171520 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.117877E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.884 | TFLOPs: 31.37 | +7: iteration 25050/ 173500 | consumed samples: 6412800 | consumed tokens: 13133414400 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.122155E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.632 | TFLOPs: 31.41 | +7: iteration 25060/ 173500 | consumed samples: 6415360 | consumed tokens: 13138657280 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.128702E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.586 | TFLOPs: 31.41 | +7: iteration 25070/ 173500 | consumed samples: 6417920 | consumed tokens: 13143900160 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.129222E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.936 | TFLOPs: 30.64 | +7: iteration 25080/ 173500 | consumed samples: 6420480 | consumed tokens: 13149143040 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.126197E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.595 | TFLOPs: 31.41 | +7: iteration 25090/ 173500 | consumed samples: 6423040 | consumed tokens: 13154385920 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.129051E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 603.514 | TFLOPs: 31.67 | +7: iteration 25100/ 173500 | consumed samples: 6425600 | consumed tokens: 13159628800 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.134739E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.358 | TFLOPs: 31.55 | +7: iteration 25110/ 173500 | consumed samples: 6428160 | consumed tokens: 13164871680 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.122501E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.179 | TFLOPs: 30.44 | +7: iteration 25120/ 173500 | consumed samples: 6430720 | consumed tokens: 13170114560 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.116405E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.222 | TFLOPs: 30.71 | +7: iteration 25130/ 173500 | consumed samples: 6433280 | consumed tokens: 13175357440 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.123246E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.214 | TFLOPs: 30.81 | +7: iteration 25140/ 173500 | consumed samples: 6435840 | consumed tokens: 13180600320 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.123808E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.959 | TFLOPs: 30.32 | +7: iteration 25150/ 173500 | consumed samples: 6438400 | consumed tokens: 13185843200 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm 
loss: 3.114939E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.796 | TFLOPs: 31.16 | +7: iteration 25160/ 173500 | consumed samples: 6440960 | consumed tokens: 13191086080 | elapsed time per iteration (s): 0.44 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.118693E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.891 | TFLOPs: 30.79 | +7: iteration 25170/ 173500 | consumed samples: 6443520 | consumed tokens: 13196328960 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.124638E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.826 | TFLOPs: 31.94 | +7: iteration 25180/ 173500 | consumed samples: 6446080 | consumed tokens: 13201571840 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.101800E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.663 | TFLOPs: 31.88 | +7: iteration 25190/ 173500 | consumed samples: 6448640 | consumed tokens: 13206814720 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.123816E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.141 | TFLOPs: 31.75 | +7: iteration 25200/ 173500 | consumed samples: 6451200 | consumed tokens: 13212057600 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.125244E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.515 | TFLOPs: 31.88 | +7: iteration 25210/ 173500 | consumed samples: 6453760 | consumed tokens: 
13217300480 | elapsed time per iteration (s): 0.43 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.131815E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.818 | TFLOPs: 31.58 | +7: iteration 25220/ 173500 | consumed samples: 6456320 | consumed tokens: 13222543360 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.112408E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.114 | TFLOPs: 31.70 | +7: iteration 25230/ 173500 | consumed samples: 6458880 | consumed tokens: 13227786240 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.127011E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.043 | TFLOPs: 31.85 | +7: iteration 25240/ 173500 | consumed samples: 6461440 | consumed tokens: 13233029120 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.114322E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.009 | TFLOPs: 31.85 | +7: iteration 25250/ 173500 | consumed samples: 6464000 | consumed tokens: 13238272000 | elapsed time per iteration (s): 0.43 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.122787E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.888 | TFLOPs: 31.58 | +7: iteration 25260/ 173500 | consumed samples: 6466560 | consumed tokens: 13243514880 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.125467E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
606.931 | TFLOPs: 31.84 | +7: iteration 25270/ 173500 | consumed samples: 6469120 | consumed tokens: 13248757760 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.121656E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.097 | TFLOPs: 31.85 | +7: iteration 25280/ 173500 | consumed samples: 6471680 | consumed tokens: 13254000640 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.108106E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.233 | TFLOPs: 31.86 | +7: iteration 25290/ 173500 | consumed samples: 6474240 | consumed tokens: 13259243520 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.119225E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.968 | TFLOPs: 31.85 | +7: iteration 25300/ 173500 | consumed samples: 6476800 | consumed tokens: 13264486400 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.114360E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.830 | TFLOPs: 31.84 | +7: iteration 25310/ 173500 | consumed samples: 6479360 | consumed tokens: 13269729280 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.129434E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.222 | TFLOPs: 31.86 | +7: iteration 25320/ 173500 | consumed samples: 6481920 | consumed tokens: 13274972160 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.146139E+00 | grad norm: 0.300 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.959 | TFLOPs: 31.85 | +7: iteration 25330/ 173500 | consumed samples: 6484480 | consumed tokens: 13280215040 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.115017E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.909 | TFLOPs: 31.84 | +7: iteration 25340/ 173500 | consumed samples: 6487040 | consumed tokens: 13285457920 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.116259E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.034 | TFLOPs: 31.85 | +7: iteration 25350/ 173500 | consumed samples: 6489600 | consumed tokens: 13290700800 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.130818E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.997 | TFLOPs: 31.85 | +7: iteration 25360/ 173500 | consumed samples: 6492160 | consumed tokens: 13295943680 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.122371E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.450 | TFLOPs: 31.82 | +7: iteration 25370/ 173500 | consumed samples: 6494720 | consumed tokens: 13301186560 | elapsed time per iteration (s): 0.43 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.127727E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.668 | TFLOPs: 31.36 | +7: iteration 25380/ 173500 | consumed samples: 6497280 | consumed tokens: 13306429440 | elapsed time per iteration (s): 
0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.126951E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.379 | TFLOPs: 31.82 | +7: iteration 25390/ 173500 | consumed samples: 6499840 | consumed tokens: 13311672320 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.125264E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.838 | TFLOPs: 31.63 | +7: iteration 25400/ 173500 | consumed samples: 6502400 | consumed tokens: 13316915200 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.119379E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.077 | TFLOPs: 31.85 | +7: iteration 25410/ 173500 | consumed samples: 6504960 | consumed tokens: 13322158080 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.115764E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.271 | TFLOPs: 31.81 | +7: iteration 25420/ 173500 | consumed samples: 6507520 | consumed tokens: 13327400960 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.128902E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.579 | TFLOPs: 31.83 | +7: iteration 25430/ 173500 | consumed samples: 6510080 | consumed tokens: 13332643840 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.134204E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.455 | TFLOPs: 31.82 | +7: iteration 25440/ 
173500 | consumed samples: 6512640 | consumed tokens: 13337886720 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.108764E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.707 | TFLOPs: 31.83 | +7: iteration 25450/ 173500 | consumed samples: 6515200 | consumed tokens: 13343129600 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.122060E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.313 | TFLOPs: 31.76 | +7: iteration 25460/ 173500 | consumed samples: 6517760 | consumed tokens: 13348372480 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.111712E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.595 | TFLOPs: 31.83 | +7: iteration 25470/ 173500 | consumed samples: 6520320 | consumed tokens: 13353615360 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.122251E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.635 | TFLOPs: 31.83 | +7: iteration 25480/ 173500 | consumed samples: 6522880 | consumed tokens: 13358858240 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.120952E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.363 | TFLOPs: 31.81 | +7: iteration 25490/ 173500 | consumed samples: 6525440 | consumed tokens: 13364101120 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.119195E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 606.135 | TFLOPs: 31.80 | +7: iteration 25500/ 173500 | consumed samples: 6528000 | consumed tokens: 13369344000 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.112993E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.447 | TFLOPs: 31.82 | +7: iteration 25510/ 173500 | consumed samples: 6530560 | consumed tokens: 13374586880 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.114468E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.677 | TFLOPs: 31.83 | +7: iteration 25520/ 173500 | consumed samples: 6533120 | consumed tokens: 13379829760 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.121045E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.523 | TFLOPs: 31.82 | +7: iteration 25530/ 173500 | consumed samples: 6535680 | consumed tokens: 13385072640 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.117653E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.417 | TFLOPs: 31.82 | +7: iteration 25540/ 173500 | consumed samples: 6538240 | consumed tokens: 13390315520 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.122741E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.158 | TFLOPs: 31.80 | +7: iteration 25550/ 173500 | consumed samples: 6540800 | consumed tokens: 13395558400 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch 
size: 256 | lm loss: 3.123557E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.623 | TFLOPs: 31.83 | +7: iteration 25560/ 173500 | consumed samples: 6543360 | consumed tokens: 13400801280 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.124968E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.301 | TFLOPs: 31.81 | +7: iteration 25570/ 173500 | consumed samples: 6545920 | consumed tokens: 13406044160 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.130976E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.527 | TFLOPs: 31.82 | +7: iteration 25580/ 173500 | consumed samples: 6548480 | consumed tokens: 13411287040 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.121254E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.527 | TFLOPs: 31.82 | +7: iteration 25590/ 173500 | consumed samples: 6551040 | consumed tokens: 13416529920 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.133575E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.174 | TFLOPs: 31.80 | +7: iteration 25600/ 173500 | consumed samples: 6553600 | consumed tokens: 13421772800 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.109409E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.357 | TFLOPs: 31.81 | +7: iteration 25610/ 173500 | consumed samples: 6556160 | consumed 
tokens: 13427015680 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.123620E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.555 | TFLOPs: 31.82 | +7: iteration 25620/ 173500 | consumed samples: 6558720 | consumed tokens: 13432258560 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.109314E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.967 | TFLOPs: 31.79 | +7: iteration 25630/ 173500 | consumed samples: 6561280 | consumed tokens: 13437501440 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.108405E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.595 | TFLOPs: 31.83 | +7: iteration 25640/ 173500 | consumed samples: 6563840 | consumed tokens: 13442744320 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.139711E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.029 | TFLOPs: 31.85 | +7: iteration 25650/ 173500 | consumed samples: 6566400 | consumed tokens: 13447987200 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.108989E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.250 | TFLOPs: 31.81 | +7: iteration 25660/ 173500 | consumed samples: 6568960 | consumed tokens: 13453230080 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.138476E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.154 | TFLOPs: 31.80 | +7: iteration 25670/ 173500 | consumed samples: 6571520 | consumed tokens: 13458472960 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.112527E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.542 | TFLOPs: 31.82 | +7: iteration 25680/ 173500 | consumed samples: 6574080 | consumed tokens: 13463715840 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.112365E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.225 | TFLOPs: 31.81 | +7: iteration 25690/ 173500 | consumed samples: 6576640 | consumed tokens: 13468958720 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.107794E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.329 | TFLOPs: 31.71 | +7: iteration 25700/ 173500 | consumed samples: 6579200 | consumed tokens: 13474201600 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.133060E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.784 | TFLOPs: 31.84 | +7: iteration 25710/ 173500 | consumed samples: 6581760 | consumed tokens: 13479444480 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.127739E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.441 | TFLOPs: 31.82 | +7: iteration 25720/ 173500 | consumed samples: 6584320 | consumed tokens: 13484687360 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.111287E+00 | grad norm: 
0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.996 | TFLOPs: 31.80 | +7: iteration 25730/ 173500 | consumed samples: 6586880 | consumed tokens: 13489930240 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.106900E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.533 | TFLOPs: 31.82 | +7: iteration 25740/ 173500 | consumed samples: 6589440 | consumed tokens: 13495173120 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.118572E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.565 | TFLOPs: 31.83 | +7: iteration 25750/ 173500 | consumed samples: 6592000 | consumed tokens: 13500416000 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.114508E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.306 | TFLOPs: 31.81 | +7: iteration 25760/ 173500 | consumed samples: 6594560 | consumed tokens: 13505658880 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.114162E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.418 | TFLOPs: 31.82 | +7: iteration 25770/ 173500 | consumed samples: 6597120 | consumed tokens: 13510901760 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.134200E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.705 | TFLOPs: 31.83 | +7: iteration 25780/ 173500 | consumed samples: 6599680 | consumed tokens: 13516144640 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.108502E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.812 | TFLOPs: 31.84 | +7: iteration 25790/ 173500 | consumed samples: 6602240 | consumed tokens: 13521387520 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.101526E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.416 | TFLOPs: 31.82 | +7: iteration 25800/ 173500 | consumed samples: 6604800 | consumed tokens: 13526630400 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.123755E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.504 | TFLOPs: 31.82 | +7: iteration 25810/ 173500 | consumed samples: 6607360 | consumed tokens: 13531873280 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.112742E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.210 | TFLOPs: 31.81 | +7: iteration 25820/ 173500 | consumed samples: 6609920 | consumed tokens: 13537116160 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.123922E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.548 | TFLOPs: 31.82 | +7: iteration 25830/ 173500 | consumed samples: 6612480 | consumed tokens: 13542359040 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.119336E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.238 | TFLOPs: 31.81 | +7: 
iteration 25840/ 173500 | consumed samples: 6615040 | consumed tokens: 13547601920 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.136777E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.266 | TFLOPs: 31.81 | +7: iteration 25850/ 173500 | consumed samples: 6617600 | consumed tokens: 13552844800 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.118439E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.189 | TFLOPs: 31.81 | +7: iteration 25860/ 173500 | consumed samples: 6620160 | consumed tokens: 13558087680 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.109684E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.953 | TFLOPs: 31.79 | +7: iteration 25870/ 173500 | consumed samples: 6622720 | consumed tokens: 13563330560 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.127559E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.907 | TFLOPs: 31.79 | +7: iteration 25880/ 173500 | consumed samples: 6625280 | consumed tokens: 13568573440 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.121963E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.405 | TFLOPs: 31.82 | +7: iteration 25890/ 173500 | consumed samples: 6627840 | consumed tokens: 13573816320 | elapsed time per iteration (s): 0.43 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.120622E+00 | grad norm: 0.301 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.123 | TFLOPs: 31.54 | +7: iteration 25900/ 173500 | consumed samples: 6630400 | consumed tokens: 13579059200 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.120297E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.388 | TFLOPs: 31.82 | +7: iteration 25910/ 173500 | consumed samples: 6632960 | consumed tokens: 13584302080 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.112309E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.976 | TFLOPs: 31.79 | +7: iteration 25920/ 173500 | consumed samples: 6635520 | consumed tokens: 13589544960 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.131123E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.152 | TFLOPs: 31.80 | +7: iteration 25930/ 173500 | consumed samples: 6638080 | consumed tokens: 13594787840 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.100059E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.439 | TFLOPs: 31.82 | +7: iteration 25940/ 173500 | consumed samples: 6640640 | consumed tokens: 13600030720 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.121209E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.454 | TFLOPs: 31.82 | +7: iteration 25950/ 173500 | consumed samples: 6643200 | consumed tokens: 13605273600 | elapsed time per iteration (s): 0.42 | learning rate: 
1.913E-04 | global batch size: 256 | lm loss: 3.113072E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.358 | TFLOPs: 31.81 | +7: iteration 25960/ 173500 | consumed samples: 6645760 | consumed tokens: 13610516480 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.124807E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.285 | TFLOPs: 31.81 | +7: iteration 25970/ 173500 | consumed samples: 6648320 | consumed tokens: 13615759360 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.112393E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.735 | TFLOPs: 31.83 | +7: iteration 25980/ 173500 | consumed samples: 6650880 | consumed tokens: 13621002240 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.112018E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.725 | TFLOPs: 31.83 | +7: iteration 25990/ 173500 | consumed samples: 6653440 | consumed tokens: 13626245120 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.120164E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.339 | TFLOPs: 31.81 | +0: [2023-03-17 02:16:47,444] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=0, lr=[0.00019128112529201118, 0.00019128112529201118, 0.00019128112529201118], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 26000/ 173500 | consumed samples: 6656000 | consumed tokens: 13631488000 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 
256 | lm loss: 3.122185E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.628 | TFLOPs: 31.83 | +0: steps: 26000 loss: 3.0827 iter time (s): 0.421 samples/sec: 607.703 +7: iteration 26010/ 173500 | consumed samples: 6658560 | consumed tokens: 13636730880 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.139149E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.805 | TFLOPs: 31.73 | +7: iteration 26020/ 173500 | consumed samples: 6661120 | consumed tokens: 13641973760 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.116499E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.339 | TFLOPs: 31.81 | +7: iteration 26030/ 173500 | consumed samples: 6663680 | consumed tokens: 13647216640 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.107325E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.121 | TFLOPs: 31.80 | +7: iteration 26040/ 173500 | consumed samples: 6666240 | consumed tokens: 13652459520 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.125281E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.485 | TFLOPs: 31.82 | +7: iteration 26050/ 173500 | consumed samples: 6668800 | consumed tokens: 13657702400 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.121229E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.682 | TFLOPs: 31.83 | 
+7: iteration 26060/ 173500 | consumed samples: 6671360 | consumed tokens: 13662945280 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.124416E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.523 | TFLOPs: 31.82 | +7: iteration 26070/ 173500 | consumed samples: 6673920 | consumed tokens: 13668188160 | elapsed time per iteration (s): 0.43 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.106596E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.813 | TFLOPs: 31.00 | +7: iteration 26080/ 173500 | consumed samples: 6676480 | consumed tokens: 13673431040 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.124325E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.296 | TFLOPs: 31.86 | +7: iteration 26090/ 173500 | consumed samples: 6679040 | consumed tokens: 13678673920 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.119241E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.255 | TFLOPs: 31.81 | +7: iteration 26100/ 173500 | consumed samples: 6681600 | consumed tokens: 13683916800 | elapsed time per iteration (s): 0.43 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.120746E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.807 | TFLOPs: 31.58 | +7: iteration 26110/ 173500 | consumed samples: 6684160 | consumed tokens: 13689159680 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.117359E+00 | grad norm: 0.299 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.858 | TFLOPs: 31.84 | +7: iteration 26120/ 173500 | consumed samples: 6686720 | consumed tokens: 13694402560 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.115481E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.530 | TFLOPs: 31.82 | +7: iteration 26130/ 173500 | consumed samples: 6689280 | consumed tokens: 13699645440 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.115531E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.639 | TFLOPs: 31.83 | +7: iteration 26140/ 173500 | consumed samples: 6691840 | consumed tokens: 13704888320 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.122570E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.573 | TFLOPs: 31.83 | +7: iteration 26150/ 173500 | consumed samples: 6694400 | consumed tokens: 13710131200 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.114130E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.277 | TFLOPs: 31.81 | +7: iteration 26160/ 173500 | consumed samples: 6696960 | consumed tokens: 13715374080 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.117138E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.909 | TFLOPs: 31.84 | +7: iteration 26170/ 173500 | consumed samples: 6699520 | consumed tokens: 13720616960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.912E-04 | global batch size: 256 | lm loss: 3.121938E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.654 | TFLOPs: 31.83 | +7: iteration 26180/ 173500 | consumed samples: 6702080 | consumed tokens: 13725859840 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.095792E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.407 | TFLOPs: 31.82 | +7: iteration 26190/ 173500 | consumed samples: 6704640 | consumed tokens: 13731102720 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.127487E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.854 | TFLOPs: 31.79 | +7: iteration 26200/ 173500 | consumed samples: 6707200 | consumed tokens: 13736345600 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.122598E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.362 | TFLOPs: 31.81 | +7: iteration 26210/ 173500 | consumed samples: 6709760 | consumed tokens: 13741588480 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.123936E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.834 | TFLOPs: 31.79 | +7: iteration 26220/ 173500 | consumed samples: 6712320 | consumed tokens: 13746831360 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.125895E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.783 | TFLOPs: 31.78 | +7: iteration 26230/ 173500 | consumed 
samples: 6714880 | consumed tokens: 13752074240 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.120823E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.287 | TFLOPs: 31.81 | +7: iteration 26240/ 173500 | consumed samples: 6717440 | consumed tokens: 13757317120 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.109951E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.016 | TFLOPs: 31.80 | +7: iteration 26250/ 173500 | consumed samples: 6720000 | consumed tokens: 13762560000 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.145049E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.960 | TFLOPs: 31.79 | +7: iteration 26260/ 173500 | consumed samples: 6722560 | consumed tokens: 13767802880 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.112139E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.735 | TFLOPs: 31.78 | +7: iteration 26270/ 173500 | consumed samples: 6725120 | consumed tokens: 13773045760 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.121437E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.001 | TFLOPs: 31.80 | +7: iteration 26280/ 173500 | consumed samples: 6727680 | consumed tokens: 13778288640 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.117507E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.788 | TFLOPs: 31.78 | +7: iteration 26290/ 173500 | consumed samples: 6730240 | consumed tokens: 13783531520 | elapsed time per iteration (s): 0.43 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.104992E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.054 | TFLOPs: 31.59 | +7: iteration 26300/ 173500 | consumed samples: 6732800 | consumed tokens: 13788774400 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.119274E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.437 | TFLOPs: 31.82 | +7: iteration 26310/ 173500 | consumed samples: 6735360 | consumed tokens: 13794017280 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.100495E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.143 | TFLOPs: 31.80 | +7: iteration 26320/ 173500 | consumed samples: 6737920 | consumed tokens: 13799260160 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.122407E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.722 | TFLOPs: 31.78 | +7: iteration 26330/ 173500 | consumed samples: 6740480 | consumed tokens: 13804503040 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.111432E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.359 | TFLOPs: 31.76 | +7: iteration 26340/ 173500 | consumed samples: 6743040 | consumed tokens: 13809745920 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm 
loss: 3.118765E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.356 | TFLOPs: 31.81 | +7: iteration 26350/ 173500 | consumed samples: 6745600 | consumed tokens: 13814988800 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.124652E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.175 | TFLOPs: 31.81 | +7: iteration 26360/ 173500 | consumed samples: 6748160 | consumed tokens: 13820231680 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.122230E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.124 | TFLOPs: 31.80 | +7: iteration 26370/ 173500 | consumed samples: 6750720 | consumed tokens: 13825474560 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.108792E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.550 | TFLOPs: 31.77 | +7: iteration 26380/ 173500 | consumed samples: 6753280 | consumed tokens: 13830717440 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.127161E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.986 | TFLOPs: 31.80 | +7: iteration 26390/ 173500 | consumed samples: 6755840 | consumed tokens: 13835960320 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.108553E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.818 | TFLOPs: 31.79 | +7: iteration 26400/ 173500 | consumed samples: 6758400 | consumed tokens: 
13841203200 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.132483E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.625 | TFLOPs: 31.78 | +7: iteration 26410/ 173500 | consumed samples: 6760960 | consumed tokens: 13846446080 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.118869E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.979 | TFLOPs: 31.79 | +7: iteration 26420/ 173500 | consumed samples: 6763520 | consumed tokens: 13851688960 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.116194E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.514 | TFLOPs: 31.77 | +7: iteration 26430/ 173500 | consumed samples: 6766080 | consumed tokens: 13856931840 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.121858E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.091 | TFLOPs: 31.80 | +7: iteration 26440/ 173500 | consumed samples: 6768640 | consumed tokens: 13862174720 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.105867E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.030 | TFLOPs: 31.80 | +7: iteration 26450/ 173500 | consumed samples: 6771200 | consumed tokens: 13867417600 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.111551E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
606.128 | TFLOPs: 31.80 | +7: iteration 26460/ 173500 | consumed samples: 6773760 | consumed tokens: 13872660480 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.108297E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.459 | TFLOPs: 31.82 | +7: iteration 26470/ 173500 | consumed samples: 6776320 | consumed tokens: 13877903360 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.114804E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.375 | TFLOPs: 31.82 | +7: iteration 26480/ 173500 | consumed samples: 6778880 | consumed tokens: 13883146240 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.114994E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.808 | TFLOPs: 31.79 | +7: iteration 26490/ 173500 | consumed samples: 6781440 | consumed tokens: 13888389120 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.116554E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.257 | TFLOPs: 31.81 | +7: iteration 26500/ 173500 | consumed samples: 6784000 | consumed tokens: 13893632000 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.124100E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.913 | TFLOPs: 31.79 | +7: iteration 26510/ 173500 | consumed samples: 6786560 | consumed tokens: 13898874880 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.121581E+00 | grad norm: 0.286 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.950 | TFLOPs: 31.79 | +7: iteration 26520/ 173500 | consumed samples: 6789120 | consumed tokens: 13904117760 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.117160E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.105 | TFLOPs: 31.80 | +7: iteration 26530/ 173500 | consumed samples: 6791680 | consumed tokens: 13909360640 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.119306E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.365 | TFLOPs: 31.82 | +7: iteration 26540/ 173500 | consumed samples: 6794240 | consumed tokens: 13914603520 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.107322E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.873 | TFLOPs: 31.79 | +7: iteration 26550/ 173500 | consumed samples: 6796800 | consumed tokens: 13919846400 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.113316E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.986 | TFLOPs: 31.80 | +7: iteration 26560/ 173500 | consumed samples: 6799360 | consumed tokens: 13925089280 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.111999E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.047 | TFLOPs: 31.80 | +7: iteration 26570/ 173500 | consumed samples: 6801920 | consumed tokens: 13930332160 | elapsed time per iteration (s): 
0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.124057E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.928 | TFLOPs: 31.79 | +7: iteration 26580/ 173500 | consumed samples: 6804480 | consumed tokens: 13935575040 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.118119E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.503 | TFLOPs: 31.77 | +7: iteration 26590/ 173500 | consumed samples: 6807040 | consumed tokens: 13940817920 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.121006E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.481 | TFLOPs: 31.82 | +7: iteration 26600/ 173500 | consumed samples: 6809600 | consumed tokens: 13946060800 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.122153E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.134 | TFLOPs: 31.80 | +7: iteration 26610/ 173500 | consumed samples: 6812160 | consumed tokens: 13951303680 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.109310E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.487 | TFLOPs: 31.82 | +7: iteration 26620/ 173500 | consumed samples: 6814720 | consumed tokens: 13956546560 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.116333E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.302 | TFLOPs: 31.81 | +7: iteration 26630/ 
173500 | consumed samples: 6817280 | consumed tokens: 13961789440 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.118871E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.831 | TFLOPs: 31.79 | +7: iteration 26640/ 173500 | consumed samples: 6819840 | consumed tokens: 13967032320 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.118544E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.398 | TFLOPs: 31.82 | +7: iteration 26650/ 173500 | consumed samples: 6822400 | consumed tokens: 13972275200 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.106076E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.032 | TFLOPs: 31.80 | +7: iteration 26660/ 173500 | consumed samples: 6824960 | consumed tokens: 13977518080 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.108958E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.080 | TFLOPs: 31.80 | +7: iteration 26670/ 173500 | consumed samples: 6827520 | consumed tokens: 13982760960 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.108284E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.086 | TFLOPs: 31.80 | +7: iteration 26680/ 173500 | consumed samples: 6830080 | consumed tokens: 13988003840 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.125944E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 606.059 | TFLOPs: 31.80 | +7: iteration 26690/ 173500 | consumed samples: 6832640 | consumed tokens: 13993246720 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.115215E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.077 | TFLOPs: 31.80 | +7: iteration 26700/ 173500 | consumed samples: 6835200 | consumed tokens: 13998489600 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.135017E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.051 | TFLOPs: 31.80 | +7: iteration 26710/ 173500 | consumed samples: 6837760 | consumed tokens: 14003732480 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.119208E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.366 | TFLOPs: 31.66 | +7: iteration 26720/ 173500 | consumed samples: 6840320 | consumed tokens: 14008975360 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.112561E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.027 | TFLOPs: 31.80 | +7: iteration 26730/ 173500 | consumed samples: 6842880 | consumed tokens: 14014218240 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.116973E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.255 | TFLOPs: 31.65 | +7: iteration 26740/ 173500 | consumed samples: 6845440 | consumed tokens: 14019461120 | elapsed time per iteration (s): 0.42 | learning rate: 1.908E-04 | global batch 
size: 256 | lm loss: 3.126273E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.407 | TFLOPs: 31.82 | +7: iteration 26750/ 173500 | consumed samples: 6848000 | consumed tokens: 14024704000 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.110495E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.457 | TFLOPs: 31.51 | +7: iteration 26760/ 173500 | consumed samples: 6850560 | consumed tokens: 14029946880 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.111968E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.055 | TFLOPs: 31.80 | +7: iteration 26770/ 173500 | consumed samples: 6853120 | consumed tokens: 14035189760 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.127411E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.082 | TFLOPs: 31.33 | +7: iteration 26780/ 173500 | consumed samples: 6855680 | consumed tokens: 14040432640 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.108254E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.773 | TFLOPs: 31.84 | +7: iteration 26790/ 173500 | consumed samples: 6858240 | consumed tokens: 14045675520 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.090569E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.794 | TFLOPs: 30.79 | +7: iteration 26800/ 173500 | consumed samples: 6860800 | consumed 
tokens: 14050918400 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.115223E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.093 | TFLOPs: 31.85 | +7: iteration 26810/ 173500 | consumed samples: 6863360 | consumed tokens: 14056161280 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.115811E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.099 | TFLOPs: 31.28 | +7: iteration 26820/ 173500 | consumed samples: 6865920 | consumed tokens: 14061404160 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.103892E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.376 | TFLOPs: 31.87 | +7: iteration 26830/ 173500 | consumed samples: 6868480 | consumed tokens: 14066647040 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.112383E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.032 | TFLOPs: 30.96 | +7: iteration 26840/ 173500 | consumed samples: 6871040 | consumed tokens: 14071889920 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.106862E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.357 | TFLOPs: 30.45 | +7: iteration 26850/ 173500 | consumed samples: 6873600 | consumed tokens: 14077132800 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.117603E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.165 | TFLOPs: 31.91 | +7: iteration 26860/ 173500 | consumed samples: 6876160 | consumed tokens: 14082375680 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.105899E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.556 | TFLOPs: 31.67 | +7: iteration 26870/ 173500 | consumed samples: 6878720 | consumed tokens: 14087618560 | elapsed time per iteration (s): 0.42 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.111259E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.969 | TFLOPs: 31.85 | +7: iteration 26880/ 173500 | consumed samples: 6881280 | consumed tokens: 14092861440 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.111026E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.314 | TFLOPs: 31.86 | +7: iteration 26890/ 173500 | consumed samples: 6883840 | consumed tokens: 14098104320 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.111462E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.659 | TFLOPs: 31.83 | +7: iteration 26900/ 173500 | consumed samples: 6886400 | consumed tokens: 14103347200 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.113318E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.554 | TFLOPs: 31.82 | +7: iteration 26910/ 173500 | consumed samples: 6888960 | consumed tokens: 14108590080 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.107667E+00 | grad norm: 
0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.663 | TFLOPs: 31.83 | +7: iteration 26920/ 173500 | consumed samples: 6891520 | consumed tokens: 14113832960 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.110346E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.192 | TFLOPs: 31.70 | +7: iteration 26930/ 173500 | consumed samples: 6894080 | consumed tokens: 14119075840 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.121118E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.894 | TFLOPs: 31.84 | +7: iteration 26940/ 173500 | consumed samples: 6896640 | consumed tokens: 14124318720 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.120654E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.422 | TFLOPs: 31.82 | +7: iteration 26950/ 173500 | consumed samples: 6899200 | consumed tokens: 14129561600 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.116773E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.515 | TFLOPs: 31.82 | +7: iteration 26960/ 173500 | consumed samples: 6901760 | consumed tokens: 14134804480 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.116175E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.644 | TFLOPs: 31.83 | +7: iteration 26970/ 173500 | consumed samples: 6904320 | consumed tokens: 14140047360 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.118990E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.089 | TFLOPs: 31.80 | +7: iteration 26980/ 173500 | consumed samples: 6906880 | consumed tokens: 14145290240 | elapsed time per iteration (s): 0.43 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.100907E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.640 | TFLOPs: 31.57 | +7: iteration 26990/ 173500 | consumed samples: 6909440 | consumed tokens: 14150533120 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.100849E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.056 | TFLOPs: 31.80 | +7: iteration 27000/ 173500 | consumed samples: 6912000 | consumed tokens: 14155776000 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.103134E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.324 | TFLOPs: 31.81 | +7: iteration 27010/ 173500 | consumed samples: 6914560 | consumed tokens: 14161018880 | elapsed time per iteration (s): 0.42 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.116813E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.043 | TFLOPs: 31.75 | +7: iteration 27020/ 173500 | consumed samples: 6917120 | consumed tokens: 14166261760 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.110497E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.637 | TFLOPs: 31.83 | +7: 
iteration 27030/ 173500 | consumed samples: 6919680 | consumed tokens: 14171504640 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.118324E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.304 | TFLOPs: 31.86 | +7: iteration 27040/ 173500 | consumed samples: 6922240 | consumed tokens: 14176747520 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.109367E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.741 | TFLOPs: 31.83 | +7: iteration 27050/ 173500 | consumed samples: 6924800 | consumed tokens: 14181990400 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.121176E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.639 | TFLOPs: 31.72 | +7: iteration 27060/ 173500 | consumed samples: 6927360 | consumed tokens: 14187233280 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.110364E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.677 | TFLOPs: 31.78 | +7: iteration 27070/ 173500 | consumed samples: 6929920 | consumed tokens: 14192476160 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.115984E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.518 | TFLOPs: 31.82 | +7: iteration 27080/ 173500 | consumed samples: 6932480 | consumed tokens: 14197719040 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.112357E+00 | grad norm: 0.292 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.142 | TFLOPs: 31.80 | +7: iteration 27090/ 173500 | consumed samples: 6935040 | consumed tokens: 14202961920 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.111622E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.801 | TFLOPs: 31.84 | +7: iteration 27100/ 173500 | consumed samples: 6937600 | consumed tokens: 14208204800 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.109163E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.174 | TFLOPs: 31.86 | +7: iteration 27110/ 173500 | consumed samples: 6940160 | consumed tokens: 14213447680 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.109795E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.260 | TFLOPs: 31.86 | +7: iteration 27120/ 173500 | consumed samples: 6942720 | consumed tokens: 14218690560 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.113257E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.887 | TFLOPs: 31.74 | +7: iteration 27130/ 173500 | consumed samples: 6945280 | consumed tokens: 14223933440 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.114402E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.935 | TFLOPs: 31.64 | +7: iteration 27140/ 173500 | consumed samples: 6947840 | consumed tokens: 14229176320 | elapsed time per iteration (s): 0.42 | learning rate: 
1.905E-04 | global batch size: 256 | lm loss: 3.111557E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.737 | TFLOPs: 31.83 | +7: iteration 27150/ 173500 | consumed samples: 6950400 | consumed tokens: 14234419200 | elapsed time per iteration (s): 0.42 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.123686E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.177 | TFLOPs: 31.75 | +7: iteration 27160/ 173500 | consumed samples: 6952960 | consumed tokens: 14239662080 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.116002E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.684 | TFLOPs: 31.83 | +7: iteration 27170/ 173500 | consumed samples: 6955520 | consumed tokens: 14244904960 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.111176E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.401 | TFLOPs: 31.82 | +7: iteration 27180/ 173500 | consumed samples: 6958080 | consumed tokens: 14250147840 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.114355E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.150 | TFLOPs: 31.80 | +7: iteration 27190/ 173500 | consumed samples: 6960640 | consumed tokens: 14255390720 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.113836E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.975 | TFLOPs: 31.85 | +7: iteration 27200/ 173500 | consumed 
samples: 6963200 | consumed tokens: 14260633600 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.106770E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.238 | TFLOPs: 31.81 | +7: iteration 27210/ 173500 | consumed samples: 6965760 | consumed tokens: 14265876480 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.114388E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.855 | TFLOPs: 31.63 | +7: iteration 27220/ 173500 | consumed samples: 6968320 | consumed tokens: 14271119360 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.116760E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.502 | TFLOPs: 31.61 | +7: iteration 27230/ 173500 | consumed samples: 6970880 | consumed tokens: 14276362240 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.109897E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.968 | TFLOPs: 31.74 | +7: iteration 27240/ 173500 | consumed samples: 6973440 | consumed tokens: 14281605120 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.108214E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.079 | TFLOPs: 30.28 | +7: iteration 27250/ 173500 | consumed samples: 6976000 | consumed tokens: 14286848000 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.121220E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.543 | TFLOPs: 31.61 | +7: iteration 27260/ 173500 | consumed samples: 6978560 | consumed tokens: 14292090880 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.117255E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.400 | TFLOPs: 30.61 | +7: iteration 27270/ 173500 | consumed samples: 6981120 | consumed tokens: 14297333760 | elapsed time per iteration (s): 0.43 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.115407E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.140 | TFLOPs: 31.02 | +7: iteration 27280/ 173500 | consumed samples: 6983680 | consumed tokens: 14302576640 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.100272E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.957 | TFLOPs: 30.74 | +7: iteration 27290/ 173500 | consumed samples: 6986240 | consumed tokens: 14307819520 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.119316E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.748 | TFLOPs: 31.21 | +7: iteration 27300/ 173500 | consumed samples: 6988800 | consumed tokens: 14313062400 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.105886E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.218 | TFLOPs: 30.65 | +7: iteration 27310/ 173500 | consumed samples: 6991360 | consumed tokens: 14318305280 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm 
loss: 3.134484E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.861 | TFLOPs: 31.26 | +7: iteration 27320/ 173500 | consumed samples: 6993920 | consumed tokens: 14323548160 | elapsed time per iteration (s): 0.42 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.119711E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.659 | TFLOPs: 31.67 | +7: iteration 27330/ 173500 | consumed samples: 6996480 | consumed tokens: 14328791040 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.106068E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.221 | TFLOPs: 31.07 | +7: iteration 27340/ 173500 | consumed samples: 6999040 | consumed tokens: 14334033920 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.104650E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.982 | TFLOPs: 31.38 | +7: iteration 27350/ 173500 | consumed samples: 7001600 | consumed tokens: 14339276800 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.092182E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.083 | TFLOPs: 30.86 | +7: iteration 27360/ 173500 | consumed samples: 7004160 | consumed tokens: 14344519680 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.107851E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.210 | TFLOPs: 31.07 | +7: iteration 27370/ 173500 | consumed samples: 7006720 | consumed tokens: 
14349762560 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.105783E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.091 | TFLOPs: 31.59 | +7: iteration 27380/ 173500 | consumed samples: 7009280 | consumed tokens: 14355005440 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.115991E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.705 | TFLOPs: 31.26 | +7: iteration 27390/ 173500 | consumed samples: 7011840 | consumed tokens: 14360248320 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.110162E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.979 | TFLOPs: 31.22 | +7: iteration 27400/ 173500 | consumed samples: 7014400 | consumed tokens: 14365491200 | elapsed time per iteration (s): 0.47 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.091562E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.642 | TFLOPs: 28.84 | +7: iteration 27410/ 173500 | consumed samples: 7016960 | consumed tokens: 14370734080 | elapsed time per iteration (s): 0.45 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.102528E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.352 | TFLOPs: 29.77 | +7: iteration 27420/ 173500 | consumed samples: 7019520 | consumed tokens: 14375976960 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.106107E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
576.542 | TFLOPs: 30.25 | +7: iteration 27430/ 173500 | consumed samples: 7022080 | consumed tokens: 14381219840 | elapsed time per iteration (s): 0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.105339E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.866 | TFLOPs: 29.64 | +7: iteration 27440/ 173500 | consumed samples: 7024640 | consumed tokens: 14386462720 | elapsed time per iteration (s): 0.44 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.116461E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.957 | TFLOPs: 30.43 | +7: iteration 27450/ 173500 | consumed samples: 7027200 | consumed tokens: 14391705600 | elapsed time per iteration (s): 0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.094979E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.647 | TFLOPs: 29.84 | +7: iteration 27460/ 173500 | consumed samples: 7029760 | consumed tokens: 14396948480 | elapsed time per iteration (s): 0.43 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.106717E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.648 | TFLOPs: 31.41 | +7: iteration 27470/ 173500 | consumed samples: 7032320 | consumed tokens: 14402191360 | elapsed time per iteration (s): 0.47 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.108903E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 547.317 | TFLOPs: 28.72 | +7: iteration 27480/ 173500 | consumed samples: 7034880 | consumed tokens: 14407434240 | elapsed time per iteration (s): 0.46 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.093312E+00 | grad norm: 0.297 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.802 | TFLOPs: 29.00 | +7: iteration 27490/ 173500 | consumed samples: 7037440 | consumed tokens: 14412677120 | elapsed time per iteration (s): 0.47 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.101680E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.951 | TFLOPs: 28.70 | +7: iteration 27500/ 173500 | consumed samples: 7040000 | consumed tokens: 14417920000 | elapsed time per iteration (s): 0.46 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.109034E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.185 | TFLOPs: 28.92 | +7: iteration 27510/ 173500 | consumed samples: 7042560 | consumed tokens: 14423162880 | elapsed time per iteration (s): 0.48 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.113886E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.794 | TFLOPs: 28.22 | +7: iteration 27520/ 173500 | consumed samples: 7045120 | consumed tokens: 14428405760 | elapsed time per iteration (s): 0.46 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.122502E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.925 | TFLOPs: 29.27 | +7: iteration 27530/ 173500 | consumed samples: 7047680 | consumed tokens: 14433648640 | elapsed time per iteration (s): 0.44 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.109357E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.708 | TFLOPs: 30.52 | +7: iteration 27540/ 173500 | consumed samples: 7050240 | consumed tokens: 14438891520 | elapsed time per iteration (s): 
0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.117112E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.817 | TFLOPs: 30.16 | +7: iteration 27550/ 173500 | consumed samples: 7052800 | consumed tokens: 14444134400 | elapsed time per iteration (s): 0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 3.111164E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.547 | TFLOPs: 29.99 | +7: iteration 27560/ 173500 | consumed samples: 7055360 | consumed tokens: 14449377280 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.115102E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.518 | TFLOPs: 31.46 | +7: iteration 27570/ 173500 | consumed samples: 7057920 | consumed tokens: 14454620160 | elapsed time per iteration (s): 0.42 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.113281E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.780 | TFLOPs: 31.63 | +7: iteration 27580/ 173500 | consumed samples: 7060480 | consumed tokens: 14459863040 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.114030E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.340 | TFLOPs: 31.55 | +7: iteration 27590/ 173500 | consumed samples: 7063040 | consumed tokens: 14465105920 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.108208E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.791 | TFLOPs: 31.10 | +7: iteration 27600/ 
173500 | consumed samples: 7065600 | consumed tokens: 14470348800 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.117946E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.636 | TFLOPs: 30.88 | +7: iteration 27610/ 173500 | consumed samples: 7068160 | consumed tokens: 14475591680 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.109875E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.032 | TFLOPs: 30.91 | +7: iteration 27620/ 173500 | consumed samples: 7070720 | consumed tokens: 14480834560 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.110271E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.529 | TFLOPs: 30.93 | +7: iteration 27630/ 173500 | consumed samples: 7073280 | consumed tokens: 14486077440 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.107541E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.774 | TFLOPs: 30.42 | +7: iteration 27640/ 173500 | consumed samples: 7075840 | consumed tokens: 14491320320 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.122554E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.827 | TFLOPs: 31.00 | +7: iteration 27650/ 173500 | consumed samples: 7078400 | consumed tokens: 14496563200 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.115632E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 596.214 | TFLOPs: 31.28 | +7: iteration 27660/ 173500 | consumed samples: 7080960 | consumed tokens: 14501806080 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.107777E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.860 | TFLOPs: 30.74 | +7: iteration 27670/ 173500 | consumed samples: 7083520 | consumed tokens: 14507048960 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.103627E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.090 | TFLOPs: 30.80 | +7: iteration 27680/ 173500 | consumed samples: 7086080 | consumed tokens: 14512291840 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.109715E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.985 | TFLOPs: 31.38 | +7: iteration 27690/ 173500 | consumed samples: 7088640 | consumed tokens: 14517534720 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.108323E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.158 | TFLOPs: 31.44 | +7: iteration 27700/ 173500 | consumed samples: 7091200 | consumed tokens: 14522777600 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.124189E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.632 | TFLOPs: 30.88 | +7: iteration 27710/ 173500 | consumed samples: 7093760 | consumed tokens: 14528020480 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch 
size: 256 | lm loss: 3.105547E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.422 | TFLOPs: 31.08 | +7: iteration 27720/ 173500 | consumed samples: 7096320 | consumed tokens: 14533263360 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.099237E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.999 | TFLOPs: 30.69 | +7: iteration 27730/ 173500 | consumed samples: 7098880 | consumed tokens: 14538506240 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.099468E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.013 | TFLOPs: 31.53 | +7: iteration 27740/ 173500 | consumed samples: 7101440 | consumed tokens: 14543749120 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.111264E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.690 | TFLOPs: 31.05 | +7: iteration 27750/ 173500 | consumed samples: 7104000 | consumed tokens: 14548992000 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.104782E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.387 | TFLOPs: 30.40 | +7: iteration 27760/ 173500 | consumed samples: 7106560 | consumed tokens: 14554234880 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.108934E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.393 | TFLOPs: 31.19 | +7: iteration 27770/ 173500 | consumed samples: 7109120 | consumed 
tokens: 14559477760 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.117970E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.209 | TFLOPs: 31.28 | +7: iteration 27780/ 173500 | consumed samples: 7111680 | consumed tokens: 14564720640 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.116629E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.632 | TFLOPs: 30.52 | +7: iteration 27790/ 173500 | consumed samples: 7114240 | consumed tokens: 14569963520 | elapsed time per iteration (s): 0.42 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.101537E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.752 | TFLOPs: 31.63 | +7: iteration 27800/ 173500 | consumed samples: 7116800 | consumed tokens: 14575206400 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.111938E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.340 | TFLOPs: 31.50 | +7: iteration 27810/ 173500 | consumed samples: 7119360 | consumed tokens: 14580449280 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.111230E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.784 | TFLOPs: 31.31 | +7: iteration 27820/ 173500 | consumed samples: 7121920 | consumed tokens: 14585692160 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.102117E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 601.764 | TFLOPs: 31.57 | +7: iteration 27830/ 173500 | consumed samples: 7124480 | consumed tokens: 14590935040 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.109589E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.904 | TFLOPs: 30.37 | +7: iteration 27840/ 173500 | consumed samples: 7127040 | consumed tokens: 14596177920 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.106171E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.600 | TFLOPs: 31.30 | +7: iteration 27850/ 173500 | consumed samples: 7129600 | consumed tokens: 14601420800 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.114991E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.545 | TFLOPs: 31.30 | +7: iteration 27860/ 173500 | consumed samples: 7132160 | consumed tokens: 14606663680 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.108047E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.553 | TFLOPs: 30.67 | +7: iteration 27870/ 173500 | consumed samples: 7134720 | consumed tokens: 14611906560 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.107830E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.595 | TFLOPs: 31.30 | +7: iteration 27880/ 173500 | consumed samples: 7137280 | consumed tokens: 14617149440 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.107125E+00 | grad norm: 
0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.935 | TFLOPs: 30.95 | +7: iteration 27890/ 173500 | consumed samples: 7139840 | consumed tokens: 14622392320 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.108677E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.963 | TFLOPs: 30.32 | +7: iteration 27900/ 173500 | consumed samples: 7142400 | consumed tokens: 14627635200 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.104905E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.241 | TFLOPs: 30.81 | +7: iteration 27910/ 173500 | consumed samples: 7144960 | consumed tokens: 14632878080 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.127838E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.431 | TFLOPs: 30.61 | +7: iteration 27920/ 173500 | consumed samples: 7147520 | consumed tokens: 14638120960 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.090836E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.858 | TFLOPs: 31.21 | +7: iteration 27930/ 173500 | consumed samples: 7150080 | consumed tokens: 14643363840 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.105752E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.298 | TFLOPs: 31.23 | +7: iteration 27940/ 173500 | consumed samples: 7152640 | consumed tokens: 14648606720 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.110250E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.513 | TFLOPs: 30.25 | +7: iteration 27950/ 173500 | consumed samples: 7155200 | consumed tokens: 14653849600 | elapsed time per iteration (s): 0.44 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.097160E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.435 | TFLOPs: 30.35 | +7: iteration 27960/ 173500 | consumed samples: 7157760 | consumed tokens: 14659092480 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.109487E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.589 | TFLOPs: 30.99 | +7: iteration 27970/ 173500 | consumed samples: 7160320 | consumed tokens: 14664335360 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.101188E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.116 | TFLOPs: 29.97 | +7: iteration 27980/ 173500 | consumed samples: 7162880 | consumed tokens: 14669578240 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.106449E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.323 | TFLOPs: 31.03 | +7: iteration 27990/ 173500 | consumed samples: 7165440 | consumed tokens: 14674821120 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.107019E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.748 | TFLOPs: 31.36 | +0: 
[2023-03-17 02:31:04,565] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=0, lr=[0.00018981345832700956, 0.00018981345832700956, 0.00018981345832700956], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 28000/ 173500 | consumed samples: 7168000 | consumed tokens: 14680064000 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.106044E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.907 | TFLOPs: 31.11 | +0: steps: 28000 loss: 3.1629 iter time (s): 0.426 samples/sec: 600.697 +7: iteration 28010/ 173500 | consumed samples: 7170560 | consumed tokens: 14685306880 | elapsed time per iteration (s): 0.44 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.102994E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.778 | TFLOPs: 30.47 | +7: iteration 28020/ 173500 | consumed samples: 7173120 | consumed tokens: 14690549760 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.113950E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.264 | TFLOPs: 29.61 | +7: iteration 28030/ 173500 | consumed samples: 7175680 | consumed tokens: 14695792640 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.102240E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.022 | TFLOPs: 31.27 | +7: iteration 28040/ 173500 | consumed samples: 7178240 | consumed tokens: 14701035520 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.107539E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 598.151 | TFLOPs: 31.38 | +7: iteration 28050/ 173500 | consumed samples: 7180800 | consumed tokens: 14706278400 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.116384E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.768 | TFLOPs: 30.16 | +7: iteration 28060/ 173500 | consumed samples: 7183360 | consumed tokens: 14711521280 | elapsed time per iteration (s): 0.46 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.102695E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.147 | TFLOPs: 29.34 | +7: iteration 28070/ 173500 | consumed samples: 7185920 | consumed tokens: 14716764160 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.101303E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.393 | TFLOPs: 31.40 | +7: iteration 28080/ 173500 | consumed samples: 7188480 | consumed tokens: 14722007040 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.111362E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.265 | TFLOPs: 31.18 | +7: iteration 28090/ 173500 | consumed samples: 7191040 | consumed tokens: 14727249920 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.103503E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.560 | TFLOPs: 31.20 | +7: iteration 28100/ 173500 | consumed samples: 7193600 | consumed tokens: 14732492800 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.086157E+00 | grad 
norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.248 | TFLOPs: 31.23 | +7: iteration 28110/ 173500 | consumed samples: 7196160 | consumed tokens: 14737735680 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.100742E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.024 | TFLOPs: 31.17 | +7: iteration 28120/ 173500 | consumed samples: 7198720 | consumed tokens: 14742978560 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.110059E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.076 | TFLOPs: 30.91 | +7: iteration 28130/ 173500 | consumed samples: 7201280 | consumed tokens: 14748221440 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.104626E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.483 | TFLOPs: 30.98 | +7: iteration 28140/ 173500 | consumed samples: 7203840 | consumed tokens: 14753464320 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.102365E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.376 | TFLOPs: 30.87 | +7: iteration 28150/ 173500 | consumed samples: 7206400 | consumed tokens: 14758707200 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.104352E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.681 | TFLOPs: 31.20 | +7: iteration 28160/ 173500 | consumed samples: 7208960 | consumed tokens: 14763950080 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.102989E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.992 | TFLOPs: 31.06 | +7: iteration 28170/ 173500 | consumed samples: 7211520 | consumed tokens: 14769192960 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.124479E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.558 | TFLOPs: 30.99 | +7: iteration 28180/ 173500 | consumed samples: 7214080 | consumed tokens: 14774435840 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.107750E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.628 | TFLOPs: 30.36 | +7: iteration 28190/ 173500 | consumed samples: 7216640 | consumed tokens: 14779678720 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.109406E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.654 | TFLOPs: 31.52 | +7: iteration 28200/ 173500 | consumed samples: 7219200 | consumed tokens: 14784921600 | elapsed time per iteration (s): 0.45 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.107862E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.853 | TFLOPs: 29.90 | +7: iteration 28210/ 173500 | consumed samples: 7221760 | consumed tokens: 14790164480 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.093115E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.426 | TFLOPs: 31.45 | +7: 
iteration 28220/ 173500 | consumed samples: 7224320 | consumed tokens: 14795407360 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.098763E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.855 | TFLOPs: 31.11 | +7: iteration 28230/ 173500 | consumed samples: 7226880 | consumed tokens: 14800650240 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.087860E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.231 | TFLOPs: 31.39 | +7: iteration 28240/ 173500 | consumed samples: 7229440 | consumed tokens: 14805893120 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.106660E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.715 | TFLOPs: 31.57 | +7: iteration 28250/ 173500 | consumed samples: 7232000 | consumed tokens: 14811136000 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.092868E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.620 | TFLOPs: 31.57 | +7: iteration 28260/ 173500 | consumed samples: 7234560 | consumed tokens: 14816378880 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.104470E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.172 | TFLOPs: 31.44 | +7: iteration 28270/ 173500 | consumed samples: 7237120 | consumed tokens: 14821621760 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.095763E+00 | grad norm: 0.288 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.049 | TFLOPs: 31.22 | +7: iteration 28280/ 173500 | consumed samples: 7239680 | consumed tokens: 14826864640 | elapsed time per iteration (s): 0.42 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.116288E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.623 | TFLOPs: 31.62 | +7: iteration 28290/ 173500 | consumed samples: 7242240 | consumed tokens: 14832107520 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.117962E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.927 | TFLOPs: 30.90 | +7: iteration 28300/ 173500 | consumed samples: 7244800 | consumed tokens: 14837350400 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.102661E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.689 | TFLOPs: 31.20 | +7: iteration 28310/ 173500 | consumed samples: 7247360 | consumed tokens: 14842593280 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.105572E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.155 | TFLOPs: 31.23 | +7: iteration 28320/ 173500 | consumed samples: 7249920 | consumed tokens: 14847836160 | elapsed time per iteration (s): 0.44 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.094288E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.521 | TFLOPs: 30.35 | +7: iteration 28330/ 173500 | consumed samples: 7252480 | consumed tokens: 14853079040 | elapsed time per iteration (s): 0.43 | learning rate: 
1.896E-04 | global batch size: 256 | lm loss: 3.104207E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.304 | TFLOPs: 31.29 | +7: iteration 28340/ 173500 | consumed samples: 7255040 | consumed tokens: 14858321920 | elapsed time per iteration (s): 0.44 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.110551E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.022 | TFLOPs: 30.85 | +7: iteration 28350/ 173500 | consumed samples: 7257600 | consumed tokens: 14863564800 | elapsed time per iteration (s): 0.42 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.108887E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.397 | TFLOPs: 31.66 | +7: iteration 28360/ 173500 | consumed samples: 7260160 | consumed tokens: 14868807680 | elapsed time per iteration (s): 0.44 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.100471E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.091 | TFLOPs: 30.86 | +7: iteration 28370/ 173500 | consumed samples: 7262720 | consumed tokens: 14874050560 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.103294E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.486 | TFLOPs: 31.40 | +7: iteration 28380/ 173500 | consumed samples: 7265280 | consumed tokens: 14879293440 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.112838E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.053 | TFLOPs: 31.22 | +7: iteration 28390/ 173500 | consumed 
samples: 7267840 | consumed tokens: 14884536320 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.117558E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.185 | TFLOPs: 31.39 | +7: iteration 28400/ 173500 | consumed samples: 7270400 | consumed tokens: 14889779200 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.113081E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.788 | TFLOPs: 31.42 | +7: iteration 28410/ 173500 | consumed samples: 7272960 | consumed tokens: 14895022080 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.091263E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.921 | TFLOPs: 31.48 | +7: iteration 28420/ 173500 | consumed samples: 7275520 | consumed tokens: 14900264960 | elapsed time per iteration (s): 0.42 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.110865E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.034 | TFLOPs: 31.69 | +7: iteration 28430/ 173500 | consumed samples: 7278080 | consumed tokens: 14905507840 | elapsed time per iteration (s): 0.42 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.099881E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.343 | TFLOPs: 31.92 | +7: iteration 28440/ 173500 | consumed samples: 7280640 | consumed tokens: 14910750720 | elapsed time per iteration (s): 0.42 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.118947E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.249 | TFLOPs: 31.70 | +7: iteration 28450/ 173500 | consumed samples: 7283200 | consumed tokens: 14915993600 | elapsed time per iteration (s): 0.42 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.100473E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.122 | TFLOPs: 31.70 | +7: iteration 28460/ 173500 | consumed samples: 7285760 | consumed tokens: 14921236480 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.090362E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.120 | TFLOPs: 31.59 | +7: iteration 28470/ 173500 | consumed samples: 7288320 | consumed tokens: 14926479360 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.100182E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.809 | TFLOPs: 31.00 | +7: iteration 28480/ 173500 | consumed samples: 7290880 | consumed tokens: 14931722240 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.108792E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.247 | TFLOPs: 31.49 | +7: iteration 28490/ 173500 | consumed samples: 7293440 | consumed tokens: 14936965120 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.100560E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.105 | TFLOPs: 31.49 | +7: iteration 28500/ 173500 | consumed samples: 7296000 | consumed tokens: 14942208000 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm 
loss: 3.108122E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.180 | TFLOPs: 31.02 | +7: iteration 28510/ 173500 | consumed samples: 7298560 | consumed tokens: 14947450880 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.091021E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.981 | TFLOPs: 31.38 | +7: iteration 28520/ 173500 | consumed samples: 7301120 | consumed tokens: 14952693760 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.102297E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.525 | TFLOPs: 31.14 | +7: iteration 28530/ 173500 | consumed samples: 7303680 | consumed tokens: 14957936640 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.107765E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.485 | TFLOPs: 31.51 | +7: iteration 28540/ 173500 | consumed samples: 7306240 | consumed tokens: 14963179520 | elapsed time per iteration (s): 0.42 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.103743E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.805 | TFLOPs: 31.68 | +7: iteration 28550/ 173500 | consumed samples: 7308800 | consumed tokens: 14968422400 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.114558E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.513 | TFLOPs: 31.35 | +7: iteration 28560/ 173500 | consumed samples: 7311360 | consumed tokens: 
14973665280 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.099579E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.390 | TFLOPs: 31.50 | +7: iteration 28570/ 173500 | consumed samples: 7313920 | consumed tokens: 14978908160 | elapsed time per iteration (s): 0.42 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.106068E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.072 | TFLOPs: 31.69 | +7: iteration 28580/ 173500 | consumed samples: 7316480 | consumed tokens: 14984151040 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.089623E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.410 | TFLOPs: 31.45 | +7: iteration 28590/ 173500 | consumed samples: 7319040 | consumed tokens: 14989393920 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.104281E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.100 | TFLOPs: 31.22 | +7: iteration 28600/ 173500 | consumed samples: 7321600 | consumed tokens: 14994636800 | elapsed time per iteration (s): 0.42 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.111063E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.123 | TFLOPs: 31.85 | +7: iteration 28610/ 173500 | consumed samples: 7324160 | consumed tokens: 14999879680 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.101346E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
594.265 | TFLOPs: 31.18 | +7: iteration 28620/ 173500 | consumed samples: 7326720 | consumed tokens: 15005122560 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.115474E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.812 | TFLOPs: 31.26 | +7: iteration 28630/ 173500 | consumed samples: 7329280 | consumed tokens: 15010365440 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.106505E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.972 | TFLOPs: 31.48 | +7: iteration 28640/ 173500 | consumed samples: 7331840 | consumed tokens: 15015608320 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.092089E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.331 | TFLOPs: 31.50 | +7: iteration 28650/ 173500 | consumed samples: 7334400 | consumed tokens: 15020851200 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.097552E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.596 | TFLOPs: 31.46 | +7: iteration 28660/ 173500 | consumed samples: 7336960 | consumed tokens: 15026094080 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.094140E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.542 | TFLOPs: 31.46 | +7: iteration 28670/ 173500 | consumed samples: 7339520 | consumed tokens: 15031336960 | elapsed time per iteration (s): 0.42 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.103059E+00 | grad norm: 0.305 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.362 | TFLOPs: 31.81 | +7: iteration 28680/ 173500 | consumed samples: 7342080 | consumed tokens: 15036579840 | elapsed time per iteration (s): 0.42 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.109698E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.004 | TFLOPs: 31.64 | +7: iteration 28690/ 173500 | consumed samples: 7344640 | consumed tokens: 15041822720 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.102332E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.742 | TFLOPs: 31.52 | +7: iteration 28700/ 173500 | consumed samples: 7347200 | consumed tokens: 15047065600 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.087925E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.827 | TFLOPs: 31.52 | +7: iteration 28710/ 173500 | consumed samples: 7349760 | consumed tokens: 15052308480 | elapsed time per iteration (s): 0.42 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.104789E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.534 | TFLOPs: 31.61 | +7: iteration 28720/ 173500 | consumed samples: 7352320 | consumed tokens: 15057551360 | elapsed time per iteration (s): 0.44 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.102217E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.022 | TFLOPs: 30.80 | +7: iteration 28730/ 173500 | consumed samples: 7354880 | consumed tokens: 15062794240 | elapsed time per iteration (s): 
0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.104041E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.199 | TFLOPs: 31.49 | +7: iteration 28740/ 173500 | consumed samples: 7357440 | consumed tokens: 15068037120 | elapsed time per iteration (s): 0.42 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.109740E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.148 | TFLOPs: 31.70 | +7: iteration 28750/ 173500 | consumed samples: 7360000 | consumed tokens: 15073280000 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.106096E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.659 | TFLOPs: 31.36 | +7: iteration 28760/ 173500 | consumed samples: 7362560 | consumed tokens: 15078522880 | elapsed time per iteration (s): 0.42 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.118191E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.363 | TFLOPs: 31.71 | +7: iteration 28770/ 173500 | consumed samples: 7365120 | consumed tokens: 15083765760 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.108689E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.368 | TFLOPs: 31.55 | +7: iteration 28780/ 173500 | consumed samples: 7367680 | consumed tokens: 15089008640 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.113102E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.874 | TFLOPs: 31.58 | +7: iteration 28790/ 
173500 | consumed samples: 7370240 | consumed tokens: 15094251520 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.107895E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.940 | TFLOPs: 31.37 | +7: iteration 28800/ 173500 | consumed samples: 7372800 | consumed tokens: 15099494400 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.099953E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.497 | TFLOPs: 31.03 | +7: iteration 28810/ 173500 | consumed samples: 7375360 | consumed tokens: 15104737280 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.101858E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.499 | TFLOPs: 31.35 | +7: iteration 28820/ 173500 | consumed samples: 7377920 | consumed tokens: 15109980160 | elapsed time per iteration (s): 0.42 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.103689E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.397 | TFLOPs: 31.61 | +7: iteration 28830/ 173500 | consumed samples: 7380480 | consumed tokens: 15115223040 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.098834E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.561 | TFLOPs: 31.56 | +7: iteration 28840/ 173500 | consumed samples: 7383040 | consumed tokens: 15120465920 | elapsed time per iteration (s): 0.42 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.099900E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 607.860 | TFLOPs: 31.89 | +7: iteration 28850/ 173500 | consumed samples: 7385600 | consumed tokens: 15125708800 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.105163E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.895 | TFLOPs: 31.21 | +7: iteration 28860/ 173500 | consumed samples: 7388160 | consumed tokens: 15130951680 | elapsed time per iteration (s): 0.42 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.108236E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.325 | TFLOPs: 31.66 | +7: iteration 28870/ 173500 | consumed samples: 7390720 | consumed tokens: 15136194560 | elapsed time per iteration (s): 0.42 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.094494E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.548 | TFLOPs: 31.67 | +7: iteration 28880/ 173500 | consumed samples: 7393280 | consumed tokens: 15141437440 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.093381E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.105 | TFLOPs: 31.54 | +7: iteration 28890/ 173500 | consumed samples: 7395840 | consumed tokens: 15146680320 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.110844E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.983 | TFLOPs: 31.43 | +7: iteration 28900/ 173500 | consumed samples: 7398400 | consumed tokens: 15151923200 | elapsed time per iteration (s): 0.42 | learning rate: 1.891E-04 | global batch 
size: 256 | lm loss: 3.100367E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.376 | TFLOPs: 31.76 | +7: iteration 28910/ 173500 | consumed samples: 7400960 | consumed tokens: 15157166080 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.104606E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.258 | TFLOPs: 31.49 | +7: iteration 28920/ 173500 | consumed samples: 7403520 | consumed tokens: 15162408960 | elapsed time per iteration (s): 0.42 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.090261E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.264 | TFLOPs: 31.65 | +7: iteration 28930/ 173500 | consumed samples: 7406080 | consumed tokens: 15167651840 | elapsed time per iteration (s): 0.44 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.086815E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.785 | TFLOPs: 30.74 | +7: iteration 28940/ 173500 | consumed samples: 7408640 | consumed tokens: 15172894720 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.100572E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.549 | TFLOPs: 31.30 | +7: iteration 28950/ 173500 | consumed samples: 7411200 | consumed tokens: 15178137600 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.105526E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.409 | TFLOPs: 31.03 | +7: iteration 28960/ 173500 | consumed samples: 7413760 | consumed 
tokens: 15183380480 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.102239E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.769 | TFLOPs: 31.31 | +7: iteration 28970/ 173500 | consumed samples: 7416320 | consumed tokens: 15188623360 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.100788E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.497 | TFLOPs: 31.35 | +7: iteration 28980/ 173500 | consumed samples: 7418880 | consumed tokens: 15193866240 | elapsed time per iteration (s): 0.42 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.108104E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.187 | TFLOPs: 31.91 | +7: iteration 28990/ 173500 | consumed samples: 7421440 | consumed tokens: 15199109120 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.105117E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.472 | TFLOPs: 31.45 | +7: iteration 29000/ 173500 | consumed samples: 7424000 | consumed tokens: 15204352000 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.108254E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.023 | TFLOPs: 31.32 | +7: iteration 29010/ 173500 | consumed samples: 7426560 | consumed tokens: 15209594880 | elapsed time per iteration (s): 0.42 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.096150E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.604 | TFLOPs: 31.62 | +7: iteration 29020/ 173500 | consumed samples: 7429120 | consumed tokens: 15214837760 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.100856E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.264 | TFLOPs: 31.49 | +7: iteration 29030/ 173500 | consumed samples: 7431680 | consumed tokens: 15220080640 | elapsed time per iteration (s): 0.42 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.105948E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.078 | TFLOPs: 31.75 | +7: iteration 29040/ 173500 | consumed samples: 7434240 | consumed tokens: 15225323520 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.089635E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.995 | TFLOPs: 31.22 | +7: iteration 29050/ 173500 | consumed samples: 7436800 | consumed tokens: 15230566400 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.426192E+00 | grad norm: 6.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.044 | TFLOPs: 31.06 | +7: iteration 29060/ 173500 | consumed samples: 7439360 | consumed tokens: 15235809280 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.186609E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.219 | TFLOPs: 31.23 | +7: iteration 29070/ 173500 | consumed samples: 7441920 | consumed tokens: 15241052160 | elapsed time per iteration (s): 0.42 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.121786E+00 | grad norm: 
0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.856 | TFLOPs: 31.74 | +7: iteration 29080/ 173500 | consumed samples: 7444480 | consumed tokens: 15246295040 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.124547E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.859 | TFLOPs: 31.47 | +7: iteration 29090/ 173500 | consumed samples: 7447040 | consumed tokens: 15251537920 | elapsed time per iteration (s): 0.42 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.132480E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.555 | TFLOPs: 31.67 | +7: iteration 29100/ 173500 | consumed samples: 7449600 | consumed tokens: 15256780800 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.118257E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.913 | TFLOPs: 31.58 | +7: iteration 29110/ 173500 | consumed samples: 7452160 | consumed tokens: 15262023680 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.113541E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.976 | TFLOPs: 31.43 | +7: iteration 29120/ 173500 | consumed samples: 7454720 | consumed tokens: 15267266560 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.110229E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.899 | TFLOPs: 31.06 | +7: iteration 29130/ 173500 | consumed samples: 7457280 | consumed tokens: 15272509440 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.101330E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.910 | TFLOPs: 29.90 | +7: iteration 29140/ 173500 | consumed samples: 7459840 | consumed tokens: 15277752320 | elapsed time per iteration (s): 0.44 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.107007E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.324 | TFLOPs: 30.71 | +7: iteration 29150/ 173500 | consumed samples: 7462400 | consumed tokens: 15282995200 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.107847E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.143 | TFLOPs: 31.33 | +7: iteration 29160/ 173500 | consumed samples: 7464960 | consumed tokens: 15288238080 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.103169E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.167 | TFLOPs: 31.44 | +7: iteration 29170/ 173500 | consumed samples: 7467520 | consumed tokens: 15293480960 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.104544E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.896 | TFLOPs: 31.00 | +7: iteration 29180/ 173500 | consumed samples: 7470080 | consumed tokens: 15298723840 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.099089E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.929 | TFLOPs: 31.53 | +7: 
iteration 29190/ 173500 | consumed samples: 7472640 | consumed tokens: 15303966720 | elapsed time per iteration (s): 0.42 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.084719E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.179 | TFLOPs: 31.75 | +7: iteration 29200/ 173500 | consumed samples: 7475200 | consumed tokens: 15309209600 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.106272E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.707 | TFLOPs: 31.10 | +7: iteration 29210/ 173500 | consumed samples: 7477760 | consumed tokens: 15314452480 | elapsed time per iteration (s): 0.45 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.098693E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.308 | TFLOPs: 29.61 | +7: iteration 29220/ 173500 | consumed samples: 7480320 | consumed tokens: 15319695360 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.095282E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.260 | TFLOPs: 31.55 | +7: iteration 29230/ 173500 | consumed samples: 7482880 | consumed tokens: 15324938240 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.109414E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.166 | TFLOPs: 31.02 | +7: iteration 29240/ 173500 | consumed samples: 7485440 | consumed tokens: 15330181120 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.095497E+00 | grad norm: 0.278 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.921 | TFLOPs: 31.42 | +7: iteration 29250/ 173500 | consumed samples: 7488000 | consumed tokens: 15335424000 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.094565E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.967 | TFLOPs: 31.01 | +7: iteration 29260/ 173500 | consumed samples: 7490560 | consumed tokens: 15340666880 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.085073E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.845 | TFLOPs: 31.37 | +7: iteration 29270/ 173500 | consumed samples: 7493120 | consumed tokens: 15345909760 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.108541E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.615 | TFLOPs: 31.46 | +7: iteration 29280/ 173500 | consumed samples: 7495680 | consumed tokens: 15351152640 | elapsed time per iteration (s): 0.42 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.078314E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.949 | TFLOPs: 31.74 | +7: iteration 29290/ 173500 | consumed samples: 7498240 | consumed tokens: 15356395520 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.084197E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.124 | TFLOPs: 31.38 | +7: iteration 29300/ 173500 | consumed samples: 7500800 | consumed tokens: 15361638400 | elapsed time per iteration (s): 0.43 | learning rate: 
1.888E-04 | global batch size: 256 | lm loss: 3.107504E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.328 | TFLOPs: 31.29 | +7: iteration 29310/ 173500 | consumed samples: 7503360 | consumed tokens: 15366881280 | elapsed time per iteration (s): 0.42 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.105016E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.699 | TFLOPs: 31.68 | +7: iteration 29320/ 173500 | consumed samples: 7505920 | consumed tokens: 15372124160 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.099094E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.791 | TFLOPs: 31.16 | +7: iteration 29330/ 173500 | consumed samples: 7508480 | consumed tokens: 15377367040 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.093945E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.008 | TFLOPs: 31.53 | +7: iteration 29340/ 173500 | consumed samples: 7511040 | consumed tokens: 15382609920 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.105879E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.097 | TFLOPs: 31.01 | +7: iteration 29350/ 173500 | consumed samples: 7513600 | consumed tokens: 15387852800 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.105292E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.282 | TFLOPs: 31.29 | +7: iteration 29360/ 173500 | consumed 
samples: 7516160 | consumed tokens: 15393095680 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.092759E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.033 | TFLOPs: 31.59 | +7: iteration 29370/ 173500 | consumed samples: 7518720 | consumed tokens: 15398338560 | elapsed time per iteration (s): 0.42 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.095178E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.877 | TFLOPs: 31.68 | +7: iteration 29380/ 173500 | consumed samples: 7521280 | consumed tokens: 15403581440 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.101914E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.829 | TFLOPs: 31.26 | +7: iteration 29390/ 173500 | consumed samples: 7523840 | consumed tokens: 15408824320 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.097824E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.870 | TFLOPs: 31.47 | +7: iteration 29400/ 173500 | consumed samples: 7526400 | consumed tokens: 15414067200 | elapsed time per iteration (s): 0.42 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.108678E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.917 | TFLOPs: 31.69 | +7: iteration 29410/ 173500 | consumed samples: 7528960 | consumed tokens: 15419310080 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.085105E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 591.829 | TFLOPs: 31.05 | +7: iteration 29420/ 173500 | consumed samples: 7531520 | consumed tokens: 15424552960 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.096597E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.725 | TFLOPs: 31.47 | +7: iteration 29430/ 173500 | consumed samples: 7534080 | consumed tokens: 15429795840 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.098650E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.765 | TFLOPs: 31.05 | +7: iteration 29440/ 173500 | consumed samples: 7536640 | consumed tokens: 15435038720 | elapsed time per iteration (s): 0.42 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.094807E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.990 | TFLOPs: 31.90 | +7: iteration 29450/ 173500 | consumed samples: 7539200 | consumed tokens: 15440281600 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.106907E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.635 | TFLOPs: 31.51 | +7: iteration 29460/ 173500 | consumed samples: 7541760 | consumed tokens: 15445524480 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.090165E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.501 | TFLOPs: 31.35 | +7: iteration 29470/ 173500 | consumed samples: 7544320 | consumed tokens: 15450767360 | elapsed time per iteration (s): 0.42 | learning rate: 1.887E-04 | global batch size: 256 | lm 
loss: 3.088663E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.033 | TFLOPs: 31.85 | +7: iteration 29480/ 173500 | consumed samples: 7546880 | consumed tokens: 15456010240 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.100414E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.200 | TFLOPs: 31.18 | +7: iteration 29490/ 173500 | consumed samples: 7549440 | consumed tokens: 15461253120 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.094862E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.051 | TFLOPs: 31.43 | +7: iteration 29500/ 173500 | consumed samples: 7552000 | consumed tokens: 15466496000 | elapsed time per iteration (s): 0.42 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.090057E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.779 | TFLOPs: 31.68 | +7: iteration 29510/ 173500 | consumed samples: 7554560 | consumed tokens: 15471738880 | elapsed time per iteration (s): 0.42 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.088147E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.688 | TFLOPs: 31.83 | +7: iteration 29520/ 173500 | consumed samples: 7557120 | consumed tokens: 15476981760 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.112967E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.260 | TFLOPs: 31.49 | +7: iteration 29530/ 173500 | consumed samples: 7559680 | consumed tokens: 
15482224640 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.091899E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.453 | TFLOPs: 31.35 | +7: iteration 29540/ 173500 | consumed samples: 7562240 | consumed tokens: 15487467520 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.099458E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.773 | TFLOPs: 31.05 | +7: iteration 29550/ 173500 | consumed samples: 7564800 | consumed tokens: 15492710400 | elapsed time per iteration (s): 0.42 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.113795E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.343 | TFLOPs: 31.87 | +7: iteration 29560/ 173500 | consumed samples: 7567360 | consumed tokens: 15497953280 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.085460E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.221 | TFLOPs: 31.13 | +7: iteration 29570/ 173500 | consumed samples: 7569920 | consumed tokens: 15503196160 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.102925E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.492 | TFLOPs: 31.03 | +7: iteration 29580/ 173500 | consumed samples: 7572480 | consumed tokens: 15508439040 | elapsed time per iteration (s): 0.42 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.103052E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
604.787 | TFLOPs: 31.73 | +7: iteration 29590/ 173500 | consumed samples: 7575040 | consumed tokens: 15513681920 | elapsed time per iteration (s): 0.44 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.098678E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.203 | TFLOPs: 30.55 | +7: iteration 29600/ 173500 | consumed samples: 7577600 | consumed tokens: 15518924800 | elapsed time per iteration (s): 0.42 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.101972E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.575 | TFLOPs: 31.67 | +7: iteration 29610/ 173500 | consumed samples: 7580160 | consumed tokens: 15524167680 | elapsed time per iteration (s): 0.44 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.105125E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.758 | TFLOPs: 30.84 | +7: iteration 29620/ 173500 | consumed samples: 7582720 | consumed tokens: 15529410560 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.091289E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.999 | TFLOPs: 31.11 | +7: iteration 29630/ 173500 | consumed samples: 7585280 | consumed tokens: 15534653440 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.083150E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.207 | TFLOPs: 31.33 | +7: iteration 29640/ 173500 | consumed samples: 7587840 | consumed tokens: 15539896320 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.102953E+00 | grad norm: 0.274 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.266 | TFLOPs: 31.44 | +7: iteration 29650/ 173500 | consumed samples: 7590400 | consumed tokens: 15545139200 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.093526E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.720 | TFLOPs: 31.47 | +7: iteration 29660/ 173500 | consumed samples: 7592960 | consumed tokens: 15550382080 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.083361E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.092 | TFLOPs: 31.54 | +7: iteration 29670/ 173500 | consumed samples: 7595520 | consumed tokens: 15555624960 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.090911E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.331 | TFLOPs: 31.29 | +7: iteration 29680/ 173500 | consumed samples: 7598080 | consumed tokens: 15560867840 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.096049E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.873 | TFLOPs: 31.37 | +7: iteration 29690/ 173500 | consumed samples: 7600640 | consumed tokens: 15566110720 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.089088E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.427 | TFLOPs: 31.56 | +7: iteration 29700/ 173500 | consumed samples: 7603200 | consumed tokens: 15571353600 | elapsed time per iteration (s): 
0.42 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.090365E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.628 | TFLOPs: 31.67 | +7: iteration 29710/ 173500 | consumed samples: 7605760 | consumed tokens: 15576596480 | elapsed time per iteration (s): 0.42 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.097031E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.577 | TFLOPs: 31.67 | +7: iteration 29720/ 173500 | consumed samples: 7608320 | consumed tokens: 15581839360 | elapsed time per iteration (s): 0.42 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.095193E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.647 | TFLOPs: 31.67 | +7: iteration 29730/ 173500 | consumed samples: 7610880 | consumed tokens: 15587082240 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.105026E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.897 | TFLOPs: 31.21 | +7: iteration 29740/ 173500 | consumed samples: 7613440 | consumed tokens: 15592325120 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.104106E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.424 | TFLOPs: 31.56 | +7: iteration 29750/ 173500 | consumed samples: 7616000 | consumed tokens: 15597568000 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.097706E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.355 | TFLOPs: 31.50 | +7: iteration 29760/ 
173500 | consumed samples: 7618560 | consumed tokens: 15602810880 | elapsed time per iteration (s): 0.42 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.102657E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.480 | TFLOPs: 31.72 | +7: iteration 29770/ 173500 | consumed samples: 7621120 | consumed tokens: 15608053760 | elapsed time per iteration (s): 0.42 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.087424E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.035 | TFLOPs: 31.75 | +7: iteration 29780/ 173500 | consumed samples: 7623680 | consumed tokens: 15613296640 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.102314E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.261 | TFLOPs: 31.49 | +7: iteration 29790/ 173500 | consumed samples: 7626240 | consumed tokens: 15618539520 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.096795E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.334 | TFLOPs: 31.55 | +7: iteration 29800/ 173500 | consumed samples: 7628800 | consumed tokens: 15623782400 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.090785E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.642 | TFLOPs: 31.30 | +7: iteration 29810/ 173500 | consumed samples: 7631360 | consumed tokens: 15629025280 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.091533E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 590.002 | TFLOPs: 30.96 | +7: iteration 29820/ 173500 | consumed samples: 7633920 | consumed tokens: 15634268160 | elapsed time per iteration (s): 0.42 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.089796E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.765 | TFLOPs: 31.63 | +7: iteration 29830/ 173500 | consumed samples: 7636480 | consumed tokens: 15639511040 | elapsed time per iteration (s): 0.44 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.091313E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.419 | TFLOPs: 30.82 | +7: iteration 29840/ 173500 | consumed samples: 7639040 | consumed tokens: 15644753920 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.083711E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.683 | TFLOPs: 31.04 | +7: iteration 29850/ 173500 | consumed samples: 7641600 | consumed tokens: 15649996800 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.109459E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.581 | TFLOPs: 31.14 | +7: iteration 29860/ 173500 | consumed samples: 7644160 | consumed tokens: 15655239680 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.090508E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.874 | TFLOPs: 31.32 | +7: iteration 29870/ 173500 | consumed samples: 7646720 | consumed tokens: 15660482560 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch 
size: 256 | lm loss: 3.098598E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.401 | TFLOPs: 30.72 | +7: iteration 29880/ 173500 | consumed samples: 7649280 | consumed tokens: 15665725440 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.090560E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.587 | TFLOPs: 31.25 | +7: iteration 29890/ 173500 | consumed samples: 7651840 | consumed tokens: 15670968320 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.091143E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.143 | TFLOPs: 31.17 | +7: iteration 29900/ 173500 | consumed samples: 7654400 | consumed tokens: 15676211200 | elapsed time per iteration (s): 0.45 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.070787E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.401 | TFLOPs: 30.03 | +7: iteration 29910/ 173500 | consumed samples: 7656960 | consumed tokens: 15681454080 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.097961E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.639 | TFLOPs: 30.83 | +7: iteration 29920/ 173500 | consumed samples: 7659520 | consumed tokens: 15686696960 | elapsed time per iteration (s): 0.46 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.081301E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.304 | TFLOPs: 29.50 | +7: iteration 29930/ 173500 | consumed samples: 7662080 | consumed 
tokens: 15691939840 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.097582E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.764 | TFLOPs: 30.47 | +7: iteration 29940/ 173500 | consumed samples: 7664640 | consumed tokens: 15697182720 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.113645E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.998 | TFLOPs: 30.22 | +7: iteration 29950/ 173500 | consumed samples: 7667200 | consumed tokens: 15702425600 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.088525E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.203 | TFLOPs: 30.70 | +7: iteration 29960/ 173500 | consumed samples: 7669760 | consumed tokens: 15707668480 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.087451E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.031 | TFLOPs: 30.85 | +7: iteration 29970/ 173500 | consumed samples: 7672320 | consumed tokens: 15712911360 | elapsed time per iteration (s): 0.45 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.085518E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.851 | TFLOPs: 30.11 | +7: iteration 29980/ 173500 | consumed samples: 7674880 | consumed tokens: 15718154240 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.092735E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 584.021 | TFLOPs: 30.64 | +7: iteration 29990/ 173500 | consumed samples: 7677440 | consumed tokens: 15723397120 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.099683E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.773 | TFLOPs: 31.21 | +0: [2023-03-17 02:45:23,665] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=0, lr=[0.00018823900512431258, 0.00018823900512431258, 0.00018823900512431258], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 30000/ 173500 | consumed samples: 7680000 | consumed tokens: 15728640000 | elapsed time per iteration (s): 0.42 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.112239E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.677 | TFLOPs: 31.78 | +0: steps: 30000 loss: 3.1033 iter time (s): 0.427 samples/sec: 599.087 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 30000 | lm loss value: 3.299119E+00 | lm loss PPL: 2.708875E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 30000 to checkpoints_221m91b400m +0: [2023-03-17 02:45:23,829] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is begin to save! +0: [2023-03-17 02:45:23,834] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_01-model_00-model_states.pt... +0: [2023-03-17 02:45:23,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_01-model_00-model_states.pt. 
+0: [2023-03-17 02:45:23,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_03-model_00-model_states.pt... +0: [2023-03-17 02:45:23,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_03-model_00-model_states.pt. +0: [2023-03-17 02:45:23,991] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_04-model_00-model_states.pt... +0: [2023-03-17 02:45:24,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_04-model_00-model_states.pt. +0: [2023-03-17 02:45:24,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_05-model_00-model_states.pt... +0: [2023-03-17 02:45:24,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_05-model_00-model_states.pt. +0: [2023-03-17 02:45:24,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_06-model_00-model_states.pt... +0: [2023-03-17 02:45:24,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_06-model_00-model_states.pt. +0: [2023-03-17 02:45:24,067] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_07-model_00-model_states.pt... +0: [2023-03-17 02:45:24,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_07-model_00-model_states.pt. +0: [2023-03-17 02:45:24,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_08-model_00-model_states.pt... +0: [2023-03-17 02:45:24,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_08-model_00-model_states.pt. 
+0: [2023-03-17 02:45:24,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_09-model_00-model_states.pt... +0: [2023-03-17 02:45:24,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_09-model_00-model_states.pt. +0: [2023-03-17 02:45:24,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_10-model_00-model_states.pt... +0: [2023-03-17 02:45:24,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_10-model_00-model_states.pt. +0: [2023-03-17 02:45:24,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_11-model_00-model_states.pt... +0: [2023-03-17 02:45:24,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_11-model_00-model_states.pt. +0: [2023-03-17 02:45:24,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_12-model_00-model_states.pt... +0: [2023-03-17 02:45:24,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_12-model_00-model_states.pt. +0: [2023-03-17 02:45:24,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_13-model_00-model_states.pt... +0: [2023-03-17 02:45:24,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_13-model_00-model_states.pt. +0: [2023-03-17 02:45:24,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_14-model_00-model_states.pt... +0: [2023-03-17 02:45:24,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_14-model_00-model_states.pt. 
+0: [2023-03-17 02:45:24,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_15-model_00-model_states.pt... +0: [2023-03-17 02:45:24,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_15-model_00-model_states.pt. +0: [2023-03-17 02:45:24,295] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_16-model_00-model_states.pt... +0: [2023-03-17 02:45:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_16-model_00-model_states.pt. +0: [2023-03-17 02:45:24,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_17-model_00-model_states.pt... +0: [2023-03-17 02:45:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_17-model_00-model_states.pt. +0: [2023-03-17 02:45:24,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_18-model_00-model_states.pt... +0: [2023-03-17 02:45:24,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_18-model_00-model_states.pt. +0: [2023-03-17 02:45:24,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_19-model_00-model_states.pt... +0: [2023-03-17 02:45:24,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_19-model_00-model_states.pt. +0: [2023-03-17 02:45:24,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_20-model_00-model_states.pt... +0: [2023-03-17 02:45:24,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_20-model_00-model_states.pt. 
+0: [2023-03-17 02:45:24,417] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/layer_22-model_00-model_states.pt... +0: [2023-03-17 02:45:24,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/layer_22-model_00-model_states.pt. +0: [2023-03-17 02:45:24,421] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step30000/mp_rank_00_model_states.pt +0: [2023-03-17 02:45:24,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/mp_rank_00_model_states.pt... +0: [2023-03-17 02:45:24,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/mp_rank_00_model_states.pt. +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +2: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +5: [2023-03-17 02:45:24,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +0: [2023-03-17 02:45:24,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 02:45:24,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 02:45:24,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
+0: [2023-03-17 02:45:24,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 02:45:24,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
+4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +4: [2023-03-17 02:45:24,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 02:45:24,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 02:45:24,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +0: [2023-03-17 02:45:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 02:45:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 02:45:24,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
+2: [2023-03-17 02:45:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 02:45:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +2: [2023-03-17 02:45:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 02:45:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 02:45:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +1: [2023-03-17 02:45:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 02:45:24,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 02:45:24,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 02:45:24,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 02:45:24,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 02:45:24,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! +3: [2023-03-17 02:45:24,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
+0: successfully saved checkpoint at iteration 30000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 769.49 +7: iteration 30010/ 173500 | consumed samples: 7682560 | consumed tokens: 15733882880 | elapsed time per iteration (s): 0.52 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.096023E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 495.911 | TFLOPs: 26.02 | +7: iteration 30020/ 173500 | consumed samples: 7685120 | consumed tokens: 15739125760 | elapsed time per iteration (s): 0.42 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.092266E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.493 | TFLOPs: 31.66 | +7: iteration 30030/ 173500 | consumed samples: 7687680 | consumed tokens: 15744368640 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.083974E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.394 | TFLOPs: 31.40 | +7: iteration 30040/ 173500 | consumed samples: 7690240 | consumed tokens: 15749611520 | elapsed time per iteration (s): 0.42 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.101919E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.309 | TFLOPs: 32.02 | +7: iteration 30050/ 173500 | consumed samples: 7692800 | consumed tokens: 15754854400 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.095735E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.358 | TFLOPs: 31.29 | +7: iteration 30060/ 173500 | consumed samples: 7695360 | consumed tokens: 15760097280 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.096088E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.921 | TFLOPs: 31.16 | +7: iteration 30070/ 173500 | consumed samples: 7697920 | consumed tokens: 15765340160 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.085823E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.564 | TFLOPs: 31.56 | +7: iteration 30080/ 173500 | consumed samples: 7700480 | consumed tokens: 15770583040 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.097724E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.889 | TFLOPs: 30.90 | +7: iteration 30090/ 173500 | consumed samples: 7703040 | consumed tokens: 15775825920 | elapsed time per iteration (s): 0.44 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.088592E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.174 | TFLOPs: 30.55 | +7: iteration 30100/ 173500 | consumed samples: 7705600 | consumed tokens: 15781068800 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.121279E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.675 | TFLOPs: 31.31 | +7: iteration 30110/ 173500 | consumed samples: 7708160 | consumed tokens: 15786311680 | elapsed time per iteration (s): 0.45 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.101120E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.803 | TFLOPs: 30.16 | +7: iteration 30120/ 173500 
| consumed samples: 7710720 | consumed tokens: 15791554560 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.098841E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.714 | TFLOPs: 31.05 | +7: iteration 30130/ 173500 | consumed samples: 7713280 | consumed tokens: 15796797440 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.096434E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.741 | TFLOPs: 31.21 | +7: iteration 30140/ 173500 | consumed samples: 7715840 | consumed tokens: 15802040320 | elapsed time per iteration (s): 0.44 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.106894E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.086 | TFLOPs: 30.80 | +7: iteration 30150/ 173500 | consumed samples: 7718400 | consumed tokens: 15807283200 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.104952E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.773 | TFLOPs: 30.89 | +7: iteration 30160/ 173500 | consumed samples: 7720960 | consumed tokens: 15812526080 | elapsed time per iteration (s): 0.44 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.089035E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.779 | TFLOPs: 30.73 | +7: iteration 30170/ 173500 | consumed samples: 7723520 | consumed tokens: 15817768960 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.092539E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 589.244 | TFLOPs: 30.92 | +7: iteration 30180/ 173500 | consumed samples: 7726080 | consumed tokens: 15823011840 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.082927E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.733 | TFLOPs: 31.41 | +7: iteration 30190/ 173500 | consumed samples: 7728640 | consumed tokens: 15828254720 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.085495E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.187 | TFLOPs: 31.28 | +7: iteration 30200/ 173500 | consumed samples: 7731200 | consumed tokens: 15833497600 | elapsed time per iteration (s): 0.45 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.099206E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.785 | TFLOPs: 30.16 | +7: iteration 30210/ 173500 | consumed samples: 7733760 | consumed tokens: 15838740480 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.085818E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.429 | TFLOPs: 31.14 | +7: iteration 30220/ 173500 | consumed samples: 7736320 | consumed tokens: 15843983360 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.079937E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.172 | TFLOPs: 31.07 | +7: iteration 30230/ 173500 | consumed samples: 7738880 | consumed tokens: 15849226240 | elapsed time per iteration (s): 0.44 | learning rate: 1.881E-04 | global batch size: 
256 | lm loss: 3.096264E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.130 | TFLOPs: 30.60 | +7: iteration 30240/ 173500 | consumed samples: 7741440 | consumed tokens: 15854469120 | elapsed time per iteration (s): 0.44 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.085781E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.465 | TFLOPs: 30.72 | +7: iteration 30250/ 173500 | consumed samples: 7744000 | consumed tokens: 15859712000 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.084859E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.212 | TFLOPs: 31.49 | +7: iteration 30260/ 173500 | consumed samples: 7746560 | consumed tokens: 15864954880 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.090292E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.775 | TFLOPs: 31.36 | +7: iteration 30270/ 173500 | consumed samples: 7749120 | consumed tokens: 15870197760 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.111069E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.464 | TFLOPs: 31.45 | +7: iteration 30280/ 173500 | consumed samples: 7751680 | consumed tokens: 15875440640 | elapsed time per iteration (s): 0.42 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.107089E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.149 | TFLOPs: 31.65 | +7: iteration 30290/ 173500 | consumed samples: 7754240 | consumed 
tokens: 15880683520 | elapsed time per iteration (s): 0.42 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.089248E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.280 | TFLOPs: 31.65 | +7: iteration 30300/ 173500 | consumed samples: 7756800 | consumed tokens: 15885926400 | elapsed time per iteration (s): 0.42 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.093091E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.124 | TFLOPs: 31.75 | +7: iteration 30310/ 173500 | consumed samples: 7759360 | consumed tokens: 15891169280 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.102438E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.627 | TFLOPs: 31.51 | +7: iteration 30320/ 173500 | consumed samples: 7761920 | consumed tokens: 15896412160 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.091263E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.549 | TFLOPs: 31.56 | +7: iteration 30330/ 173500 | consumed samples: 7764480 | consumed tokens: 15901655040 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.097550E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.177 | TFLOPs: 30.91 | +7: iteration 30340/ 173500 | consumed samples: 7767040 | consumed tokens: 15906897920 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.087012E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 592.627 | TFLOPs: 31.09 | +7: iteration 30350/ 173500 | consumed samples: 7769600 | consumed tokens: 15912140800 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.108799E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.994 | TFLOPs: 31.48 | +7: iteration 30360/ 173500 | consumed samples: 7772160 | consumed tokens: 15917383680 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.083237E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.554 | TFLOPs: 31.30 | +7: iteration 30370/ 173500 | consumed samples: 7774720 | consumed tokens: 15922626560 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.095499E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.602 | TFLOPs: 31.04 | +7: iteration 30380/ 173500 | consumed samples: 7777280 | consumed tokens: 15927869440 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.097524E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.409 | TFLOPs: 31.55 | +7: iteration 30390/ 173500 | consumed samples: 7779840 | consumed tokens: 15933112320 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.091758E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.593 | TFLOPs: 31.41 | +7: iteration 30400/ 173500 | consumed samples: 7782400 | consumed tokens: 15938355200 | elapsed time per iteration (s): 0.44 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.096787E+00 | grad norm: 
0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.134 | TFLOPs: 30.75 | +7: iteration 30410/ 173500 | consumed samples: 7784960 | consumed tokens: 15943598080 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.088346E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.663 | TFLOPs: 31.46 | +7: iteration 30420/ 173500 | consumed samples: 7787520 | consumed tokens: 15948840960 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.083807E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.324 | TFLOPs: 31.18 | +7: iteration 30430/ 173500 | consumed samples: 7790080 | consumed tokens: 15954083840 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.077158E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.877 | TFLOPs: 31.47 | +7: iteration 30440/ 173500 | consumed samples: 7792640 | consumed tokens: 15959326720 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.084960E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.674 | TFLOPs: 31.20 | +7: iteration 30450/ 173500 | consumed samples: 7795200 | consumed tokens: 15964569600 | elapsed time per iteration (s): 0.42 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.094374E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.858 | TFLOPs: 31.63 | +7: iteration 30460/ 173500 | consumed samples: 7797760 | consumed tokens: 15969812480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.094845E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.331 | TFLOPs: 31.39 | +7: iteration 30470/ 173500 | consumed samples: 7800320 | consumed tokens: 15975055360 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.089460E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.440 | TFLOPs: 31.50 | +7: iteration 30480/ 173500 | consumed samples: 7802880 | consumed tokens: 15980298240 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.080472E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.147 | TFLOPs: 31.02 | +7: iteration 30490/ 173500 | consumed samples: 7805440 | consumed tokens: 15985541120 | elapsed time per iteration (s): 0.44 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.087714E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.802 | TFLOPs: 30.53 | +7: iteration 30500/ 173500 | consumed samples: 7808000 | consumed tokens: 15990784000 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.092846E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.828 | TFLOPs: 31.52 | +7: iteration 30510/ 173500 | consumed samples: 7810560 | consumed tokens: 15996026880 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.097235E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.187 | TFLOPs: 31.39 | +7: 
iteration 30520/ 173500 | consumed samples: 7813120 | consumed tokens: 16001269760 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.084936E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.169 | TFLOPs: 31.49 | +7: iteration 30530/ 173500 | consumed samples: 7815680 | consumed tokens: 16006512640 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.105037E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.029 | TFLOPs: 31.59 | +7: iteration 30540/ 173500 | consumed samples: 7818240 | consumed tokens: 16011755520 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.105395E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.050 | TFLOPs: 31.22 | +7: iteration 30550/ 173500 | consumed samples: 7820800 | consumed tokens: 16016998400 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.079659E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.439 | TFLOPs: 31.40 | +7: iteration 30560/ 173500 | consumed samples: 7823360 | consumed tokens: 16022241280 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.095480E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.289 | TFLOPs: 31.55 | +7: iteration 30570/ 173500 | consumed samples: 7825920 | consumed tokens: 16027484160 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.091825E+00 | grad norm: 0.267 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.446 | TFLOPs: 31.56 | +7: iteration 30580/ 173500 | consumed samples: 7828480 | consumed tokens: 16032727040 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.090232E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.397 | TFLOPs: 31.29 | +7: iteration 30590/ 173500 | consumed samples: 7831040 | consumed tokens: 16037969920 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.095998E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.338 | TFLOPs: 31.29 | +7: iteration 30600/ 173500 | consumed samples: 7833600 | consumed tokens: 16043212800 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.082150E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.620 | TFLOPs: 31.41 | +7: iteration 30610/ 173500 | consumed samples: 7836160 | consumed tokens: 16048455680 | elapsed time per iteration (s): 0.44 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.092970E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.148 | TFLOPs: 30.86 | +7: iteration 30620/ 173500 | consumed samples: 7838720 | consumed tokens: 16053698560 | elapsed time per iteration (s): 0.42 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.106268E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.394 | TFLOPs: 31.61 | +7: iteration 30630/ 173500 | consumed samples: 7841280 | consumed tokens: 16058941440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.877E-04 | global batch size: 256 | lm loss: 3.083185E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.041 | TFLOPs: 31.54 | +7: iteration 30640/ 173500 | consumed samples: 7843840 | consumed tokens: 16064184320 | elapsed time per iteration (s): 0.44 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.082237E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.182 | TFLOPs: 30.86 | +7: iteration 30650/ 173500 | consumed samples: 7846400 | consumed tokens: 16069427200 | elapsed time per iteration (s): 0.42 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.100687E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.655 | TFLOPs: 31.83 | +7: iteration 30660/ 173500 | consumed samples: 7848960 | consumed tokens: 16074670080 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.100347E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.719 | TFLOPs: 31.41 | +7: iteration 30670/ 173500 | consumed samples: 7851520 | consumed tokens: 16079912960 | elapsed time per iteration (s): 0.44 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.093366E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.200 | TFLOPs: 30.55 | +7: iteration 30680/ 173500 | consumed samples: 7854080 | consumed tokens: 16085155840 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.101181E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.809 | TFLOPs: 31.47 | +7: iteration 30690/ 173500 | consumed 
samples: 7856640 | consumed tokens: 16090398720 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.090973E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.932 | TFLOPs: 31.48 | +7: iteration 30700/ 173500 | consumed samples: 7859200 | consumed tokens: 16095641600 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.084848E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.199 | TFLOPs: 31.44 | +7: iteration 30710/ 173500 | consumed samples: 7861760 | consumed tokens: 16100884480 | elapsed time per iteration (s): 0.42 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.091021E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.484 | TFLOPs: 31.66 | +7: iteration 30720/ 173500 | consumed samples: 7864320 | consumed tokens: 16106127360 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.097927E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.059 | TFLOPs: 31.54 | +7: iteration 30730/ 173500 | consumed samples: 7866880 | consumed tokens: 16111370240 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.079257E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.468 | TFLOPs: 31.45 | +7: iteration 30740/ 173500 | consumed samples: 7869440 | consumed tokens: 16116613120 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.083380E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 600.662 | TFLOPs: 31.52 | +7: iteration 30750/ 173500 | consumed samples: 7872000 | consumed tokens: 16121856000 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.080568E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.590 | TFLOPs: 31.51 | +7: iteration 30760/ 173500 | consumed samples: 7874560 | consumed tokens: 16127098880 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.099147E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.315 | TFLOPs: 31.45 | +7: iteration 30770/ 173500 | consumed samples: 7877120 | consumed tokens: 16132341760 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.096592E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.491 | TFLOPs: 31.51 | +7: iteration 30780/ 173500 | consumed samples: 7879680 | consumed tokens: 16137584640 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.079359E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.386 | TFLOPs: 31.34 | +7: iteration 30790/ 173500 | consumed samples: 7882240 | consumed tokens: 16142827520 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.088517E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.973 | TFLOPs: 31.58 | +7: iteration 30800/ 173500 | consumed samples: 7884800 | consumed tokens: 16148070400 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm 
loss: 3.066392E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.254 | TFLOPs: 31.60 | +7: iteration 30810/ 173500 | consumed samples: 7887360 | consumed tokens: 16153313280 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.073714E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.741 | TFLOPs: 31.36 | +7: iteration 30820/ 173500 | consumed samples: 7889920 | consumed tokens: 16158556160 | elapsed time per iteration (s): 0.42 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.096662E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.340 | TFLOPs: 31.76 | +7: iteration 30830/ 173500 | consumed samples: 7892480 | consumed tokens: 16163799040 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.092033E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.314 | TFLOPs: 31.50 | +7: iteration 30840/ 173500 | consumed samples: 7895040 | consumed tokens: 16169041920 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.095904E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.609 | TFLOPs: 31.62 | +7: iteration 30850/ 173500 | consumed samples: 7897600 | consumed tokens: 16174284800 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.090376E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.002 | TFLOPs: 31.90 | +7: iteration 30860/ 173500 | consumed samples: 7900160 | consumed tokens: 
16179527680 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.090642E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.130 | TFLOPs: 31.44 | +7: iteration 30870/ 173500 | consumed samples: 7902720 | consumed tokens: 16184770560 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.085816E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.140 | TFLOPs: 31.38 | +7: iteration 30880/ 173500 | consumed samples: 7905280 | consumed tokens: 16190013440 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.078559E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.684 | TFLOPs: 31.46 | +7: iteration 30890/ 173500 | consumed samples: 7907840 | consumed tokens: 16195256320 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.089939E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.684 | TFLOPs: 31.62 | +7: iteration 30900/ 173500 | consumed samples: 7910400 | consumed tokens: 16200499200 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.081671E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.481 | TFLOPs: 31.82 | +7: iteration 30910/ 173500 | consumed samples: 7912960 | consumed tokens: 16205742080 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.097757E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.579 | TFLOPs: 31.88 | +7: iteration 30920/ 173500 | consumed samples: 7915520 | consumed tokens: 16210984960 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.089986E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.229 | TFLOPs: 31.23 | +7: iteration 30930/ 173500 | consumed samples: 7918080 | consumed tokens: 16216227840 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.084961E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.823 | TFLOPs: 31.37 | +7: iteration 30940/ 173500 | consumed samples: 7920640 | consumed tokens: 16221470720 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.102633E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.461 | TFLOPs: 31.56 | +7: iteration 30950/ 173500 | consumed samples: 7923200 | consumed tokens: 16226713600 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.093424E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.693 | TFLOPs: 31.67 | +7: iteration 30960/ 173500 | consumed samples: 7925760 | consumed tokens: 16231956480 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.090988E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.743 | TFLOPs: 31.26 | +7: iteration 30970/ 173500 | consumed samples: 7928320 | consumed tokens: 16237199360 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.099067E+00 | grad norm: 0.295 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.994 | TFLOPs: 31.64 | +7: iteration 30980/ 173500 | consumed samples: 7930880 | consumed tokens: 16242442240 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.082458E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.479 | TFLOPs: 31.66 | +7: iteration 30990/ 173500 | consumed samples: 7933440 | consumed tokens: 16247685120 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.083597E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.846 | TFLOPs: 31.53 | +7: iteration 31000/ 173500 | consumed samples: 7936000 | consumed tokens: 16252928000 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.085975E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.214 | TFLOPs: 31.60 | +7: iteration 31010/ 173500 | consumed samples: 7938560 | consumed tokens: 16258170880 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.083876E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.924 | TFLOPs: 31.27 | +7: iteration 31020/ 173500 | consumed samples: 7941120 | consumed tokens: 16263413760 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.088266E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.325 | TFLOPs: 31.66 | +7: iteration 31030/ 173500 | consumed samples: 7943680 | consumed tokens: 16268656640 | elapsed time per iteration (s): 
0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.102473E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.773 | TFLOPs: 31.47 | +7: iteration 31040/ 173500 | consumed samples: 7946240 | consumed tokens: 16273899520 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.072911E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.595 | TFLOPs: 31.62 | +7: iteration 31050/ 173500 | consumed samples: 7948800 | consumed tokens: 16279142400 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.076937E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.886 | TFLOPs: 31.37 | +7: iteration 31060/ 173500 | consumed samples: 7951360 | consumed tokens: 16284385280 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.099846E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.142 | TFLOPs: 31.65 | +7: iteration 31070/ 173500 | consumed samples: 7953920 | consumed tokens: 16289628160 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.090171E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.577 | TFLOPs: 31.56 | +7: iteration 31080/ 173500 | consumed samples: 7956480 | consumed tokens: 16294871040 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.094593E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.870 | TFLOPs: 31.21 | +7: iteration 31090/ 
173500 | consumed samples: 7959040 | consumed tokens: 16300113920 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.089448E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.827 | TFLOPs: 31.63 | +7: iteration 31100/ 173500 | consumed samples: 7961600 | consumed tokens: 16305356800 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.078144E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.387 | TFLOPs: 31.61 | +7: iteration 31110/ 173500 | consumed samples: 7964160 | consumed tokens: 16310599680 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.087279E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.945 | TFLOPs: 31.64 | +7: iteration 31120/ 173500 | consumed samples: 7966720 | consumed tokens: 16315842560 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.083932E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.787 | TFLOPs: 31.63 | +7: iteration 31130/ 173500 | consumed samples: 7969280 | consumed tokens: 16321085440 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.083591E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.728 | TFLOPs: 31.41 | +7: iteration 31140/ 173500 | consumed samples: 7971840 | consumed tokens: 16326328320 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.083405E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 600.550 | TFLOPs: 31.51 | +7: iteration 31150/ 173500 | consumed samples: 7974400 | consumed tokens: 16331571200 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.082938E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.713 | TFLOPs: 31.52 | +7: iteration 31160/ 173500 | consumed samples: 7976960 | consumed tokens: 16336814080 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.076406E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.523 | TFLOPs: 31.46 | +7: iteration 31170/ 173500 | consumed samples: 7979520 | consumed tokens: 16342056960 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.081955E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.372 | TFLOPs: 31.55 | +7: iteration 31180/ 173500 | consumed samples: 7982080 | consumed tokens: 16347299840 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.078003E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.949 | TFLOPs: 31.85 | +7: iteration 31190/ 173500 | consumed samples: 7984640 | consumed tokens: 16352542720 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.094929E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.424 | TFLOPs: 30.98 | +7: iteration 31200/ 173500 | consumed samples: 7987200 | consumed tokens: 16357785600 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch 
size: 256 | lm loss: 3.092300E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.323 | TFLOPs: 31.45 | +7: iteration 31210/ 173500 | consumed samples: 7989760 | consumed tokens: 16363028480 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.071229E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.179 | TFLOPs: 31.86 | +7: iteration 31220/ 173500 | consumed samples: 7992320 | consumed tokens: 16368271360 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.098178E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.722 | TFLOPs: 31.31 | +7: iteration 31230/ 173500 | consumed samples: 7994880 | consumed tokens: 16373514240 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.074431E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.827 | TFLOPs: 30.95 | +7: iteration 31240/ 173500 | consumed samples: 7997440 | consumed tokens: 16378757120 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.078790E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.157 | TFLOPs: 31.12 | +7: iteration 31250/ 173500 | consumed samples: 8000000 | consumed tokens: 16384000000 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.077506E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.510 | TFLOPs: 31.25 | +7: iteration 31260/ 173500 | consumed samples: 8002560 | consumed 
tokens: 16389242880 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.090972E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.186 | TFLOPs: 31.86 | +7: iteration 31270/ 173500 | consumed samples: 8005120 | consumed tokens: 16394485760 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.083116E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | +7: iteration 31280/ 173500 | consumed samples: 8007680 | consumed tokens: 16399728640 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.093736E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.002 | TFLOPs: 31.38 | +7: iteration 31290/ 173500 | consumed samples: 8010240 | consumed tokens: 16404971520 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.090411E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.739 | TFLOPs: 31.68 | +7: iteration 31300/ 173500 | consumed samples: 8012800 | consumed tokens: 16410214400 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.094545E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.889 | TFLOPs: 31.63 | +7: iteration 31310/ 173500 | consumed samples: 8015360 | consumed tokens: 16415457280 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.075737E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 594.649 | TFLOPs: 31.20 | +7: iteration 31320/ 173500 | consumed samples: 8017920 | consumed tokens: 16420700160 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.084152E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.389 | TFLOPs: 31.29 | +7: iteration 31330/ 173500 | consumed samples: 8020480 | consumed tokens: 16425943040 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.086112E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.621 | TFLOPs: 31.46 | +7: iteration 31340/ 173500 | consumed samples: 8023040 | consumed tokens: 16431185920 | elapsed time per iteration (s): 0.42 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.085867E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.809 | TFLOPs: 31.84 | +7: iteration 31350/ 173500 | consumed samples: 8025600 | consumed tokens: 16436428800 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.086069E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.623 | TFLOPs: 31.41 | +7: iteration 31360/ 173500 | consumed samples: 8028160 | consumed tokens: 16441671680 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.085673E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.204 | TFLOPs: 31.33 | +7: iteration 31370/ 173500 | consumed samples: 8030720 | consumed tokens: 16446914560 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.099069E+00 | grad norm: 
0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.477 | TFLOPs: 31.56 | +7: iteration 31380/ 173500 | consumed samples: 8033280 | consumed tokens: 16452157440 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.085832E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.199 | TFLOPs: 31.60 | +7: iteration 31390/ 173500 | consumed samples: 8035840 | consumed tokens: 16457400320 | elapsed time per iteration (s): 0.42 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.096211E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.638 | TFLOPs: 31.67 | +7: iteration 31400/ 173500 | consumed samples: 8038400 | consumed tokens: 16462643200 | elapsed time per iteration (s): 0.42 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.083028E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.091 | TFLOPs: 31.70 | +7: iteration 31410/ 173500 | consumed samples: 8040960 | consumed tokens: 16467886080 | elapsed time per iteration (s): 0.45 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.085428E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.219 | TFLOPs: 29.81 | +7: iteration 31420/ 173500 | consumed samples: 8043520 | consumed tokens: 16473128960 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.090541E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.335 | TFLOPs: 31.45 | +7: iteration 31430/ 173500 | consumed samples: 8046080 | consumed tokens: 16478371840 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.081798E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.734 | TFLOPs: 30.94 | +7: iteration 31440/ 173500 | consumed samples: 8048640 | consumed tokens: 16483614720 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.080294E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.873 | TFLOPs: 31.26 | +7: iteration 31450/ 173500 | consumed samples: 8051200 | consumed tokens: 16488857600 | elapsed time per iteration (s): 0.42 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.078541E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.763 | TFLOPs: 31.68 | +7: iteration 31460/ 173500 | consumed samples: 8053760 | consumed tokens: 16494100480 | elapsed time per iteration (s): 0.42 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.092917E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.975 | TFLOPs: 31.85 | +7: iteration 31470/ 173500 | consumed samples: 8056320 | consumed tokens: 16499343360 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.085472E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.593 | TFLOPs: 31.35 | +7: iteration 31480/ 173500 | consumed samples: 8058880 | consumed tokens: 16504586240 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.079540E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.043 | TFLOPs: 31.17 | +7: 
iteration 31490/ 173500 | consumed samples: 8061440 | consumed tokens: 16509829120 | elapsed time per iteration (s): 0.42 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.086248E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.009 | TFLOPs: 31.80 | +7: iteration 31500/ 173500 | consumed samples: 8064000 | consumed tokens: 16515072000 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.089332E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.095 | TFLOPs: 31.59 | +7: iteration 31510/ 173500 | consumed samples: 8066560 | consumed tokens: 16520314880 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.094463E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.079 | TFLOPs: 31.49 | +7: iteration 31520/ 173500 | consumed samples: 8069120 | consumed tokens: 16525557760 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.100339E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.405 | TFLOPs: 31.45 | +7: iteration 31530/ 173500 | consumed samples: 8071680 | consumed tokens: 16530800640 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.103653E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.273 | TFLOPs: 31.23 | +7: iteration 31540/ 173500 | consumed samples: 8074240 | consumed tokens: 16536043520 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.086707E+00 | grad norm: 0.270 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.523 | TFLOPs: 31.46 | +7: iteration 31550/ 173500 | consumed samples: 8076800 | consumed tokens: 16541286400 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.092163E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.986 | TFLOPs: 31.64 | +7: iteration 31560/ 173500 | consumed samples: 8079360 | consumed tokens: 16546529280 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.089697E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.006 | TFLOPs: 31.43 | +7: iteration 31570/ 173500 | consumed samples: 8081920 | consumed tokens: 16551772160 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.091231E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.801 | TFLOPs: 31.63 | +7: iteration 31580/ 173500 | consumed samples: 8084480 | consumed tokens: 16557015040 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.088832E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.120 | TFLOPs: 31.38 | +7: iteration 31590/ 173500 | consumed samples: 8087040 | consumed tokens: 16562257920 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.082191E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.736 | TFLOPs: 31.57 | +7: iteration 31600/ 173500 | consumed samples: 8089600 | consumed tokens: 16567500800 | elapsed time per iteration (s): 0.42 | learning rate: 
1.869E-04 | global batch size: 256 | lm loss: 3.082267E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.948 | TFLOPs: 31.74 | +7: iteration 31610/ 173500 | consumed samples: 8092160 | consumed tokens: 16572743680 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.093140E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.442 | TFLOPs: 31.71 | +7: iteration 31620/ 173500 | consumed samples: 8094720 | consumed tokens: 16577986560 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.068220E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.281 | TFLOPs: 31.23 | +7: iteration 31630/ 173500 | consumed samples: 8097280 | consumed tokens: 16583229440 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.086582E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.051 | TFLOPs: 31.69 | +7: iteration 31640/ 173500 | consumed samples: 8099840 | consumed tokens: 16588472320 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.101789E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.811 | TFLOPs: 31.10 | +7: iteration 31650/ 173500 | consumed samples: 8102400 | consumed tokens: 16593715200 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.083004E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.845 | TFLOPs: 31.63 | +7: iteration 31660/ 173500 | consumed 
samples: 8104960 | consumed tokens: 16598958080 | elapsed time per iteration (s): 0.42 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.092451E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.589 | TFLOPs: 31.88 | +7: iteration 31670/ 173500 | consumed samples: 8107520 | consumed tokens: 16604200960 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.088006E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.587 | TFLOPs: 31.72 | +7: iteration 31680/ 173500 | consumed samples: 8110080 | consumed tokens: 16609443840 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.094553E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.392 | TFLOPs: 31.55 | +7: iteration 31690/ 173500 | consumed samples: 8112640 | consumed tokens: 16614686720 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.081577E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.675 | TFLOPs: 31.73 | +7: iteration 31700/ 173500 | consumed samples: 8115200 | consumed tokens: 16619929600 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.089732E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.535 | TFLOPs: 31.35 | +7: iteration 31710/ 173500 | consumed samples: 8117760 | consumed tokens: 16625172480 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.083953E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.070 | TFLOPs: 31.22 | +7: iteration 31720/ 173500 | consumed samples: 8120320 | consumed tokens: 16630415360 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.086838E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.388 | TFLOPs: 31.50 | +7: iteration 31730/ 173500 | consumed samples: 8122880 | consumed tokens: 16635658240 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.065198E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.243 | TFLOPs: 31.18 | +7: iteration 31740/ 173500 | consumed samples: 8125440 | consumed tokens: 16640901120 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.085915E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.666 | TFLOPs: 31.67 | +7: iteration 31750/ 173500 | consumed samples: 8128000 | consumed tokens: 16646144000 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.078583E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.381 | TFLOPs: 31.66 | +7: iteration 31760/ 173500 | consumed samples: 8130560 | consumed tokens: 16651386880 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.090250E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.090 | TFLOPs: 31.59 | +7: iteration 31770/ 173500 | consumed samples: 8133120 | consumed tokens: 16656629760 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm 
loss: 3.081990E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.554 | TFLOPs: 31.67 | +7: iteration 31780/ 173500 | consumed samples: 8135680 | consumed tokens: 16661872640 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.078644E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.447 | TFLOPs: 31.24 | +7: iteration 31790/ 173500 | consumed samples: 8138240 | consumed tokens: 16667115520 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.089405E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.363 | TFLOPs: 31.87 | +7: iteration 31800/ 173500 | consumed samples: 8140800 | consumed tokens: 16672358400 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.081821E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.907 | TFLOPs: 31.63 | +7: iteration 31810/ 173500 | consumed samples: 8143360 | consumed tokens: 16677601280 | elapsed time per iteration (s): 0.44 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.081262E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.717 | TFLOPs: 30.26 | +7: iteration 31820/ 173500 | consumed samples: 8145920 | consumed tokens: 16682844160 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.091801E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.305 | TFLOPs: 31.92 | +7: iteration 31830/ 173500 | consumed samples: 8148480 | consumed tokens: 
16688087040 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.092805E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.425 | TFLOPs: 31.29 | +7: iteration 31840/ 173500 | consumed samples: 8151040 | consumed tokens: 16693329920 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.083110E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.387 | TFLOPs: 31.87 | +7: iteration 31850/ 173500 | consumed samples: 8153600 | consumed tokens: 16698572800 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.095327E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.969 | TFLOPs: 31.85 | +7: iteration 31860/ 173500 | consumed samples: 8156160 | consumed tokens: 16703815680 | elapsed time per iteration (s): 0.44 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.076895E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.996 | TFLOPs: 30.59 | +7: iteration 31870/ 173500 | consumed samples: 8158720 | consumed tokens: 16709058560 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.072911E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.319 | TFLOPs: 31.45 | +7: iteration 31880/ 173500 | consumed samples: 8161280 | consumed tokens: 16714301440 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.074190E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
597.512 | TFLOPs: 31.35 | +7: iteration 31890/ 173500 | consumed samples: 8163840 | consumed tokens: 16719544320 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.079185E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.156 | TFLOPs: 31.54 | +7: iteration 31900/ 173500 | consumed samples: 8166400 | consumed tokens: 16724787200 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.079010E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.026 | TFLOPs: 31.85 | +7: iteration 31910/ 173500 | consumed samples: 8168960 | consumed tokens: 16730030080 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.083466E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.911 | TFLOPs: 31.58 | +7: iteration 31920/ 173500 | consumed samples: 8171520 | consumed tokens: 16735272960 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.081188E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.920 | TFLOPs: 31.84 | +7: iteration 31930/ 173500 | consumed samples: 8174080 | consumed tokens: 16740515840 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.086559E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.427 | TFLOPs: 31.56 | +7: iteration 31940/ 173500 | consumed samples: 8176640 | consumed tokens: 16745758720 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.072434E+00 | grad norm: 0.310 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.834 | TFLOPs: 31.37 | +7: iteration 31950/ 173500 | consumed samples: 8179200 | consumed tokens: 16751001600 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.088911E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.465 | TFLOPs: 31.66 | +7: iteration 31960/ 173500 | consumed samples: 8181760 | consumed tokens: 16756244480 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.077379E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.069 | TFLOPs: 31.01 | +7: iteration 31970/ 173500 | consumed samples: 8184320 | consumed tokens: 16761487360 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.081704E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.717 | TFLOPs: 31.89 | +7: iteration 31980/ 173500 | consumed samples: 8186880 | consumed tokens: 16766730240 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.083604E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.168 | TFLOPs: 31.54 | +7: iteration 31990/ 173500 | consumed samples: 8189440 | consumed tokens: 16771973120 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.091039E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.029 | TFLOPs: 31.53 | +0: [2023-03-17 02:59:39,873] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=0, lr=[0.00018655987222005428, 
0.00018655987222005428, 0.00018655987222005428], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 32000/ 173500 | consumed samples: 8192000 | consumed tokens: 16777216000 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.074583E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.731 | TFLOPs: 31.78 | +0: steps: 32000 loss: 3.0568 iter time (s): 0.425 samples/sec: 601.907 +7: iteration 32010/ 173500 | consumed samples: 8194560 | consumed tokens: 16782458880 | elapsed time per iteration (s): 0.44 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.080767E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.593 | TFLOPs: 30.78 | +7: iteration 32020/ 173500 | consumed samples: 8197120 | consumed tokens: 16787701760 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.083789E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.598 | TFLOPs: 31.88 | +7: iteration 32030/ 173500 | consumed samples: 8199680 | consumed tokens: 16792944640 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.093803E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.582 | TFLOPs: 31.83 | +7: iteration 32040/ 173500 | consumed samples: 8202240 | consumed tokens: 16798187520 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.089452E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.051 | TFLOPs: 31.59 | +7: iteration 32050/ 173500 | consumed samples: 8204800 | consumed tokens: 
16803430400 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.091041E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.490 | TFLOPs: 31.24 | +7: iteration 32060/ 173500 | consumed samples: 8207360 | consumed tokens: 16808673280 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.079495E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.976 | TFLOPs: 31.74 | +7: iteration 32070/ 173500 | consumed samples: 8209920 | consumed tokens: 16813916160 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.085872E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.420 | TFLOPs: 31.71 | +7: iteration 32080/ 173500 | consumed samples: 8212480 | consumed tokens: 16819159040 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.079935E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.727 | TFLOPs: 31.83 | +7: iteration 32090/ 173500 | consumed samples: 8215040 | consumed tokens: 16824401920 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.086350E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.608 | TFLOPs: 31.67 | +7: iteration 32100/ 173500 | consumed samples: 8217600 | consumed tokens: 16829644800 | elapsed time per iteration (s): 0.42 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.083572E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
602.523 | TFLOPs: 31.61 | +7: iteration 32110/ 173500 | consumed samples: 8220160 | consumed tokens: 16834887680 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.085205E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.875 | TFLOPs: 31.42 | +7: iteration 32120/ 173500 | consumed samples: 8222720 | consumed tokens: 16840130560 | elapsed time per iteration (s): 0.44 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.081736E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.430 | TFLOPs: 30.72 | +7: iteration 32130/ 173500 | consumed samples: 8225280 | consumed tokens: 16845373440 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.076644E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.300 | TFLOPs: 31.86 | +7: iteration 32140/ 173500 | consumed samples: 8227840 | consumed tokens: 16850616320 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.095476E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.997 | TFLOPs: 31.85 | +7: iteration 32150/ 173500 | consumed samples: 8230400 | consumed tokens: 16855859200 | elapsed time per iteration (s): 0.43 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.092653E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.306 | TFLOPs: 31.55 | +7: iteration 32160/ 173500 | consumed samples: 8232960 | consumed tokens: 16861102080 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.074169E+00 | grad norm: 0.299 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.132 | TFLOPs: 31.70 | +7: iteration 32170/ 173500 | consumed samples: 8235520 | consumed tokens: 16866344960 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.082267E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.608 | TFLOPs: 31.78 | +7: iteration 32180/ 173500 | consumed samples: 8238080 | consumed tokens: 16871587840 | elapsed time per iteration (s): 0.43 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.088588E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.372 | TFLOPs: 31.45 | +7: iteration 32190/ 173500 | consumed samples: 8240640 | consumed tokens: 16876830720 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.092260E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.135 | TFLOPs: 31.86 | +7: iteration 32200/ 173500 | consumed samples: 8243200 | consumed tokens: 16882073600 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.086874E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.903 | TFLOPs: 31.69 | +7: iteration 32210/ 173500 | consumed samples: 8245760 | consumed tokens: 16887316480 | elapsed time per iteration (s): 0.43 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.091976E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.149 | TFLOPs: 31.38 | +7: iteration 32220/ 173500 | consumed samples: 8248320 | consumed tokens: 16892559360 | elapsed time per iteration (s): 
0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.080191E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.219 | TFLOPs: 31.86 | +7: iteration 32230/ 173500 | consumed samples: 8250880 | consumed tokens: 16897802240 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.089242E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.458 | TFLOPs: 31.82 | +7: iteration 32240/ 173500 | consumed samples: 8253440 | consumed tokens: 16903045120 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.083150E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.910 | TFLOPs: 31.84 | +7: iteration 32250/ 173500 | consumed samples: 8256000 | consumed tokens: 16908288000 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.086436E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.006 | TFLOPs: 31.53 | +7: iteration 32260/ 173500 | consumed samples: 8258560 | consumed tokens: 16913530880 | elapsed time per iteration (s): 0.42 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.102635E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.917 | TFLOPs: 31.69 | +7: iteration 32270/ 173500 | consumed samples: 8261120 | consumed tokens: 16918773760 | elapsed time per iteration (s): 0.42 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.077575E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.652 | TFLOPs: 31.83 | +7: iteration 32280/ 
173500 | consumed samples: 8263680 | consumed tokens: 16924016640 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.084145E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.924 | TFLOPs: 31.58 | +7: iteration 32290/ 173500 | consumed samples: 8266240 | consumed tokens: 16929259520 | elapsed time per iteration (s): 0.42 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.089530E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.488 | TFLOPs: 31.61 | +7: iteration 32300/ 173500 | consumed samples: 8268800 | consumed tokens: 16934502400 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.095689E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.723 | TFLOPs: 31.52 | +7: iteration 32310/ 173500 | consumed samples: 8271360 | consumed tokens: 16939745280 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.086058E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.457 | TFLOPs: 31.56 | +7: iteration 32320/ 173500 | consumed samples: 8273920 | consumed tokens: 16944988160 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.081210E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.381 | TFLOPs: 31.55 | +7: iteration 32330/ 173500 | consumed samples: 8276480 | consumed tokens: 16950231040 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.093473E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 600.990 | TFLOPs: 31.53 | +7: iteration 32340/ 173500 | consumed samples: 8279040 | consumed tokens: 16955473920 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.076005E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.008 | TFLOPs: 30.96 | +7: iteration 32350/ 173500 | consumed samples: 8281600 | consumed tokens: 16960716800 | elapsed time per iteration (s): 0.42 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.078100E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.457 | TFLOPs: 31.77 | +7: iteration 32360/ 173500 | consumed samples: 8284160 | consumed tokens: 16965959680 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.080388E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.420 | TFLOPs: 31.35 | +7: iteration 32370/ 173500 | consumed samples: 8286720 | consumed tokens: 16971202560 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.084239E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.729 | TFLOPs: 30.94 | +7: iteration 32380/ 173500 | consumed samples: 8289280 | consumed tokens: 16976445440 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.101087E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.259 | TFLOPs: 31.28 | +7: iteration 32390/ 173500 | consumed samples: 8291840 | consumed tokens: 16981688320 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch 
size: 256 | lm loss: 3.072487E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.968 | TFLOPs: 31.43 | +7: iteration 32400/ 173500 | consumed samples: 8294400 | consumed tokens: 16986931200 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.085376E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.670 | TFLOPs: 31.10 | +7: iteration 32410/ 173500 | consumed samples: 8296960 | consumed tokens: 16992174080 | elapsed time per iteration (s): 0.44 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.076889E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.529 | TFLOPs: 30.56 | +7: iteration 32420/ 173500 | consumed samples: 8299520 | consumed tokens: 16997416960 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.091016E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.374 | TFLOPs: 30.92 | +7: iteration 32430/ 173500 | consumed samples: 8302080 | consumed tokens: 17002659840 | elapsed time per iteration (s): 0.44 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.086453E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.983 | TFLOPs: 30.43 | +7: iteration 32440/ 173500 | consumed samples: 8304640 | consumed tokens: 17007902720 | elapsed time per iteration (s): 0.44 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.090573E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.229 | TFLOPs: 30.81 | +7: iteration 32450/ 173500 | consumed samples: 8307200 | consumed 
tokens: 17013145600 | elapsed time per iteration (s): 0.45 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.074509E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.183 | TFLOPs: 29.86 | +7: iteration 32460/ 173500 | consumed samples: 8309760 | consumed tokens: 17018388480 | elapsed time per iteration (s): 0.44 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.073831E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.905 | TFLOPs: 30.74 | +7: iteration 32470/ 173500 | consumed samples: 8312320 | consumed tokens: 17023631360 | elapsed time per iteration (s): 0.42 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.063206E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.226 | TFLOPs: 31.97 | +7: iteration 32480/ 173500 | consumed samples: 8314880 | consumed tokens: 17028874240 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.080813E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.717 | TFLOPs: 30.99 | +7: iteration 32490/ 173500 | consumed samples: 8317440 | consumed tokens: 17034117120 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.081034E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.512 | TFLOPs: 31.09 | +7: iteration 32500/ 173500 | consumed samples: 8320000 | consumed tokens: 17039360000 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.102852E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.891 | TFLOPs: 31.84 | +7: iteration 32510/ 173500 | consumed samples: 8322560 | consumed tokens: 17044602880 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.077766E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.923 | TFLOPs: 31.69 | +7: iteration 32520/ 173500 | consumed samples: 8325120 | consumed tokens: 17049845760 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.083025E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.050 | TFLOPs: 31.69 | +7: iteration 32530/ 173500 | consumed samples: 8327680 | consumed tokens: 17055088640 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.071932E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.003 | TFLOPs: 31.90 | +7: iteration 32540/ 173500 | consumed samples: 8330240 | consumed tokens: 17060331520 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.072040E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.447 | TFLOPs: 31.87 | +7: iteration 32550/ 173500 | consumed samples: 8332800 | consumed tokens: 17065574400 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.085476E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.405 | TFLOPs: 31.61 | +7: iteration 32560/ 173500 | consumed samples: 8335360 | consumed tokens: 17070817280 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.060808E+00 | grad norm: 
0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.990 | TFLOPs: 31.53 | +7: iteration 32570/ 173500 | consumed samples: 8337920 | consumed tokens: 17076060160 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.078835E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.121 | TFLOPs: 31.80 | +7: iteration 32580/ 173500 | consumed samples: 8340480 | consumed tokens: 17081303040 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.092718E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.675 | TFLOPs: 31.73 | +7: iteration 32590/ 173500 | consumed samples: 8343040 | consumed tokens: 17086545920 | elapsed time per iteration (s): 0.42 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.080597E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.896 | TFLOPs: 31.79 | +7: iteration 32600/ 173500 | consumed samples: 8345600 | consumed tokens: 17091788800 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.087396E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.061 | TFLOPs: 31.38 | +7: iteration 32610/ 173500 | consumed samples: 8348160 | consumed tokens: 17097031680 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.082550E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.549 | TFLOPs: 31.51 | +7: iteration 32620/ 173500 | consumed samples: 8350720 | consumed tokens: 17102274560 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.086534E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.621 | TFLOPs: 31.36 | +7: iteration 32630/ 173500 | consumed samples: 8353280 | consumed tokens: 17107517440 | elapsed time per iteration (s): 0.42 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.082720E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.426 | TFLOPs: 31.87 | +7: iteration 32640/ 173500 | consumed samples: 8355840 | consumed tokens: 17112760320 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.077138E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.650 | TFLOPs: 31.15 | +7: iteration 32650/ 173500 | consumed samples: 8358400 | consumed tokens: 17118003200 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.092982E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.059 | TFLOPs: 31.27 | +7: iteration 32660/ 173500 | consumed samples: 8360960 | consumed tokens: 17123246080 | elapsed time per iteration (s): 0.42 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.071284E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.525 | TFLOPs: 31.72 | +7: iteration 32670/ 173500 | consumed samples: 8363520 | consumed tokens: 17128488960 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.083542E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.851 | TFLOPs: 31.11 | +7: 
iteration 32680/ 173500 | consumed samples: 8366080 | consumed tokens: 17133731840 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.066359E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.573 | TFLOPs: 31.09 | +7: iteration 32690/ 173500 | consumed samples: 8368640 | consumed tokens: 17138974720 | elapsed time per iteration (s): 0.42 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.079278E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.682 | TFLOPs: 31.94 | +7: iteration 32700/ 173500 | consumed samples: 8371200 | consumed tokens: 17144217600 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.094208E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.980 | TFLOPs: 31.69 | +7: iteration 32710/ 173500 | consumed samples: 8373760 | consumed tokens: 17149460480 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.083586E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.799 | TFLOPs: 31.79 | +7: iteration 32720/ 173500 | consumed samples: 8376320 | consumed tokens: 17154703360 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.088382E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.317 | TFLOPs: 31.92 | +7: iteration 32730/ 173500 | consumed samples: 8378880 | consumed tokens: 17159946240 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.078860E+00 | grad norm: 0.278 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.059 | TFLOPs: 31.85 | +7: iteration 32740/ 173500 | consumed samples: 8381440 | consumed tokens: 17165189120 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.073126E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.176 | TFLOPs: 31.65 | +7: iteration 32750/ 173500 | consumed samples: 8384000 | consumed tokens: 17170432000 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.076854E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.690 | TFLOPs: 31.57 | +7: iteration 32760/ 173500 | consumed samples: 8386560 | consumed tokens: 17175674880 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.072151E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.913 | TFLOPs: 31.84 | +7: iteration 32770/ 173500 | consumed samples: 8389120 | consumed tokens: 17180917760 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.077395E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.336 | TFLOPs: 31.60 | +7: iteration 32780/ 173500 | consumed samples: 8391680 | consumed tokens: 17186160640 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.087857E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.994 | TFLOPs: 31.64 | +7: iteration 32790/ 173500 | consumed samples: 8394240 | consumed tokens: 17191403520 | elapsed time per iteration (s): 0.42 | learning rate: 
1.859E-04 | global batch size: 256 | lm loss: 3.080877E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.361 | TFLOPs: 31.60 | +7: iteration 32800/ 173500 | consumed samples: 8396800 | consumed tokens: 17196646400 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.070785E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.956 | TFLOPs: 31.53 | +7: iteration 32810/ 173500 | consumed samples: 8399360 | consumed tokens: 17201889280 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.063964E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.425 | TFLOPs: 31.87 | +7: iteration 32820/ 173500 | consumed samples: 8401920 | consumed tokens: 17207132160 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.085735E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.228 | TFLOPs: 31.86 | +7: iteration 32830/ 173500 | consumed samples: 8404480 | consumed tokens: 17212375040 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.091671E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.221 | TFLOPs: 31.86 | +7: iteration 32840/ 173500 | consumed samples: 8407040 | consumed tokens: 17217617920 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.070419E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.454 | TFLOPs: 31.87 | +7: iteration 32850/ 173500 | consumed 
samples: 8409600 | consumed tokens: 17222860800 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.080366E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.278 | TFLOPs: 31.86 | +7: iteration 32860/ 173500 | consumed samples: 8412160 | consumed tokens: 17228103680 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.071349E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.586 | TFLOPs: 31.88 | +7: iteration 32870/ 173500 | consumed samples: 8414720 | consumed tokens: 17233346560 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.090467E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.732 | TFLOPs: 31.89 | +7: iteration 32880/ 173500 | consumed samples: 8417280 | consumed tokens: 17238589440 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.081348E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.343 | TFLOPs: 31.87 | +7: iteration 32890/ 173500 | consumed samples: 8419840 | consumed tokens: 17243832320 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.080083E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.896 | TFLOPs: 31.90 | +7: iteration 32900/ 173500 | consumed samples: 8422400 | consumed tokens: 17249075200 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.087181E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.931 | TFLOPs: 31.90 | +7: iteration 32910/ 173500 | consumed samples: 8424960 | consumed tokens: 17254318080 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.093118E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.615 | TFLOPs: 31.57 | +7: iteration 32920/ 173500 | consumed samples: 8427520 | consumed tokens: 17259560960 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.082394E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.044 | TFLOPs: 31.90 | +7: iteration 32930/ 173500 | consumed samples: 8430080 | consumed tokens: 17264803840 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.078477E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.442 | TFLOPs: 31.87 | +7: iteration 32940/ 173500 | consumed samples: 8432640 | consumed tokens: 17270046720 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.090507E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.591 | TFLOPs: 31.88 | +7: iteration 32950/ 173500 | consumed samples: 8435200 | consumed tokens: 17275289600 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.079801E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.805 | TFLOPs: 31.89 | +7: iteration 32960/ 173500 | consumed samples: 8437760 | consumed tokens: 17280532480 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm 
loss: 3.077085E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.699 | TFLOPs: 31.36 | +7: iteration 32970/ 173500 | consumed samples: 8440320 | consumed tokens: 17285775360 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.076909E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.965 | TFLOPs: 31.90 | +7: iteration 32980/ 173500 | consumed samples: 8442880 | consumed tokens: 17291018240 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.077622E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.964 | TFLOPs: 31.90 | +7: iteration 32990/ 173500 | consumed samples: 8445440 | consumed tokens: 17296261120 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.087828E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.291 | TFLOPs: 31.92 | +7: iteration 33000/ 173500 | consumed samples: 8448000 | consumed tokens: 17301504000 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.083647E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.935 | TFLOPs: 31.69 | +7: iteration 33010/ 173500 | consumed samples: 8450560 | consumed tokens: 17306746880 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.065081E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.832 | TFLOPs: 31.89 | +7: iteration 33020/ 173500 | consumed samples: 8453120 | consumed tokens: 
17311989760 | elapsed time per iteration (s): 0.42 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.067269E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.145 | TFLOPs: 31.91 | +7: iteration 33030/ 173500 | consumed samples: 8455680 | consumed tokens: 17317232640 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.070892E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.301 | TFLOPs: 31.39 | +7: iteration 33040/ 173500 | consumed samples: 8458240 | consumed tokens: 17322475520 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.079000E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.484 | TFLOPs: 31.93 | +7: iteration 33050/ 173500 | consumed samples: 8460800 | consumed tokens: 17327718400 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.066709E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.955 | TFLOPs: 31.69 | +7: iteration 33060/ 173500 | consumed samples: 8463360 | consumed tokens: 17332961280 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.069254E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.602 | TFLOPs: 31.93 | +7: iteration 33070/ 173500 | consumed samples: 8465920 | consumed tokens: 17338204160 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.083247E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.397 | TFLOPs: 31.92 | +7: iteration 33080/ 173500 | consumed samples: 8468480 | consumed tokens: 17343447040 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.075723E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.256 | TFLOPs: 31.91 | +7: iteration 33090/ 173500 | consumed samples: 8471040 | consumed tokens: 17348689920 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.090799E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.030 | TFLOPs: 31.90 | +7: iteration 33100/ 173500 | consumed samples: 8473600 | consumed tokens: 17353932800 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.093764E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.353 | TFLOPs: 31.87 | +7: iteration 33110/ 173500 | consumed samples: 8476160 | consumed tokens: 17359175680 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.066837E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.340 | TFLOPs: 31.92 | +7: iteration 33120/ 173500 | consumed samples: 8478720 | consumed tokens: 17364418560 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.084512E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.454 | TFLOPs: 31.92 | +7: iteration 33130/ 173500 | consumed samples: 8481280 | consumed tokens: 17369661440 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.075593E+00 | grad norm: 0.296 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.599 | TFLOPs: 31.93 | +7: iteration 33140/ 173500 | consumed samples: 8483840 | consumed tokens: 17374904320 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.070740E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.428 | TFLOPs: 31.92 | +7: iteration 33150/ 173500 | consumed samples: 8486400 | consumed tokens: 17380147200 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.084531E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.261 | TFLOPs: 31.91 | +7: iteration 33160/ 173500 | consumed samples: 8488960 | consumed tokens: 17385390080 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.083123E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.234 | TFLOPs: 31.91 | +7: iteration 33170/ 173500 | consumed samples: 8491520 | consumed tokens: 17390632960 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.076983E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 33180/ 173500 | consumed samples: 8494080 | consumed tokens: 17395875840 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.076978E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.564 | TFLOPs: 31.93 | +7: iteration 33190/ 173500 | consumed samples: 8496640 | consumed tokens: 17401118720 | elapsed time per iteration (s): 
0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.064657E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.469 | TFLOPs: 31.93 | +7: iteration 33200/ 173500 | consumed samples: 8499200 | consumed tokens: 17406361600 | elapsed time per iteration (s): 0.43 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.065469E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.326 | TFLOPs: 31.34 | +7: iteration 33210/ 173500 | consumed samples: 8501760 | consumed tokens: 17411604480 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.096298E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.642 | TFLOPs: 31.93 | +7: iteration 33220/ 173500 | consumed samples: 8504320 | consumed tokens: 17416847360 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.061596E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.179 | TFLOPs: 31.65 | +7: iteration 33230/ 173500 | consumed samples: 8506880 | consumed tokens: 17422090240 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.082549E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.133 | TFLOPs: 31.96 | +7: iteration 33240/ 173500 | consumed samples: 8509440 | consumed tokens: 17427333120 | elapsed time per iteration (s): 0.43 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.073455E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.648 | TFLOPs: 31.57 | +7: iteration 33250/ 
173500 | consumed samples: 8512000 | consumed tokens: 17432576000 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.077644E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 | +7: iteration 33260/ 173500 | consumed samples: 8514560 | consumed tokens: 17437818880 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.067542E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.558 | TFLOPs: 31.93 | +7: iteration 33270/ 173500 | consumed samples: 8517120 | consumed tokens: 17443061760 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.092231E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.765 | TFLOPs: 31.89 | +7: iteration 33280/ 173500 | consumed samples: 8519680 | consumed tokens: 17448304640 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.095969E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.879 | TFLOPs: 31.63 | +7: iteration 33290/ 173500 | consumed samples: 8522240 | consumed tokens: 17453547520 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.074209E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.496 | TFLOPs: 31.87 | +7: iteration 33300/ 173500 | consumed samples: 8524800 | consumed tokens: 17458790400 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.082447E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 608.258 | TFLOPs: 31.91 | +7: iteration 33310/ 173500 | consumed samples: 8527360 | consumed tokens: 17464033280 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.075165E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.533 | TFLOPs: 31.67 | +7: iteration 33320/ 173500 | consumed samples: 8529920 | consumed tokens: 17469276160 | elapsed time per iteration (s): 0.43 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.085163E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.909 | TFLOPs: 31.58 | +7: iteration 33330/ 173500 | consumed samples: 8532480 | consumed tokens: 17474519040 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.067105E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.079 | TFLOPs: 31.96 | +7: iteration 33340/ 173500 | consumed samples: 8535040 | consumed tokens: 17479761920 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.082380E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.340 | TFLOPs: 31.92 | +7: iteration 33350/ 173500 | consumed samples: 8537600 | consumed tokens: 17485004800 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.081569E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.849 | TFLOPs: 31.89 | +7: iteration 33360/ 173500 | consumed samples: 8540160 | consumed tokens: 17490247680 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch 
size: 256 | lm loss: 3.077330E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.548 | TFLOPs: 31.93 | +7: iteration 33370/ 173500 | consumed samples: 8542720 | consumed tokens: 17495490560 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.071560E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.804 | TFLOPs: 31.68 | +7: iteration 33380/ 173500 | consumed samples: 8545280 | consumed tokens: 17500733440 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.083460E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.421 | TFLOPs: 31.92 | +7: iteration 33390/ 173500 | consumed samples: 8547840 | consumed tokens: 17505976320 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.076367E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.100 | TFLOPs: 31.91 | +7: iteration 33400/ 173500 | consumed samples: 8550400 | consumed tokens: 17511219200 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.081284E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.093 | TFLOPs: 31.91 | +7: iteration 33410/ 173500 | consumed samples: 8552960 | consumed tokens: 17516462080 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.086226E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.293 | TFLOPs: 31.92 | +7: iteration 33420/ 173500 | consumed samples: 8555520 | consumed 
tokens: 17521704960 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.073073E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.866 | TFLOPs: 31.89 | +7: iteration 33430/ 173500 | consumed samples: 8558080 | consumed tokens: 17526947840 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.085196E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.345 | TFLOPs: 31.92 | +7: iteration 33440/ 173500 | consumed samples: 8560640 | consumed tokens: 17532190720 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.085999E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.567 | TFLOPs: 31.93 | +7: iteration 33450/ 173500 | consumed samples: 8563200 | consumed tokens: 17537433600 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.077459E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.729 | TFLOPs: 31.94 | +7: iteration 33460/ 173500 | consumed samples: 8565760 | consumed tokens: 17542676480 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.082265E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.382 | TFLOPs: 31.92 | +7: iteration 33470/ 173500 | consumed samples: 8568320 | consumed tokens: 17547919360 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.076684E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.328 | TFLOPs: 31.92 | +7: iteration 33480/ 173500 | consumed samples: 8570880 | consumed tokens: 17553162240 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.074351E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.227 | TFLOPs: 31.91 | +7: iteration 33490/ 173500 | consumed samples: 8573440 | consumed tokens: 17558405120 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.074597E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.841 | TFLOPs: 31.94 | +7: iteration 33500/ 173500 | consumed samples: 8576000 | consumed tokens: 17563648000 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.080623E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.636 | TFLOPs: 31.93 | +7: iteration 33510/ 173500 | consumed samples: 8578560 | consumed tokens: 17568890880 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.062360E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.601 | TFLOPs: 31.93 | +7: iteration 33520/ 173500 | consumed samples: 8581120 | consumed tokens: 17574133760 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.075591E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.389 | TFLOPs: 31.92 | +7: iteration 33530/ 173500 | consumed samples: 8583680 | consumed tokens: 17579376640 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.085627E+00 | grad norm: 
0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.361 | TFLOPs: 31.92 | +7: iteration 33540/ 173500 | consumed samples: 8586240 | consumed tokens: 17584619520 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.094226E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.572 | TFLOPs: 31.72 | +7: iteration 33550/ 173500 | consumed samples: 8588800 | consumed tokens: 17589862400 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.074742E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.677 | TFLOPs: 31.62 | +7: iteration 33560/ 173500 | consumed samples: 8591360 | consumed tokens: 17595105280 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.072884E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.775 | TFLOPs: 31.94 | +7: iteration 33570/ 173500 | consumed samples: 8593920 | consumed tokens: 17600348160 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.072270E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.673 | TFLOPs: 31.94 | +7: iteration 33580/ 173500 | consumed samples: 8596480 | consumed tokens: 17605591040 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.093167E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.508 | TFLOPs: 31.93 | +7: iteration 33590/ 173500 | consumed samples: 8599040 | consumed tokens: 17610833920 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.077764E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.541 | TFLOPs: 31.93 | +7: iteration 33600/ 173500 | consumed samples: 8601600 | consumed tokens: 17616076800 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.061753E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.773 | TFLOPs: 31.94 | +7: iteration 33610/ 173500 | consumed samples: 8604160 | consumed tokens: 17621319680 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.089676E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.680 | TFLOPs: 31.94 | +7: iteration 33620/ 173500 | consumed samples: 8606720 | consumed tokens: 17626562560 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.084324E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.547 | TFLOPs: 31.93 | +7: iteration 33630/ 173500 | consumed samples: 8609280 | consumed tokens: 17631805440 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.061322E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.301 | TFLOPs: 31.92 | +7: iteration 33640/ 173500 | consumed samples: 8611840 | consumed tokens: 17637048320 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.090600E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.256 | TFLOPs: 31.91 | +7: 
iteration 33650/ 173500 | consumed samples: 8614400 | consumed tokens: 17642291200 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.065444E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.612 | TFLOPs: 31.93 | +7: iteration 33660/ 173500 | consumed samples: 8616960 | consumed tokens: 17647534080 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.065583E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.700 | TFLOPs: 31.94 | +7: iteration 33670/ 173500 | consumed samples: 8619520 | consumed tokens: 17652776960 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.067835E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.985 | TFLOPs: 31.95 | +7: iteration 33680/ 173500 | consumed samples: 8622080 | consumed tokens: 17658019840 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.085847E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.850 | TFLOPs: 31.95 | +7: iteration 33690/ 173500 | consumed samples: 8624640 | consumed tokens: 17663262720 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.088774E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.753 | TFLOPs: 31.94 | +7: iteration 33700/ 173500 | consumed samples: 8627200 | consumed tokens: 17668505600 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 3.082116E+00 | grad norm: 0.297 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.463 | TFLOPs: 31.93 | +7: iteration 33710/ 173500 | consumed samples: 8629760 | consumed tokens: 17673748480 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.079090E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.291 | TFLOPs: 31.92 | +7: iteration 33720/ 173500 | consumed samples: 8632320 | consumed tokens: 17678991360 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.073108E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.653 | TFLOPs: 31.94 | +7: iteration 33730/ 173500 | consumed samples: 8634880 | consumed tokens: 17684234240 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.080389E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.516 | TFLOPs: 31.93 | +7: iteration 33740/ 173500 | consumed samples: 8637440 | consumed tokens: 17689477120 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.074725E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.862 | TFLOPs: 31.95 | +7: iteration 33750/ 173500 | consumed samples: 8640000 | consumed tokens: 17694720000 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.093825E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.971 | TFLOPs: 31.95 | +7: iteration 33760/ 173500 | consumed samples: 8642560 | consumed tokens: 17699962880 | elapsed time per iteration (s): 0.42 | learning rate: 
1.850E-04 | global batch size: 256 | lm loss: 3.092771E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.768 | TFLOPs: 31.94 | +7: iteration 33770/ 173500 | consumed samples: 8645120 | consumed tokens: 17705205760 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.074675E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.828 | TFLOPs: 31.94 | +7: iteration 33780/ 173500 | consumed samples: 8647680 | consumed tokens: 17710448640 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.068317E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.954 | TFLOPs: 31.95 | +7: iteration 33790/ 173500 | consumed samples: 8650240 | consumed tokens: 17715691520 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.069283E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.473 | TFLOPs: 31.87 | +7: iteration 33800/ 173500 | consumed samples: 8652800 | consumed tokens: 17720934400 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.062582E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.885 | TFLOPs: 31.89 | +7: iteration 33810/ 173500 | consumed samples: 8655360 | consumed tokens: 17726177280 | elapsed time per iteration (s): 0.42 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.087728E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.657 | TFLOPs: 31.94 | +7: iteration 33820/ 173500 | consumed 
samples: 8657920 | consumed tokens: 17731420160 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.079223E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.357 | TFLOPs: 31.92 | +7: iteration 33830/ 173500 | consumed samples: 8660480 | consumed tokens: 17736663040 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.079706E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.883 | TFLOPs: 31.95 | +7: iteration 33840/ 173500 | consumed samples: 8663040 | consumed tokens: 17741905920 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.077945E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.377 | TFLOPs: 31.92 | +7: iteration 33850/ 173500 | consumed samples: 8665600 | consumed tokens: 17747148800 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.081470E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.081 | TFLOPs: 31.91 | +7: iteration 33860/ 173500 | consumed samples: 8668160 | consumed tokens: 17752391680 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.068061E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.312 | TFLOPs: 31.92 | +7: iteration 33870/ 173500 | consumed samples: 8670720 | consumed tokens: 17757634560 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.076177E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.583 | TFLOPs: 31.93 | +7: iteration 33880/ 173500 | consumed samples: 8673280 | consumed tokens: 17762877440 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.064843E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.158 | TFLOPs: 31.96 | +7: iteration 33890/ 173500 | consumed samples: 8675840 | consumed tokens: 17768120320 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.082472E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.787 | TFLOPs: 31.94 | +7: iteration 33900/ 173500 | consumed samples: 8678400 | consumed tokens: 17773363200 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.079703E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.583 | TFLOPs: 31.93 | +7: iteration 33910/ 173500 | consumed samples: 8680960 | consumed tokens: 17778606080 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.075796E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.191 | TFLOPs: 31.91 | +7: iteration 33920/ 173500 | consumed samples: 8683520 | consumed tokens: 17783848960 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.081704E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.648 | TFLOPs: 31.93 | +7: iteration 33930/ 173500 | consumed samples: 8686080 | consumed tokens: 17789091840 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm 
loss: 3.071348E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.407 | TFLOPs: 31.92 | +7: iteration 33940/ 173500 | consumed samples: 8688640 | consumed tokens: 17794334720 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.071489E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.850 | TFLOPs: 31.95 | +7: iteration 33950/ 173500 | consumed samples: 8691200 | consumed tokens: 17799577600 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.078870E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.715 | TFLOPs: 31.94 | +7: iteration 33960/ 173500 | consumed samples: 8693760 | consumed tokens: 17804820480 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.070706E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.074 | TFLOPs: 31.96 | +7: iteration 33970/ 173500 | consumed samples: 8696320 | consumed tokens: 17810063360 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.083347E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.378 | TFLOPs: 31.92 | +7: iteration 33980/ 173500 | consumed samples: 8698880 | consumed tokens: 17815306240 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.070238E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.659 | TFLOPs: 31.94 | +7: iteration 33990/ 173500 | consumed samples: 8701440 | consumed tokens: 
17820549120 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.068881E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.674 | TFLOPs: 31.94 | +0: [2023-03-17 03:13:46,702] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=0, lr=[0.00018477830620634072, 0.00018477830620634072, 0.00018477830620634072], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 34000/ 173500 | consumed samples: 8704000 | consumed tokens: 17825792000 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.063592E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.561 | TFLOPs: 31.93 | +0: steps: 34000 loss: 3.0520 iter time (s): 0.421 samples/sec: 608.084 +7: iteration 34010/ 173500 | consumed samples: 8706560 | consumed tokens: 17831034880 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.072315E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.593 | TFLOPs: 31.83 | +7: iteration 34020/ 173500 | consumed samples: 8709120 | consumed tokens: 17836277760 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.077948E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.817 | TFLOPs: 31.94 | +7: iteration 34030/ 173500 | consumed samples: 8711680 | consumed tokens: 17841520640 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.082406E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.685 | TFLOPs: 31.94 | +7: iteration 34040/ 173500 | 
consumed samples: 8714240 | consumed tokens: 17846763520 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.080444E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.501 | TFLOPs: 31.93 | +7: iteration 34050/ 173500 | consumed samples: 8716800 | consumed tokens: 17852006400 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.093202E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.078 | TFLOPs: 31.64 | +7: iteration 34060/ 173500 | consumed samples: 8719360 | consumed tokens: 17857249280 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.077901E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.131 | TFLOPs: 31.96 | +7: iteration 34070/ 173500 | consumed samples: 8721920 | consumed tokens: 17862492160 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.072429E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.953 | TFLOPs: 31.95 | +7: iteration 34080/ 173500 | consumed samples: 8724480 | consumed tokens: 17867735040 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.069530E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.892 | TFLOPs: 31.95 | +7: iteration 34090/ 173500 | consumed samples: 8727040 | consumed tokens: 17872977920 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.087870E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.820 | TFLOPs: 31.94 | +7: iteration 34100/ 173500 | consumed samples: 8729600 | consumed tokens: 17878220800 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.068554E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.890 | TFLOPs: 31.95 | +7: iteration 34110/ 173500 | consumed samples: 8732160 | consumed tokens: 17883463680 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.071536E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.702 | TFLOPs: 31.94 | +7: iteration 34120/ 173500 | consumed samples: 8734720 | consumed tokens: 17888706560 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.065419E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.844 | TFLOPs: 31.95 | +7: iteration 34130/ 173500 | consumed samples: 8737280 | consumed tokens: 17893949440 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.067695E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.779 | TFLOPs: 31.94 | +7: iteration 34140/ 173500 | consumed samples: 8739840 | consumed tokens: 17899192320 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.074207E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.704 | TFLOPs: 31.94 | +7: iteration 34150/ 173500 | consumed samples: 8742400 | consumed tokens: 17904435200 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 
256 | lm loss: 3.072530E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.095 | TFLOPs: 31.96 | +7: iteration 34160/ 173500 | consumed samples: 8744960 | consumed tokens: 17909678080 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.068435E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.463 | TFLOPs: 31.93 | +7: iteration 34170/ 173500 | consumed samples: 8747520 | consumed tokens: 17914920960 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.083970E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.452 | TFLOPs: 31.92 | +7: iteration 34180/ 173500 | consumed samples: 8750080 | consumed tokens: 17920163840 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.071170E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.962 | TFLOPs: 31.95 | +7: iteration 34190/ 173500 | consumed samples: 8752640 | consumed tokens: 17925406720 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.068277E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.441 | TFLOPs: 31.92 | +7: iteration 34200/ 173500 | consumed samples: 8755200 | consumed tokens: 17930649600 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.067515E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.878 | TFLOPs: 31.95 | +7: iteration 34210/ 173500 | consumed samples: 8757760 | consumed 
tokens: 17935892480 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.072563E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.841 | TFLOPs: 31.94 | +7: iteration 34220/ 173500 | consumed samples: 8760320 | consumed tokens: 17941135360 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.081892E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.979 | TFLOPs: 31.95 | +7: iteration 34230/ 173500 | consumed samples: 8762880 | consumed tokens: 17946378240 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.070495E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.426 | TFLOPs: 31.82 | +7: iteration 34240/ 173500 | consumed samples: 8765440 | consumed tokens: 17951621120 | elapsed time per iteration (s): 0.42 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.073211E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.028 | TFLOPs: 31.95 | +7: iteration 34250/ 173500 | consumed samples: 8768000 | consumed tokens: 17956864000 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.066527E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.716 | TFLOPs: 31.94 | +7: iteration 34260/ 173500 | consumed samples: 8770560 | consumed tokens: 17962106880 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.070992E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.839 | TFLOPs: 31.94 | +7: iteration 34270/ 173500 | consumed samples: 8773120 | consumed tokens: 17967349760 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.068431E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.248 | TFLOPs: 31.91 | +7: iteration 34280/ 173500 | consumed samples: 8775680 | consumed tokens: 17972592640 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.086472E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.347 | TFLOPs: 31.92 | +7: iteration 34290/ 173500 | consumed samples: 8778240 | consumed tokens: 17977835520 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.080302E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.680 | TFLOPs: 31.94 | +7: iteration 34300/ 173500 | consumed samples: 8780800 | consumed tokens: 17983078400 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.082608E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.910 | TFLOPs: 31.95 | +7: iteration 34310/ 173500 | consumed samples: 8783360 | consumed tokens: 17988321280 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.063090E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.202 | TFLOPs: 31.91 | +7: iteration 34320/ 173500 | consumed samples: 8785920 | consumed tokens: 17993564160 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.065364E+00 | grad norm: 
0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.084 | TFLOPs: 31.91 | +7: iteration 34330/ 173500 | consumed samples: 8788480 | consumed tokens: 17998807040 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.071773E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.081 | TFLOPs: 31.91 | +7: iteration 34340/ 173500 | consumed samples: 8791040 | consumed tokens: 18004049920 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.074329E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.234 | TFLOPs: 31.91 | +7: iteration 34350/ 173500 | consumed samples: 8793600 | consumed tokens: 18009292800 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.075075E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.680 | TFLOPs: 31.94 | +7: iteration 34360/ 173500 | consumed samples: 8796160 | consumed tokens: 18014535680 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.079578E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.365 | TFLOPs: 31.92 | +7: iteration 34370/ 173500 | consumed samples: 8798720 | consumed tokens: 18019778560 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.096719E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.291 | TFLOPs: 31.92 | +7: iteration 34380/ 173500 | consumed samples: 8801280 | consumed tokens: 18025021440 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.078130E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.354 | TFLOPs: 31.76 | +7: iteration 34390/ 173500 | consumed samples: 8803840 | consumed tokens: 18030264320 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.081691E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.360 | TFLOPs: 31.92 | +7: iteration 34400/ 173500 | consumed samples: 8806400 | consumed tokens: 18035507200 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.078430E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.494 | TFLOPs: 31.93 | +7: iteration 34410/ 173500 | consumed samples: 8808960 | consumed tokens: 18040750080 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.065909E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.455 | TFLOPs: 31.92 | +7: iteration 34420/ 173500 | consumed samples: 8811520 | consumed tokens: 18045992960 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.070960E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.272 | TFLOPs: 31.92 | +7: iteration 34430/ 173500 | consumed samples: 8814080 | consumed tokens: 18051235840 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.071896E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.907 | TFLOPs: 31.90 | +7: 
iteration 34440/ 173500 | consumed samples: 8816640 | consumed tokens: 18056478720 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.070465E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.080 | TFLOPs: 31.90 | +7: iteration 34450/ 173500 | consumed samples: 8819200 | consumed tokens: 18061721600 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.076143E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.138 | TFLOPs: 31.91 | +7: iteration 34460/ 173500 | consumed samples: 8821760 | consumed tokens: 18066964480 | elapsed time per iteration (s): 0.42 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 3.069966E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.394 | TFLOPs: 31.87 | +7: iteration 34470/ 173500 | consumed samples: 8824320 | consumed tokens: 18072207360 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.089605E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.919 | TFLOPs: 31.90 | +7: iteration 34480/ 173500 | consumed samples: 8826880 | consumed tokens: 18077450240 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.057137E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.002 | TFLOPs: 31.90 | +7: iteration 34490/ 173500 | consumed samples: 8829440 | consumed tokens: 18082693120 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.082091E+00 | grad norm: 0.276 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.957 | TFLOPs: 31.90 | +7: iteration 34500/ 173500 | consumed samples: 8832000 | consumed tokens: 18087936000 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.066290E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.932 | TFLOPs: 31.90 | +7: iteration 34510/ 173500 | consumed samples: 8834560 | consumed tokens: 18093178880 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.068804E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.198 | TFLOPs: 31.91 | +7: iteration 34520/ 173500 | consumed samples: 8837120 | consumed tokens: 18098421760 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.076998E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.811 | TFLOPs: 31.89 | +7: iteration 34530/ 173500 | consumed samples: 8839680 | consumed tokens: 18103664640 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.082316E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.332 | TFLOPs: 31.92 | +7: iteration 34540/ 173500 | consumed samples: 8842240 | consumed tokens: 18108907520 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.076967E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.473 | TFLOPs: 31.93 | +7: iteration 34550/ 173500 | consumed samples: 8844800 | consumed tokens: 18114150400 | elapsed time per iteration (s): 0.42 | learning rate: 
1.843E-04 | global batch size: 256 | lm loss: 3.080982E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.546 | TFLOPs: 31.93 | +7: iteration 34560/ 173500 | consumed samples: 8847360 | consumed tokens: 18119393280 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.086202E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.498 | TFLOPs: 31.77 | +7: iteration 34570/ 173500 | consumed samples: 8849920 | consumed tokens: 18124636160 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.067270E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.687 | TFLOPs: 31.88 | +7: iteration 34580/ 173500 | consumed samples: 8852480 | consumed tokens: 18129879040 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.072494E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.788 | TFLOPs: 31.89 | +7: iteration 34590/ 173500 | consumed samples: 8855040 | consumed tokens: 18135121920 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.054217E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.132 | TFLOPs: 31.91 | +7: iteration 34600/ 173500 | consumed samples: 8857600 | consumed tokens: 18140364800 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.074732E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.275 | TFLOPs: 31.92 | +7: iteration 34610/ 173500 | consumed 
samples: 8860160 | consumed tokens: 18145607680 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.073693E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.437 | TFLOPs: 31.92 | +7: iteration 34620/ 173500 | consumed samples: 8862720 | consumed tokens: 18150850560 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.070438E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.485 | TFLOPs: 31.93 | +7: iteration 34630/ 173500 | consumed samples: 8865280 | consumed tokens: 18156093440 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.058241E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.193 | TFLOPs: 31.91 | +7: iteration 34640/ 173500 | consumed samples: 8867840 | consumed tokens: 18161336320 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.083254E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.751 | TFLOPs: 31.89 | +7: iteration 34650/ 173500 | consumed samples: 8870400 | consumed tokens: 18166579200 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.085131E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.176 | TFLOPs: 31.91 | +7: iteration 34660/ 173500 | consumed samples: 8872960 | consumed tokens: 18171822080 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.074345E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.141 | TFLOPs: 31.70 | +7: iteration 34670/ 173500 | consumed samples: 8875520 | consumed tokens: 18177064960 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.066088E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.636 | TFLOPs: 31.93 | +7: iteration 34680/ 173500 | consumed samples: 8878080 | consumed tokens: 18182307840 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.067599E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.963 | TFLOPs: 31.95 | +7: iteration 34690/ 173500 | consumed samples: 8880640 | consumed tokens: 18187550720 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.075992E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.693 | TFLOPs: 31.94 | +7: iteration 34700/ 173500 | consumed samples: 8883200 | consumed tokens: 18192793600 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.072024E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.581 | TFLOPs: 31.93 | +7: iteration 34710/ 173500 | consumed samples: 8885760 | consumed tokens: 18198036480 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.073573E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.628 | TFLOPs: 31.93 | +7: iteration 34720/ 173500 | consumed samples: 8888320 | consumed tokens: 18203279360 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm 
loss: 3.063093E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 34730/ 173500 | consumed samples: 8890880 | consumed tokens: 18208522240 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.066139E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.081 | TFLOPs: 31.96 | +7: iteration 34740/ 173500 | consumed samples: 8893440 | consumed tokens: 18213765120 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.052015E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.805 | TFLOPs: 31.94 | +7: iteration 34750/ 173500 | consumed samples: 8896000 | consumed tokens: 18219008000 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.066839E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.700 | TFLOPs: 31.94 | +7: iteration 34760/ 173500 | consumed samples: 8898560 | consumed tokens: 18224250880 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.075723E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.926 | TFLOPs: 31.95 | +7: iteration 34770/ 173500 | consumed samples: 8901120 | consumed tokens: 18229493760 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.081703E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.583 | TFLOPs: 31.93 | +7: iteration 34780/ 173500 | consumed samples: 8903680 | consumed tokens: 
18234736640 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 3.086466E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.473 | TFLOPs: 31.93 | +7: iteration 34790/ 173500 | consumed samples: 8906240 | consumed tokens: 18239979520 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.094000E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.946 | TFLOPs: 31.95 | +7: iteration 34800/ 173500 | consumed samples: 8908800 | consumed tokens: 18245222400 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.070151E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.864 | TFLOPs: 31.95 | +7: iteration 34810/ 173500 | consumed samples: 8911360 | consumed tokens: 18250465280 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.047789E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.671 | TFLOPs: 31.94 | +7: iteration 34820/ 173500 | consumed samples: 8913920 | consumed tokens: 18255708160 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.078138E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.629 | TFLOPs: 31.93 | +7: iteration 34830/ 173500 | consumed samples: 8916480 | consumed tokens: 18260951040 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.062323E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.647 | TFLOPs: 31.93 | +7: iteration 34840/ 173500 | consumed samples: 8919040 | consumed tokens: 18266193920 | elapsed time per iteration (s): 0.43 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.068308E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.138 | TFLOPs: 31.23 | +7: iteration 34850/ 173500 | consumed samples: 8921600 | consumed tokens: 18271436800 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.063758E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.212 | TFLOPs: 31.96 | +7: iteration 34860/ 173500 | consumed samples: 8924160 | consumed tokens: 18276679680 | elapsed time per iteration (s): 0.43 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.065574E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.858 | TFLOPs: 31.32 | +7: iteration 34870/ 173500 | consumed samples: 8926720 | consumed tokens: 18281922560 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.062587E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.731 | TFLOPs: 31.99 | +7: iteration 34880/ 173500 | consumed samples: 8929280 | consumed tokens: 18287165440 | elapsed time per iteration (s): 0.43 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.074051E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.886 | TFLOPs: 31.58 | +7: iteration 34890/ 173500 | consumed samples: 8931840 | consumed tokens: 18292408320 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.072168E+00 | grad norm: 0.269 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.536 | TFLOPs: 31.72 | +7: iteration 34900/ 173500 | consumed samples: 8934400 | consumed tokens: 18297651200 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.069065E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.272 | TFLOPs: 31.65 | +7: iteration 34910/ 173500 | consumed samples: 8936960 | consumed tokens: 18302894080 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.067032E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.419 | TFLOPs: 31.98 | +7: iteration 34920/ 173500 | consumed samples: 8939520 | consumed tokens: 18308136960 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.072675E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.873 | TFLOPs: 31.63 | +7: iteration 34930/ 173500 | consumed samples: 8942080 | consumed tokens: 18313379840 | elapsed time per iteration (s): 0.43 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.071992E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.692 | TFLOPs: 31.31 | +7: iteration 34940/ 173500 | consumed samples: 8944640 | consumed tokens: 18318622720 | elapsed time per iteration (s): 0.43 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.069118E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.218 | TFLOPs: 30.97 | +7: iteration 34950/ 173500 | consumed samples: 8947200 | consumed tokens: 18323865600 | elapsed time per iteration (s): 
0.43 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.072222E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.030 | TFLOPs: 30.96 | +7: iteration 34960/ 173500 | consumed samples: 8949760 | consumed tokens: 18329108480 | elapsed time per iteration (s): 0.44 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.081236E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.480 | TFLOPs: 30.30 | +7: iteration 34970/ 173500 | consumed samples: 8952320 | consumed tokens: 18334351360 | elapsed time per iteration (s): 0.45 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.064027E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.829 | TFLOPs: 29.85 | +7: iteration 34980/ 173500 | consumed samples: 8954880 | consumed tokens: 18339594240 | elapsed time per iteration (s): 0.46 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.079283E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.431 | TFLOPs: 29.09 | +7: iteration 34990/ 173500 | consumed samples: 8957440 | consumed tokens: 18344837120 | elapsed time per iteration (s): 0.44 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.059673E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.301 | TFLOPs: 30.60 | +7: iteration 35000/ 173500 | consumed samples: 8960000 | consumed tokens: 18350080000 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.068896E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.182 | TFLOPs: 31.75 | +7: iteration 35010/ 
173500 | consumed samples: 8962560 | consumed tokens: 18355322880 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.087302E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.676 | TFLOPs: 32.09 | +7: iteration 35020/ 173500 | consumed samples: 8965120 | consumed tokens: 18360565760 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.060201E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.014 | TFLOPs: 32.06 | +7: iteration 35030/ 173500 | consumed samples: 8967680 | consumed tokens: 18365808640 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.075839E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.754 | TFLOPs: 32.05 | +7: iteration 35040/ 173500 | consumed samples: 8970240 | consumed tokens: 18371051520 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.065799E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.398 | TFLOPs: 32.03 | +7: iteration 35050/ 173500 | consumed samples: 8972800 | consumed tokens: 18376294400 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.045011E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.366 | TFLOPs: 32.02 | +7: iteration 35060/ 173500 | consumed samples: 8975360 | consumed tokens: 18381537280 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.061057E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 610.520 | TFLOPs: 32.03 | +7: iteration 35070/ 173500 | consumed samples: 8977920 | consumed tokens: 18386780160 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.079161E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.172 | TFLOPs: 32.01 | +7: iteration 35080/ 173500 | consumed samples: 8980480 | consumed tokens: 18392023040 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.078316E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.010 | TFLOPs: 32.01 | +7: iteration 35090/ 173500 | consumed samples: 8983040 | consumed tokens: 18397265920 | elapsed time per iteration (s): 0.43 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.057574E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.212 | TFLOPs: 31.60 | +7: iteration 35100/ 173500 | consumed samples: 8985600 | consumed tokens: 18402508800 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.079897E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.370 | TFLOPs: 32.03 | +7: iteration 35110/ 173500 | consumed samples: 8988160 | consumed tokens: 18407751680 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.078643E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.317 | TFLOPs: 32.02 | +7: iteration 35120/ 173500 | consumed samples: 8990720 | consumed tokens: 18412994560 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch 
size: 256 | lm loss: 3.070240E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.759 | TFLOPs: 31.99 | +7: iteration 35130/ 173500 | consumed samples: 8993280 | consumed tokens: 18418237440 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.061122E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.713 | TFLOPs: 31.99 | +7: iteration 35140/ 173500 | consumed samples: 8995840 | consumed tokens: 18423480320 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.069727E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.996 | TFLOPs: 31.95 | +7: iteration 35150/ 173500 | consumed samples: 8998400 | consumed tokens: 18428723200 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.060845E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.708 | TFLOPs: 31.94 | +7: iteration 35160/ 173500 | consumed samples: 9000960 | consumed tokens: 18433966080 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.058291E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.145 | TFLOPs: 31.96 | +7: iteration 35170/ 173500 | consumed samples: 9003520 | consumed tokens: 18439208960 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.055853E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.913 | TFLOPs: 31.95 | +7: iteration 35180/ 173500 | consumed samples: 9006080 | consumed 
tokens: 18444451840 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.077292E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.979 | TFLOPs: 31.95 | +7: iteration 35190/ 173500 | consumed samples: 9008640 | consumed tokens: 18449694720 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.063017E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.871 | TFLOPs: 31.95 | +7: iteration 35200/ 173500 | consumed samples: 9011200 | consumed tokens: 18454937600 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.061124E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.735 | TFLOPs: 31.94 | +7: iteration 35210/ 173500 | consumed samples: 9013760 | consumed tokens: 18460180480 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 3.072868E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.937 | TFLOPs: 31.95 | +7: iteration 35220/ 173500 | consumed samples: 9016320 | consumed tokens: 18465423360 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.071621E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.797 | TFLOPs: 31.94 | +7: iteration 35230/ 173500 | consumed samples: 9018880 | consumed tokens: 18470666240 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.071504E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.941 | TFLOPs: 31.95 | +7: iteration 35240/ 173500 | consumed samples: 9021440 | consumed tokens: 18475909120 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.083606E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.097 | TFLOPs: 31.96 | +7: iteration 35250/ 173500 | consumed samples: 9024000 | consumed tokens: 18481152000 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.073073E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.247 | TFLOPs: 31.97 | +7: iteration 35260/ 173500 | consumed samples: 9026560 | consumed tokens: 18486394880 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.059469E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.938 | TFLOPs: 31.95 | +7: iteration 35270/ 173500 | consumed samples: 9029120 | consumed tokens: 18491637760 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.065742E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.757 | TFLOPs: 31.94 | +7: iteration 35280/ 173500 | consumed samples: 9031680 | consumed tokens: 18496880640 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.061916E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.863 | TFLOPs: 31.95 | +7: iteration 35290/ 173500 | consumed samples: 9034240 | consumed tokens: 18502123520 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.074316E+00 | grad norm: 
0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.537 | TFLOPs: 31.98 | +7: iteration 35300/ 173500 | consumed samples: 9036800 | consumed tokens: 18507366400 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.071043E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.622 | TFLOPs: 31.93 | +7: iteration 35310/ 173500 | consumed samples: 9039360 | consumed tokens: 18512609280 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.078820E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.648 | TFLOPs: 31.93 | +7: iteration 35320/ 173500 | consumed samples: 9041920 | consumed tokens: 18517852160 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.054347E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.036 | TFLOPs: 31.96 | +7: iteration 35330/ 173500 | consumed samples: 9044480 | consumed tokens: 18523095040 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.054790E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.807 | TFLOPs: 31.94 | +7: iteration 35340/ 173500 | consumed samples: 9047040 | consumed tokens: 18528337920 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.072869E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.087 | TFLOPs: 31.96 | +7: iteration 35350/ 173500 | consumed samples: 9049600 | consumed tokens: 18533580800 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.081223E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.019 | TFLOPs: 31.95 | +7: iteration 35360/ 173500 | consumed samples: 9052160 | consumed tokens: 18538823680 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.069141E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.028 | TFLOPs: 31.95 | +7: iteration 35370/ 173500 | consumed samples: 9054720 | consumed tokens: 18544066560 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.061228E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.749 | TFLOPs: 31.94 | +7: iteration 35380/ 173500 | consumed samples: 9057280 | consumed tokens: 18549309440 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.065348E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.224 | TFLOPs: 31.97 | +7: iteration 35390/ 173500 | consumed samples: 9059840 | consumed tokens: 18554552320 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.082777E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.772 | TFLOPs: 31.94 | +7: iteration 35400/ 173500 | consumed samples: 9062400 | consumed tokens: 18559795200 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.055122E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.162 | TFLOPs: 31.96 | +7: 
iteration 35410/ 173500 | consumed samples: 9064960 | consumed tokens: 18565038080 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.066819E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.886 | TFLOPs: 31.95 | +7: iteration 35420/ 173500 | consumed samples: 9067520 | consumed tokens: 18570280960 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.065316E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.814 | TFLOPs: 31.94 | +7: iteration 35430/ 173500 | consumed samples: 9070080 | consumed tokens: 18575523840 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.075392E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.730 | TFLOPs: 31.94 | +7: iteration 35440/ 173500 | consumed samples: 9072640 | consumed tokens: 18580766720 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.066161E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.751 | TFLOPs: 31.94 | +7: iteration 35450/ 173500 | consumed samples: 9075200 | consumed tokens: 18586009600 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.070789E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.822 | TFLOPs: 31.94 | +7: iteration 35460/ 173500 | consumed samples: 9077760 | consumed tokens: 18591252480 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.068289E+00 | grad norm: 0.277 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.875 | TFLOPs: 31.95 | +7: iteration 35470/ 173500 | consumed samples: 9080320 | consumed tokens: 18596495360 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.079801E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.915 | TFLOPs: 31.95 | +7: iteration 35480/ 173500 | consumed samples: 9082880 | consumed tokens: 18601738240 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.067792E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.237 | TFLOPs: 31.97 | +7: iteration 35490/ 173500 | consumed samples: 9085440 | consumed tokens: 18606981120 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.048999E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.209 | TFLOPs: 31.96 | +7: iteration 35500/ 173500 | consumed samples: 9088000 | consumed tokens: 18612224000 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.076750E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.546 | TFLOPs: 31.93 | +7: iteration 35510/ 173500 | consumed samples: 9090560 | consumed tokens: 18617466880 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 3.070509E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.832 | TFLOPs: 31.73 | +7: iteration 35520/ 173500 | consumed samples: 9093120 | consumed tokens: 18622709760 | elapsed time per iteration (s): 0.42 | learning rate: 
1.834E-04 | global batch size: 256 | lm loss: 3.076777E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.341 | TFLOPs: 31.97 | +7: iteration 35530/ 173500 | consumed samples: 9095680 | consumed tokens: 18627952640 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.063676E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.206 | TFLOPs: 31.91 | +7: iteration 35540/ 173500 | consumed samples: 9098240 | consumed tokens: 18633195520 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.062399E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.716 | TFLOPs: 31.94 | +7: iteration 35550/ 173500 | consumed samples: 9100800 | consumed tokens: 18638438400 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.050532E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.703 | TFLOPs: 31.94 | +7: iteration 35560/ 173500 | consumed samples: 9103360 | consumed tokens: 18643681280 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.072943E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | +7: iteration 35570/ 173500 | consumed samples: 9105920 | consumed tokens: 18648924160 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.058117E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.944 | TFLOPs: 31.95 | +7: iteration 35580/ 173500 | consumed 
samples: 9108480 | consumed tokens: 18654167040 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.054987E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.652 | TFLOPs: 31.94 | +7: iteration 35590/ 173500 | consumed samples: 9111040 | consumed tokens: 18659409920 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.067836E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.458 | TFLOPs: 31.92 | +7: iteration 35600/ 173500 | consumed samples: 9113600 | consumed tokens: 18664652800 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.062255E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.443 | TFLOPs: 31.92 | +7: iteration 35610/ 173500 | consumed samples: 9116160 | consumed tokens: 18669895680 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.061540E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.808 | TFLOPs: 31.94 | +7: iteration 35620/ 173500 | consumed samples: 9118720 | consumed tokens: 18675138560 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.066137E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.443 | TFLOPs: 31.92 | +7: iteration 35630/ 173500 | consumed samples: 9121280 | consumed tokens: 18680381440 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.063906E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.733 | TFLOPs: 31.94 | +7: iteration 35640/ 173500 | consumed samples: 9123840 | consumed tokens: 18685624320 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.057875E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.927 | TFLOPs: 31.95 | +7: iteration 35650/ 173500 | consumed samples: 9126400 | consumed tokens: 18690867200 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.059308E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.475 | TFLOPs: 31.66 | +7: iteration 35660/ 173500 | consumed samples: 9128960 | consumed tokens: 18696110080 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.067271E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.775 | TFLOPs: 31.94 | +7: iteration 35670/ 173500 | consumed samples: 9131520 | consumed tokens: 18701352960 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.062686E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.829 | TFLOPs: 31.94 | +7: iteration 35680/ 173500 | consumed samples: 9134080 | consumed tokens: 18706595840 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.064883E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.521 | TFLOPs: 31.93 | +7: iteration 35690/ 173500 | consumed samples: 9136640 | consumed tokens: 18711838720 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm 
loss: 3.057034E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.087 | TFLOPs: 31.96 | +7: iteration 35700/ 173500 | consumed samples: 9139200 | consumed tokens: 18717081600 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.075016E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.170 | TFLOPs: 31.96 | +7: iteration 35710/ 173500 | consumed samples: 9141760 | consumed tokens: 18722324480 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.076637E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.860 | TFLOPs: 31.95 | +7: iteration 35720/ 173500 | consumed samples: 9144320 | consumed tokens: 18727567360 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.062606E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.275 | TFLOPs: 31.97 | +7: iteration 35730/ 173500 | consumed samples: 9146880 | consumed tokens: 18732810240 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.076563E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.058 | TFLOPs: 31.96 | +7: iteration 35740/ 173500 | consumed samples: 9149440 | consumed tokens: 18738053120 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.065065E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.568 | TFLOPs: 31.98 | +7: iteration 35750/ 173500 | consumed samples: 9152000 | consumed tokens: 
18743296000 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.058569E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.432 | TFLOPs: 31.98 | +7: iteration 35760/ 173500 | consumed samples: 9154560 | consumed tokens: 18748538880 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.054236E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.422 | TFLOPs: 31.98 | +7: iteration 35770/ 173500 | consumed samples: 9157120 | consumed tokens: 18753781760 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.075420E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.012 | TFLOPs: 31.95 | +7: iteration 35780/ 173500 | consumed samples: 9159680 | consumed tokens: 18759024640 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.063726E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.292 | TFLOPs: 31.97 | +7: iteration 35790/ 173500 | consumed samples: 9162240 | consumed tokens: 18764267520 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.046663E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.236 | TFLOPs: 31.97 | +7: iteration 35800/ 173500 | consumed samples: 9164800 | consumed tokens: 18769510400 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.066606E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
609.195 | TFLOPs: 31.96 | +7: iteration 35810/ 173500 | consumed samples: 9167360 | consumed tokens: 18774753280 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.054089E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.039 | TFLOPs: 31.96 | +7: iteration 35820/ 173500 | consumed samples: 9169920 | consumed tokens: 18779996160 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.078687E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.215 | TFLOPs: 31.96 | +7: iteration 35830/ 173500 | consumed samples: 9172480 | consumed tokens: 18785239040 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.083919E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.320 | TFLOPs: 31.97 | +7: iteration 35840/ 173500 | consumed samples: 9175040 | consumed tokens: 18790481920 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 3.061291E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.868 | TFLOPs: 31.95 | +7: iteration 35850/ 173500 | consumed samples: 9177600 | consumed tokens: 18795724800 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.067068E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.982 | TFLOPs: 31.95 | +7: iteration 35860/ 173500 | consumed samples: 9180160 | consumed tokens: 18800967680 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.064595E+00 | grad norm: 0.287 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.707 | TFLOPs: 31.94 | +7: iteration 35870/ 173500 | consumed samples: 9182720 | consumed tokens: 18806210560 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.072022E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.922 | TFLOPs: 31.95 | +7: iteration 35880/ 173500 | consumed samples: 9185280 | consumed tokens: 18811453440 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.060715E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.643 | TFLOPs: 31.93 | +7: iteration 35890/ 173500 | consumed samples: 9187840 | consumed tokens: 18816696320 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.053262E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.316 | TFLOPs: 31.97 | +7: iteration 35900/ 173500 | consumed samples: 9190400 | consumed tokens: 18821939200 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.075739E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.853 | TFLOPs: 31.95 | +7: iteration 35910/ 173500 | consumed samples: 9192960 | consumed tokens: 18827182080 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.047430E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.096 | TFLOPs: 31.96 | +7: iteration 35920/ 173500 | consumed samples: 9195520 | consumed tokens: 18832424960 | elapsed time per iteration (s): 
0.43 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.053997E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.612 | TFLOPs: 31.57 | +7: iteration 35930/ 173500 | consumed samples: 9198080 | consumed tokens: 18837667840 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.050560E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.785 | TFLOPs: 31.99 | +7: iteration 35940/ 173500 | consumed samples: 9200640 | consumed tokens: 18842910720 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.070040E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.234 | TFLOPs: 31.97 | +7: iteration 35950/ 173500 | consumed samples: 9203200 | consumed tokens: 18848153600 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.071242E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.016 | TFLOPs: 31.95 | +7: iteration 35960/ 173500 | consumed samples: 9205760 | consumed tokens: 18853396480 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.065954E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.138 | TFLOPs: 31.96 | +7: iteration 35970/ 173500 | consumed samples: 9208320 | consumed tokens: 18858639360 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.071127E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.107 | TFLOPs: 31.96 | +7: iteration 35980/ 
173500 | consumed samples: 9210880 | consumed tokens: 18863882240 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.059713E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.049 | TFLOPs: 31.96 | +7: iteration 35990/ 173500 | consumed samples: 9213440 | consumed tokens: 18869125120 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.050380E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.141 | TFLOPs: 31.96 | +0: [2023-03-17 03:27:49,774] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=0, lr=[0.00018289669072542715, 0.00018289669072542715, 0.00018289669072542715], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 36000/ 173500 | consumed samples: 9216000 | consumed tokens: 18874368000 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.059849E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.809 | TFLOPs: 31.94 | +0: steps: 36000 loss: 3.0540 iter time (s): 0.419 samples/sec: 610.744 +7: iteration 36010/ 173500 | consumed samples: 9218560 | consumed tokens: 18879610880 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.066271E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.723 | TFLOPs: 31.89 | +7: iteration 36020/ 173500 | consumed samples: 9221120 | consumed tokens: 18884853760 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.069009E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.588 | TFLOPs: 31.93 | +7: iteration 36030/ 173500 | consumed samples: 9223680 | consumed tokens: 18890096640 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.070304E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.913 | TFLOPs: 31.95 | +7: iteration 36040/ 173500 | consumed samples: 9226240 | consumed tokens: 18895339520 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.056650E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.210 | TFLOPs: 31.86 | +7: iteration 36050/ 173500 | consumed samples: 9228800 | consumed tokens: 18900582400 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.059617E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.070 | TFLOPs: 31.96 | +7: iteration 36060/ 173500 | consumed samples: 9231360 | consumed tokens: 18905825280 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.060226E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.733 | TFLOPs: 31.94 | +7: iteration 36070/ 173500 | consumed samples: 9233920 | consumed tokens: 18911068160 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.060716E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.141 | TFLOPs: 31.96 | +7: iteration 36080/ 173500 | consumed samples: 9236480 | consumed tokens: 18916311040 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.071784E+00 | grad norm: 0.292 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.353 | TFLOPs: 31.97 | +7: iteration 36090/ 173500 | consumed samples: 9239040 | consumed tokens: 18921553920 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.060310E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.822 | TFLOPs: 31.94 | +7: iteration 36100/ 173500 | consumed samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.055334E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.707 | TFLOPs: 31.94 | +7: iteration 36110/ 173500 | consumed samples: 9244160 | consumed tokens: 18932039680 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.076507E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 36120/ 173500 | consumed samples: 9246720 | consumed tokens: 18937282560 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.068061E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.853 | TFLOPs: 31.84 | +7: iteration 36130/ 173500 | consumed samples: 9249280 | consumed tokens: 18942525440 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.059087E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.319 | TFLOPs: 31.97 | +7: iteration 36140/ 173500 | consumed samples: 9251840 | consumed tokens: 18947768320 | elapsed time per iteration (s): 
0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.061892E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.179 | TFLOPs: 31.96 | +7: iteration 36150/ 173500 | consumed samples: 9254400 | consumed tokens: 18953011200 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.068924E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.353 | TFLOPs: 31.97 | +7: iteration 36160/ 173500 | consumed samples: 9256960 | consumed tokens: 18958254080 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.058429E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.155 | TFLOPs: 31.96 | +7: iteration 36170/ 173500 | consumed samples: 9259520 | consumed tokens: 18963496960 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.047693E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.229 | TFLOPs: 31.97 | +7: iteration 36180/ 173500 | consumed samples: 9262080 | consumed tokens: 18968739840 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.065043E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.956 | TFLOPs: 31.95 | +7: iteration 36190/ 173500 | consumed samples: 9264640 | consumed tokens: 18973982720 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.073053E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.338 | TFLOPs: 31.97 | +7: iteration 36200/ 
173500 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.071976E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.249 | TFLOPs: 31.97 | +7: iteration 36210/ 173500 | consumed samples: 9269760 | consumed tokens: 18984468480 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.064310E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.020 | TFLOPs: 31.95 | +7: iteration 36220/ 173500 | consumed samples: 9272320 | consumed tokens: 18989711360 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.068003E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.978 | TFLOPs: 31.95 | +7: iteration 36230/ 173500 | consumed samples: 9274880 | consumed tokens: 18994954240 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.060367E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.851 | TFLOPs: 31.95 | +7: iteration 36240/ 173500 | consumed samples: 9277440 | consumed tokens: 19000197120 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.063082E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.578 | TFLOPs: 31.93 | +7: iteration 36250/ 173500 | consumed samples: 9280000 | consumed tokens: 19005440000 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.049461E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 608.866 | TFLOPs: 31.95 | +7: iteration 36260/ 173500 | consumed samples: 9282560 | consumed tokens: 19010682880 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.067984E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.887 | TFLOPs: 31.84 | +7: iteration 36270/ 173500 | consumed samples: 9285120 | consumed tokens: 19015925760 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.078761E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.020 | TFLOPs: 31.95 | +7: iteration 36280/ 173500 | consumed samples: 9287680 | consumed tokens: 19021168640 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.066347E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.143 | TFLOPs: 31.91 | +7: iteration 36290/ 173500 | consumed samples: 9290240 | consumed tokens: 19026411520 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.068142E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.887 | TFLOPs: 31.89 | +7: iteration 36300/ 173500 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.063483E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.036 | TFLOPs: 31.85 | +7: iteration 36310/ 173500 | consumed samples: 9295360 | consumed tokens: 19036897280 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch 
size: 256 | lm loss: 3.056640E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.879 | TFLOPs: 31.89 | +7: iteration 36320/ 173500 | consumed samples: 9297920 | consumed tokens: 19042140160 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.083787E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.637 | TFLOPs: 31.83 | +7: iteration 36330/ 173500 | consumed samples: 9300480 | consumed tokens: 19047383040 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.052543E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.804 | TFLOPs: 31.94 | +7: iteration 36340/ 173500 | consumed samples: 9303040 | consumed tokens: 19052625920 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.068003E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.098 | TFLOPs: 31.96 | +7: iteration 36350/ 173500 | consumed samples: 9305600 | consumed tokens: 19057868800 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.059719E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.757 | TFLOPs: 31.94 | +7: iteration 36360/ 173500 | consumed samples: 9308160 | consumed tokens: 19063111680 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.065703E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.715 | TFLOPs: 31.73 | +7: iteration 36370/ 173500 | consumed samples: 9310720 | consumed 
tokens: 19068354560 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.071777E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.250 | TFLOPs: 31.97 | +7: iteration 36380/ 173500 | consumed samples: 9313280 | consumed tokens: 19073597440 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.062461E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.301 | TFLOPs: 31.86 | +7: iteration 36390/ 173500 | consumed samples: 9315840 | consumed tokens: 19078840320 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.059538E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.940 | TFLOPs: 31.90 | +7: iteration 36400/ 173500 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.062050E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.388 | TFLOPs: 31.13 | +7: iteration 36410/ 173500 | consumed samples: 9320960 | consumed tokens: 19089326080 | elapsed time per iteration (s): 0.45 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.063710E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.317 | TFLOPs: 30.08 | +7: iteration 36420/ 173500 | consumed samples: 9323520 | consumed tokens: 19094568960 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.059597E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 594.085 | TFLOPs: 31.17 | +7: iteration 36430/ 173500 | consumed samples: 9326080 | consumed tokens: 19099811840 | elapsed time per iteration (s): 0.44 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.070416E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.746 | TFLOPs: 30.84 | +7: iteration 36440/ 173500 | consumed samples: 9328640 | consumed tokens: 19105054720 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.055326E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.255 | TFLOPs: 31.81 | +7: iteration 36450/ 173500 | consumed samples: 9331200 | consumed tokens: 19110297600 | elapsed time per iteration (s): 0.45 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.066568E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.157 | TFLOPs: 30.13 | +7: iteration 36460/ 173500 | consumed samples: 9333760 | consumed tokens: 19115540480 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.060197E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.098 | TFLOPs: 31.07 | +7: iteration 36470/ 173500 | consumed samples: 9336320 | consumed tokens: 19120783360 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.067537E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.318 | TFLOPs: 31.39 | +7: iteration 36480/ 173500 | consumed samples: 9338880 | consumed tokens: 19126026240 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.064716E+00 | grad norm: 
0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.796 | TFLOPs: 31.52 | +7: iteration 36490/ 173500 | consumed samples: 9341440 | consumed tokens: 19131269120 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.066243E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.523 | TFLOPs: 31.14 | +7: iteration 36500/ 173500 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.053299E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.994 | TFLOPs: 31.27 | +7: iteration 36510/ 173500 | consumed samples: 9346560 | consumed tokens: 19141754880 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.057338E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.465 | TFLOPs: 31.24 | +7: iteration 36520/ 173500 | consumed samples: 9349120 | consumed tokens: 19146997760 | elapsed time per iteration (s): 0.42 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.058583E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.204 | TFLOPs: 31.70 | +7: iteration 36530/ 173500 | consumed samples: 9351680 | consumed tokens: 19152240640 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.062446E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.660 | TFLOPs: 31.41 | +7: iteration 36540/ 173500 | consumed samples: 9354240 | consumed tokens: 19157483520 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.059503E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.316 | TFLOPs: 30.61 | +7: iteration 36550/ 173500 | consumed samples: 9356800 | consumed tokens: 19162726400 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.064424E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.627 | TFLOPs: 30.94 | +7: iteration 36560/ 173500 | consumed samples: 9359360 | consumed tokens: 19167969280 | elapsed time per iteration (s): 0.42 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.065158E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.028 | TFLOPs: 31.64 | +7: iteration 36570/ 173500 | consumed samples: 9361920 | consumed tokens: 19173212160 | elapsed time per iteration (s): 0.44 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.064998E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.131 | TFLOPs: 30.49 | +7: iteration 36580/ 173500 | consumed samples: 9364480 | consumed tokens: 19178455040 | elapsed time per iteration (s): 0.44 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.065820E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.156 | TFLOPs: 30.86 | +7: iteration 36590/ 173500 | consumed samples: 9367040 | consumed tokens: 19183697920 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.068505E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.624 | TFLOPs: 31.41 | +7: 
iteration 36600/ 173500 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.45 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.063844E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.017 | TFLOPs: 30.17 | +7: iteration 36610/ 173500 | consumed samples: 9372160 | consumed tokens: 19194183680 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.062399E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.292 | TFLOPs: 31.02 | +7: iteration 36620/ 173500 | consumed samples: 9374720 | consumed tokens: 19199426560 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.067678E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.890 | TFLOPs: 31.63 | +7: iteration 36630/ 173500 | consumed samples: 9377280 | consumed tokens: 19204669440 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.066568E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.443 | TFLOPs: 30.98 | +7: iteration 36640/ 173500 | consumed samples: 9379840 | consumed tokens: 19209912320 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.057106E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.349 | TFLOPs: 31.34 | +7: iteration 36650/ 173500 | consumed samples: 9382400 | consumed tokens: 19215155200 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.051656E+00 | grad norm: 0.296 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.583 | TFLOPs: 31.20 | +7: iteration 36660/ 173500 | consumed samples: 9384960 | consumed tokens: 19220398080 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.067555E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.358 | TFLOPs: 31.39 | +7: iteration 36670/ 173500 | consumed samples: 9387520 | consumed tokens: 19225640960 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.081152E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.069 | TFLOPs: 31.38 | +7: iteration 36680/ 173500 | consumed samples: 9390080 | consumed tokens: 19230883840 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.060173E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.667 | TFLOPs: 30.99 | +7: iteration 36690/ 173500 | consumed samples: 9392640 | consumed tokens: 19236126720 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.061325E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.932 | TFLOPs: 31.63 | +7: iteration 36700/ 173500 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.067748E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.358 | TFLOPs: 31.34 | +7: iteration 36710/ 173500 | consumed samples: 9397760 | consumed tokens: 19246612480 | elapsed time per iteration (s): 0.43 | learning rate: 
1.822E-04 | global batch size: 256 | lm loss: 3.060762E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.434 | TFLOPs: 31.03 | +7: iteration 36720/ 173500 | consumed samples: 9400320 | consumed tokens: 19251855360 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.055444E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.163 | TFLOPs: 31.12 | +7: iteration 36730/ 173500 | consumed samples: 9402880 | consumed tokens: 19257098240 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.061614E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.986 | TFLOPs: 31.22 | +7: iteration 36740/ 173500 | consumed samples: 9405440 | consumed tokens: 19262341120 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.061700E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.822 | TFLOPs: 30.95 | +7: iteration 36750/ 173500 | consumed samples: 9408000 | consumed tokens: 19267584000 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.067436E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.253 | TFLOPs: 31.13 | +7: iteration 36760/ 173500 | consumed samples: 9410560 | consumed tokens: 19272826880 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.068371E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.497 | TFLOPs: 31.30 | +7: iteration 36770/ 173500 | consumed 
samples: 9413120 | consumed tokens: 19278069760 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.068937E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.830 | TFLOPs: 31.79 | +7: iteration 36780/ 173500 | consumed samples: 9415680 | consumed tokens: 19283312640 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.057695E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.802 | TFLOPs: 31.26 | +7: iteration 36790/ 173500 | consumed samples: 9418240 | consumed tokens: 19288555520 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.057080E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.082 | TFLOPs: 30.96 | +7: iteration 36800/ 173500 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.068208E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.048 | TFLOPs: 31.69 | +7: iteration 36810/ 173500 | consumed samples: 9423360 | consumed tokens: 19299041280 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.050803E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.497 | TFLOPs: 31.40 | +7: iteration 36820/ 173500 | consumed samples: 9425920 | consumed tokens: 19304284160 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.069357E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 596.239 | TFLOPs: 31.28 | +7: iteration 36830/ 173500 | consumed samples: 9428480 | consumed tokens: 19309527040 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.079889E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.583 | TFLOPs: 31.88 | +7: iteration 36840/ 173500 | consumed samples: 9431040 | consumed tokens: 19314769920 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.052310E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.622 | TFLOPs: 31.41 | +7: iteration 36850/ 173500 | consumed samples: 9433600 | consumed tokens: 19320012800 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.059551E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.543 | TFLOPs: 31.82 | +7: iteration 36860/ 173500 | consumed samples: 9436160 | consumed tokens: 19325255680 | elapsed time per iteration (s): 0.44 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.065790E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.478 | TFLOPs: 30.88 | +7: iteration 36870/ 173500 | consumed samples: 9438720 | consumed tokens: 19330498560 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.054744E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.406 | TFLOPs: 31.40 | +7: iteration 36880/ 173500 | consumed samples: 9441280 | consumed tokens: 19335741440 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm 
loss: 3.053169E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.077 | TFLOPs: 31.43 | +7: iteration 36890/ 173500 | consumed samples: 9443840 | consumed tokens: 19340984320 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.064848E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.941 | TFLOPs: 30.95 | +7: iteration 36900/ 173500 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.075246E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.902 | TFLOPs: 31.42 | +7: iteration 36910/ 173500 | consumed samples: 9448960 | consumed tokens: 19351470080 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.064509E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.479 | TFLOPs: 31.03 | +7: iteration 36920/ 173500 | consumed samples: 9451520 | consumed tokens: 19356712960 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.055364E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.291 | TFLOPs: 31.13 | +7: iteration 36930/ 173500 | consumed samples: 9454080 | consumed tokens: 19361955840 | elapsed time per iteration (s): 0.44 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.073740E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.835 | TFLOPs: 30.58 | +7: iteration 36940/ 173500 | consumed samples: 9456640 | consumed tokens: 
19367198720 | elapsed time per iteration (s): 0.44 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.067083E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.963 | TFLOPs: 30.85 | +7: iteration 36950/ 173500 | consumed samples: 9459200 | consumed tokens: 19372441600 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.058533E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.843 | TFLOPs: 31.32 | +7: iteration 36960/ 173500 | consumed samples: 9461760 | consumed tokens: 19377684480 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.083391E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.946 | TFLOPs: 31.22 | +7: iteration 36970/ 173500 | consumed samples: 9464320 | consumed tokens: 19382927360 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.057423E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.875 | TFLOPs: 31.16 | +7: iteration 36980/ 173500 | consumed samples: 9466880 | consumed tokens: 19388170240 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.075380E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.781 | TFLOPs: 31.42 | +7: iteration 36990/ 173500 | consumed samples: 9469440 | consumed tokens: 19393413120 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.059644E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
596.686 | TFLOPs: 31.31 | +7: iteration 37000/ 173500 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.063161E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.771 | TFLOPs: 30.94 | +7: iteration 37010/ 173500 | consumed samples: 9474560 | consumed tokens: 19403898880 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.055255E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.289 | TFLOPs: 31.34 | +7: iteration 37020/ 173500 | consumed samples: 9477120 | consumed tokens: 19409141760 | elapsed time per iteration (s): 0.44 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.061075E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.249 | TFLOPs: 30.81 | +7: iteration 37030/ 173500 | consumed samples: 9479680 | consumed tokens: 19414384640 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.065561E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.405 | TFLOPs: 31.40 | +7: iteration 37040/ 173500 | consumed samples: 9482240 | consumed tokens: 19419627520 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.065269E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.037 | TFLOPs: 31.38 | +7: iteration 37050/ 173500 | consumed samples: 9484800 | consumed tokens: 19424870400 | elapsed time per iteration (s): 0.44 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.066508E+00 | grad norm: 0.276 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.716 | TFLOPs: 30.36 | +7: iteration 37060/ 173500 | consumed samples: 9487360 | consumed tokens: 19430113280 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.054326E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.531 | TFLOPs: 31.46 | +7: iteration 37070/ 173500 | consumed samples: 9489920 | consumed tokens: 19435356160 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.062653E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.977 | TFLOPs: 31.32 | +7: iteration 37080/ 173500 | consumed samples: 9492480 | consumed tokens: 19440599040 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.070389E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.478 | TFLOPs: 31.24 | +7: iteration 37090/ 173500 | consumed samples: 9495040 | consumed tokens: 19445841920 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.066742E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.421 | TFLOPs: 31.14 | +7: iteration 37100/ 173500 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.073306E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.547 | TFLOPs: 31.51 | +7: iteration 37110/ 173500 | consumed samples: 9500160 | consumed tokens: 19456327680 | elapsed time per iteration (s): 
0.44 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.072193E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.270 | TFLOPs: 30.81 | +7: iteration 37120/ 173500 | consumed samples: 9502720 | consumed tokens: 19461570560 | elapsed time per iteration (s): 0.44 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.061492E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.782 | TFLOPs: 30.42 | +7: iteration 37130/ 173500 | consumed samples: 9505280 | consumed tokens: 19466813440 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.052478E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.486 | TFLOPs: 31.35 | +7: iteration 37140/ 173500 | consumed samples: 9507840 | consumed tokens: 19472056320 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.058166E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.096 | TFLOPs: 31.22 | +7: iteration 37150/ 173500 | consumed samples: 9510400 | consumed tokens: 19477299200 | elapsed time per iteration (s): 0.44 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.083718E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.450 | TFLOPs: 30.40 | +7: iteration 37160/ 173500 | consumed samples: 9512960 | consumed tokens: 19482542080 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.059274E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.390 | TFLOPs: 31.19 | +7: iteration 37170/ 
173500 | consumed samples: 9515520 | consumed tokens: 19487784960 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.076000E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.031 | TFLOPs: 31.48 | +7: iteration 37180/ 173500 | consumed samples: 9518080 | consumed tokens: 19493027840 | elapsed time per iteration (s): 0.44 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.068515E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.172 | TFLOPs: 30.28 | +7: iteration 37190/ 173500 | consumed samples: 9520640 | consumed tokens: 19498270720 | elapsed time per iteration (s): 0.44 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.075813E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.790 | TFLOPs: 30.84 | +7: iteration 37200/ 173500 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.070084E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.248 | TFLOPs: 31.49 | +7: iteration 37210/ 173500 | consumed samples: 9525760 | consumed tokens: 19508756480 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.052695E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.445 | TFLOPs: 31.40 | +7: iteration 37220/ 173500 | consumed samples: 9528320 | consumed tokens: 19513999360 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.051370E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 598.509 | TFLOPs: 31.40 | +7: iteration 37230/ 173500 | consumed samples: 9530880 | consumed tokens: 19519242240 | elapsed time per iteration (s): 0.44 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.065091E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.823 | TFLOPs: 30.74 | +7: iteration 37240/ 173500 | consumed samples: 9533440 | consumed tokens: 19524485120 | elapsed time per iteration (s): 0.44 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.065467E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.205 | TFLOPs: 30.23 | +7: iteration 37250/ 173500 | consumed samples: 9536000 | consumed tokens: 19529728000 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.065917E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.366 | TFLOPs: 31.29 | +7: iteration 37260/ 173500 | consumed samples: 9538560 | consumed tokens: 19534970880 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.057449E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.211 | TFLOPs: 31.02 | +7: iteration 37270/ 173500 | consumed samples: 9541120 | consumed tokens: 19540213760 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.068008E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.132 | TFLOPs: 31.54 | +7: iteration 37280/ 173500 | consumed samples: 9543680 | consumed tokens: 19545456640 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch 
size: 256 | lm loss: 3.078780E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.956 | TFLOPs: 31.43 | +7: iteration 37290/ 173500 | consumed samples: 9546240 | consumed tokens: 19550699520 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.063709E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.373 | TFLOPs: 31.29 | +7: iteration 37300/ 173500 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.063469E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.291 | TFLOPs: 30.97 | +7: iteration 37310/ 173500 | consumed samples: 9551360 | consumed tokens: 19561185280 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.049110E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.352 | TFLOPs: 31.29 | +7: iteration 37320/ 173500 | consumed samples: 9553920 | consumed tokens: 19566428160 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.060519E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.837 | TFLOPs: 31.32 | +7: iteration 37330/ 173500 | consumed samples: 9556480 | consumed tokens: 19571671040 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.060816E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.050 | TFLOPs: 31.54 | +7: iteration 37340/ 173500 | consumed samples: 9559040 | consumed 
tokens: 19576913920 | elapsed time per iteration (s): 0.42 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.055832E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.211 | TFLOPs: 31.91 | +7: iteration 37350/ 173500 | consumed samples: 9561600 | consumed tokens: 19582156800 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.081350E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.746 | TFLOPs: 31.52 | +7: iteration 37360/ 173500 | consumed samples: 9564160 | consumed tokens: 19587399680 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.046669E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.003 | TFLOPs: 31.53 | +7: iteration 37370/ 173500 | consumed samples: 9566720 | consumed tokens: 19592642560 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.056770E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.751 | TFLOPs: 31.15 | +7: iteration 37380/ 173500 | consumed samples: 9569280 | consumed tokens: 19597885440 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.064770E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.156 | TFLOPs: 31.86 | +7: iteration 37390/ 173500 | consumed samples: 9571840 | consumed tokens: 19603128320 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.048809E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.506 | TFLOPs: 31.04 | +7: iteration 37400/ 173500 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.44 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.042331E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.703 | TFLOPs: 30.57 | +7: iteration 37410/ 173500 | consumed samples: 9576960 | consumed tokens: 19613614080 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.054039E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.907 | TFLOPs: 31.06 | +7: iteration 37420/ 173500 | consumed samples: 9579520 | consumed tokens: 19618856960 | elapsed time per iteration (s): 0.44 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.040462E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.870 | TFLOPs: 30.32 | +7: iteration 37430/ 173500 | consumed samples: 9582080 | consumed tokens: 19624099840 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.048459E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.411 | TFLOPs: 30.98 | +7: iteration 37440/ 173500 | consumed samples: 9584640 | consumed tokens: 19629342720 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.066514E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.786 | TFLOPs: 31.31 | +7: iteration 37450/ 173500 | consumed samples: 9587200 | consumed tokens: 19634585600 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.066080E+00 | grad norm: 
0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.913 | TFLOPs: 31.11 | +7: iteration 37460/ 173500 | consumed samples: 9589760 | consumed tokens: 19639828480 | elapsed time per iteration (s): 0.44 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.054301E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.166 | TFLOPs: 30.76 | +7: iteration 37470/ 173500 | consumed samples: 9592320 | consumed tokens: 19645071360 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.060948E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.490 | TFLOPs: 31.87 | +7: iteration 37480/ 173500 | consumed samples: 9594880 | consumed tokens: 19650314240 | elapsed time per iteration (s): 0.43 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.057071E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.367 | TFLOPs: 31.40 | +7: iteration 37490/ 173500 | consumed samples: 9597440 | consumed tokens: 19655557120 | elapsed time per iteration (s): 0.43 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.056328E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.089 | TFLOPs: 31.49 | +7: iteration 37500/ 173500 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.44 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.056370E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.821 | TFLOPs: 30.68 | +7: iteration 37510/ 173500 | consumed samples: 9602560 | consumed tokens: 19666042880 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.063468E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.196 | TFLOPs: 30.18 | +7: iteration 37520/ 173500 | consumed samples: 9605120 | consumed tokens: 19671285760 | elapsed time per iteration (s): 0.44 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.058116E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.324 | TFLOPs: 30.29 | +7: iteration 37530/ 173500 | consumed samples: 9607680 | consumed tokens: 19676528640 | elapsed time per iteration (s): 0.45 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.050609E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.152 | TFLOPs: 29.86 | +7: iteration 37540/ 173500 | consumed samples: 9610240 | consumed tokens: 19681771520 | elapsed time per iteration (s): 0.46 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.052308E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.056 | TFLOPs: 29.49 | +7: iteration 37550/ 173500 | consumed samples: 9612800 | consumed tokens: 19687014400 | elapsed time per iteration (s): 0.45 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.063740E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.147 | TFLOPs: 29.86 | +7: iteration 37560/ 173500 | consumed samples: 9615360 | consumed tokens: 19692257280 | elapsed time per iteration (s): 0.45 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.063387E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.081 | TFLOPs: 29.86 | +7: 
iteration 37570/ 173500 | consumed samples: 9617920 | consumed tokens: 19697500160 | elapsed time per iteration (s): 0.44 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.066346E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.735 | TFLOPs: 30.79 | +7: iteration 37580/ 173500 | consumed samples: 9620480 | consumed tokens: 19702743040 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.053468E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.177 | TFLOPs: 31.07 | +7: iteration 37590/ 173500 | consumed samples: 9623040 | consumed tokens: 19707985920 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.062629E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.066 | TFLOPs: 31.43 | +7: iteration 37600/ 173500 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.062275E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.980 | TFLOPs: 31.74 | +7: iteration 37610/ 173500 | consumed samples: 9628160 | consumed tokens: 19718471680 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.062837E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.960 | TFLOPs: 31.95 | +7: iteration 37620/ 173500 | consumed samples: 9630720 | consumed tokens: 19723714560 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.051143E+00 | grad norm: 0.284 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.058 | TFLOPs: 31.54 | +7: iteration 37630/ 173500 | consumed samples: 9633280 | consumed tokens: 19728957440 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.065475E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.836 | TFLOPs: 31.37 | +7: iteration 37640/ 173500 | consumed samples: 9635840 | consumed tokens: 19734200320 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.057635E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.372 | TFLOPs: 31.34 | +7: iteration 37650/ 173500 | consumed samples: 9638400 | consumed tokens: 19739443200 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.065776E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.346 | TFLOPs: 31.50 | +7: iteration 37660/ 173500 | consumed samples: 9640960 | consumed tokens: 19744686080 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.061511E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.878 | TFLOPs: 31.89 | +7: iteration 37670/ 173500 | consumed samples: 9643520 | consumed tokens: 19749928960 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.063460E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.569 | TFLOPs: 31.35 | +7: iteration 37680/ 173500 | consumed samples: 9646080 | consumed tokens: 19755171840 | elapsed time per iteration (s): 0.42 | learning rate: 
1.812E-04 | global batch size: 256 | lm loss: 3.057478E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.308 | TFLOPs: 31.71 | +7: iteration 37690/ 173500 | consumed samples: 9648640 | consumed tokens: 19760414720 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.062454E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.187 | TFLOPs: 31.70 | +7: iteration 37700/ 173500 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.046512E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.338 | TFLOPs: 31.92 | +7: iteration 37710/ 173500 | consumed samples: 9653760 | consumed tokens: 19770900480 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.055122E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.284 | TFLOPs: 32.07 | +7: iteration 37720/ 173500 | consumed samples: 9656320 | consumed tokens: 19776143360 | elapsed time per iteration (s): 0.43 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.054542E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.990 | TFLOPs: 31.27 | +7: iteration 37730/ 173500 | consumed samples: 9658880 | consumed tokens: 19781386240 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.065705E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.763 | TFLOPs: 31.84 | +7: iteration 37740/ 173500 | consumed 
samples: 9661440 | consumed tokens: 19786629120 | elapsed time per iteration (s): 0.43 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.051316E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.161 | TFLOPs: 31.17 | +7: iteration 37750/ 173500 | consumed samples: 9664000 | consumed tokens: 19791872000 | elapsed time per iteration (s): 0.43 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.038726E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.999 | TFLOPs: 31.53 | +7: iteration 37760/ 173500 | consumed samples: 9666560 | consumed tokens: 19797114880 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.058334E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.865 | TFLOPs: 31.84 | +7: iteration 37770/ 173500 | consumed samples: 9669120 | consumed tokens: 19802357760 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.049107E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.465 | TFLOPs: 31.82 | +7: iteration 37780/ 173500 | consumed samples: 9671680 | consumed tokens: 19807600640 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.051269E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.168 | TFLOPs: 32.01 | +7: iteration 37790/ 173500 | consumed samples: 9674240 | consumed tokens: 19812843520 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.049369E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.984 | TFLOPs: 31.43 | +7: iteration 37800/ 173500 | consumed samples: 9676800 | consumed tokens: 19818086400 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.060951E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.922 | TFLOPs: 31.48 | +7: iteration 37810/ 173500 | consumed samples: 9679360 | consumed tokens: 19823329280 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.062891E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.587 | TFLOPs: 31.77 | +7: iteration 37820/ 173500 | consumed samples: 9681920 | consumed tokens: 19828572160 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.050041E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.792 | TFLOPs: 31.00 | +7: iteration 37830/ 173500 | consumed samples: 9684480 | consumed tokens: 19833815040 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.058772E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.922 | TFLOPs: 31.48 | +7: iteration 37840/ 173500 | consumed samples: 9687040 | consumed tokens: 19839057920 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.064434E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.082 | TFLOPs: 31.43 | +7: iteration 37850/ 173500 | consumed samples: 9689600 | consumed tokens: 19844300800 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm 
loss: 3.070495E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.413 | TFLOPs: 31.45 | +7: iteration 37860/ 173500 | consumed samples: 9692160 | consumed tokens: 19849543680 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.076591E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.786 | TFLOPs: 31.57 | +7: iteration 37870/ 173500 | consumed samples: 9694720 | consumed tokens: 19854786560 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.074368E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.199 | TFLOPs: 31.75 | +7: iteration 37880/ 173500 | consumed samples: 9697280 | consumed tokens: 19860029440 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.059883E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.455 | TFLOPs: 31.66 | +7: iteration 37890/ 173500 | consumed samples: 9699840 | consumed tokens: 19865272320 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.068970E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.961 | TFLOPs: 31.37 | +7: iteration 37900/ 173500 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.063878E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.183 | TFLOPs: 31.70 | +7: iteration 37910/ 173500 | consumed samples: 9704960 | consumed tokens: 
19875758080 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.066187E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.812 | TFLOPs: 32.05 | +7: iteration 37920/ 173500 | consumed samples: 9707520 | consumed tokens: 19881000960 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.057208E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.792 | TFLOPs: 31.52 | +7: iteration 37930/ 173500 | consumed samples: 9710080 | consumed tokens: 19886243840 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.061264E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.085 | TFLOPs: 31.54 | +7: iteration 37940/ 173500 | consumed samples: 9712640 | consumed tokens: 19891486720 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.053983E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.817 | TFLOPs: 31.84 | +7: iteration 37950/ 173500 | consumed samples: 9715200 | consumed tokens: 19896729600 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.064761E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.791 | TFLOPs: 32.05 | +7: iteration 37960/ 173500 | consumed samples: 9717760 | consumed tokens: 19901972480 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.058300E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
605.160 | TFLOPs: 31.75 | +7: iteration 37970/ 173500 | consumed samples: 9720320 | consumed tokens: 19907215360 | elapsed time per iteration (s): 0.43 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.060423E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.115 | TFLOPs: 31.43 | +7: iteration 37980/ 173500 | consumed samples: 9722880 | consumed tokens: 19912458240 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.058055E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.123 | TFLOPs: 31.91 | +7: iteration 37990/ 173500 | consumed samples: 9725440 | consumed tokens: 19917701120 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.054585E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.774 | TFLOPs: 31.84 | +0: [2023-03-17 03:42:06,260] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=0, lr=[0.00018091754328052937, 0.00018091754328052937, 0.00018091754328052937], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 38000/ 173500 | consumed samples: 9728000 | consumed tokens: 19922944000 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.061432E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.559 | TFLOPs: 31.62 | +0: steps: 38000 loss: 3.0648 iter time (s): 0.426 samples/sec: 601.255 +7: iteration 38010/ 173500 | consumed samples: 9730560 | consumed tokens: 19928186880 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.052611E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 602.430 | TFLOPs: 31.61 | +7: iteration 38020/ 173500 | consumed samples: 9733120 | consumed tokens: 19933429760 | elapsed time per iteration (s): 0.43 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.066887E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.199 | TFLOPs: 31.07 | +7: iteration 38030/ 173500 | consumed samples: 9735680 | consumed tokens: 19938672640 | elapsed time per iteration (s): 0.43 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.052884E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.216 | TFLOPs: 31.13 | +7: iteration 38040/ 173500 | consumed samples: 9738240 | consumed tokens: 19943915520 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.057279E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.522 | TFLOPs: 31.67 | +7: iteration 38050/ 173500 | consumed samples: 9740800 | consumed tokens: 19949158400 | elapsed time per iteration (s): 0.43 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.062773E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.337 | TFLOPs: 31.60 | +7: iteration 38060/ 173500 | consumed samples: 9743360 | consumed tokens: 19954401280 | elapsed time per iteration (s): 0.43 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.058936E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.325 | TFLOPs: 31.50 | +7: iteration 38070/ 173500 | consumed samples: 9745920 | consumed tokens: 19959644160 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm 
loss: 3.057241E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.466 | TFLOPs: 31.14 | +7: iteration 38080/ 173500 | consumed samples: 9748480 | consumed tokens: 19964887040 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.051721E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.206 | TFLOPs: 31.60 | +7: iteration 38090/ 173500 | consumed samples: 9751040 | consumed tokens: 19970129920 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.073661E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.119 | TFLOPs: 31.85 | +7: iteration 38100/ 173500 | consumed samples: 9753600 | consumed tokens: 19975372800 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.058748E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.801 | TFLOPs: 31.58 | +7: iteration 38110/ 173500 | consumed samples: 9756160 | consumed tokens: 19980615680 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.065301E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.668 | TFLOPs: 31.78 | +7: iteration 38120/ 173500 | consumed samples: 9758720 | consumed tokens: 19985858560 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.074509E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.314 | TFLOPs: 31.76 | +7: iteration 38130/ 173500 | consumed samples: 9761280 | consumed tokens: 
19991101440 | elapsed time per iteration (s): 0.44 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.052190E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.836 | TFLOPs: 30.79 | +7: iteration 38140/ 173500 | consumed samples: 9763840 | consumed tokens: 19996344320 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.056601E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.360 | TFLOPs: 31.34 | +7: iteration 38150/ 173500 | consumed samples: 9766400 | consumed tokens: 20001587200 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.055941E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.736 | TFLOPs: 31.20 | +7: iteration 38160/ 173500 | consumed samples: 9768960 | consumed tokens: 20006830080 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.058585E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.479 | TFLOPs: 31.87 | +7: iteration 38170/ 173500 | consumed samples: 9771520 | consumed tokens: 20012072960 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.042056E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.082 | TFLOPs: 31.91 | +7: iteration 38180/ 173500 | consumed samples: 9774080 | consumed tokens: 20017315840 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.060986E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.967 | TFLOPs: 31.90 | +7: iteration 38190/ 173500 | consumed samples: 9776640 | consumed tokens: 20022558720 | elapsed time per iteration (s): 0.43 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.049990E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.137 | TFLOPs: 31.54 | +7: iteration 38200/ 173500 | consumed samples: 9779200 | consumed tokens: 20027801600 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.055180E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.222 | TFLOPs: 31.96 | +7: iteration 38210/ 173500 | consumed samples: 9781760 | consumed tokens: 20033044480 | elapsed time per iteration (s): 0.43 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.058231E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.729 | TFLOPs: 31.41 | +7: iteration 38220/ 173500 | consumed samples: 9784320 | consumed tokens: 20038287360 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.058885E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.463 | TFLOPs: 31.66 | +7: iteration 38230/ 173500 | consumed samples: 9786880 | consumed tokens: 20043530240 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.055165E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.392 | TFLOPs: 32.03 | +7: iteration 38240/ 173500 | consumed samples: 9789440 | consumed tokens: 20048773120 | elapsed time per iteration (s): 0.43 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.052048E+00 | grad norm: 0.310 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.338 | TFLOPs: 31.45 | +7: iteration 38250/ 173500 | consumed samples: 9792000 | consumed tokens: 20054016000 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.058411E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.140 | TFLOPs: 31.75 | +7: iteration 38260/ 173500 | consumed samples: 9794560 | consumed tokens: 20059258880 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.055273E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.619 | TFLOPs: 31.78 | +7: iteration 38270/ 173500 | consumed samples: 9797120 | consumed tokens: 20064501760 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.040353E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.206 | TFLOPs: 31.81 | +7: iteration 38280/ 173500 | consumed samples: 9799680 | consumed tokens: 20069744640 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.043379E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.597 | TFLOPs: 31.77 | +7: iteration 38290/ 173500 | consumed samples: 9802240 | consumed tokens: 20074987520 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.065621E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.166 | TFLOPs: 31.33 | +7: iteration 38300/ 173500 | consumed samples: 9804800 | consumed tokens: 20080230400 | elapsed time per iteration (s): 
0.44 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.042681E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.153 | TFLOPs: 30.75 | +7: iteration 38310/ 173500 | consumed samples: 9807360 | consumed tokens: 20085473280 | elapsed time per iteration (s): 0.46 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.073188E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.205 | TFLOPs: 29.34 | +7: iteration 38320/ 173500 | consumed samples: 9809920 | consumed tokens: 20090716160 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.048955E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.459 | TFLOPs: 31.45 | +7: iteration 38330/ 173500 | consumed samples: 9812480 | consumed tokens: 20095959040 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.057552E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.207 | TFLOPs: 31.60 | +7: iteration 38340/ 173500 | consumed samples: 9815040 | consumed tokens: 20101201920 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.052508E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.586 | TFLOPs: 30.99 | +7: iteration 38350/ 173500 | consumed samples: 9817600 | consumed tokens: 20106444800 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.065851E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.081 | TFLOPs: 31.91 | +7: iteration 38360/ 
173500 | consumed samples: 9820160 | consumed tokens: 20111687680 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.058067E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.781 | TFLOPs: 30.94 | +7: iteration 38370/ 173500 | consumed samples: 9822720 | consumed tokens: 20116930560 | elapsed time per iteration (s): 0.44 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.042648E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.520 | TFLOPs: 30.83 | +7: iteration 38380/ 173500 | consumed samples: 9825280 | consumed tokens: 20122173440 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.053709E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.626 | TFLOPs: 31.36 | +7: iteration 38390/ 173500 | consumed samples: 9827840 | consumed tokens: 20127416320 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.050038E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.490 | TFLOPs: 31.14 | +7: iteration 38400/ 173500 | consumed samples: 9830400 | consumed tokens: 20132659200 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.068662E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.632 | TFLOPs: 31.46 | +7: iteration 38410/ 173500 | consumed samples: 9832960 | consumed tokens: 20137902080 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.065916E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 603.668 | TFLOPs: 31.67 | +7: iteration 38420/ 173500 | consumed samples: 9835520 | consumed tokens: 20143144960 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.054506E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.678 | TFLOPs: 31.31 | +7: iteration 38430/ 173500 | consumed samples: 9838080 | consumed tokens: 20148387840 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.046300E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.778 | TFLOPs: 31.73 | +7: iteration 38440/ 173500 | consumed samples: 9840640 | consumed tokens: 20153630720 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.046653E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.557 | TFLOPs: 31.77 | +7: iteration 38450/ 173500 | consumed samples: 9843200 | consumed tokens: 20158873600 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.067102E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.062 | TFLOPs: 31.54 | +7: iteration 38460/ 173500 | consumed samples: 9845760 | consumed tokens: 20164116480 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.059876E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.612 | TFLOPs: 31.88 | +7: iteration 38470/ 173500 | consumed samples: 9848320 | consumed tokens: 20169359360 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch 
size: 256 | lm loss: 3.050803E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.202 | TFLOPs: 31.23 | +7: iteration 38480/ 173500 | consumed samples: 9850880 | consumed tokens: 20174602240 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.052409E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.836 | TFLOPs: 31.26 | +7: iteration 38490/ 173500 | consumed samples: 9853440 | consumed tokens: 20179845120 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.066685E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.349 | TFLOPs: 31.66 | +7: iteration 38500/ 173500 | consumed samples: 9856000 | consumed tokens: 20185088000 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.065858E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.903 | TFLOPs: 31.74 | +7: iteration 38510/ 173500 | consumed samples: 9858560 | consumed tokens: 20190330880 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.058574E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.958 | TFLOPs: 32.06 | +7: iteration 38520/ 173500 | consumed samples: 9861120 | consumed tokens: 20195573760 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.069706E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.003 | TFLOPs: 31.48 | +7: iteration 38530/ 173500 | consumed samples: 9863680 | consumed 
tokens: 20200816640 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.055919E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.796 | TFLOPs: 31.63 | +7: iteration 38540/ 173500 | consumed samples: 9866240 | consumed tokens: 20206059520 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.051956E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.105 | TFLOPs: 31.43 | +7: iteration 38550/ 173500 | consumed samples: 9868800 | consumed tokens: 20211302400 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.038837E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.341 | TFLOPs: 31.08 | +7: iteration 38560/ 173500 | consumed samples: 9871360 | consumed tokens: 20216545280 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.042831E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.152 | TFLOPs: 31.23 | +7: iteration 38570/ 173500 | consumed samples: 9873920 | consumed tokens: 20221788160 | elapsed time per iteration (s): 0.44 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.060752E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.292 | TFLOPs: 30.60 | +7: iteration 38580/ 173500 | consumed samples: 9876480 | consumed tokens: 20227031040 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.048363E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.015 | TFLOPs: 31.90 | +7: iteration 38590/ 173500 | consumed samples: 9879040 | consumed tokens: 20232273920 | elapsed time per iteration (s): 0.44 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.045074E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.422 | TFLOPs: 30.82 | +7: iteration 38600/ 173500 | consumed samples: 9881600 | consumed tokens: 20237516800 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.070304E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.939 | TFLOPs: 31.95 | +7: iteration 38610/ 173500 | consumed samples: 9884160 | consumed tokens: 20242759680 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.044367E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.265 | TFLOPs: 31.44 | +7: iteration 38620/ 173500 | consumed samples: 9886720 | consumed tokens: 20248002560 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.053366E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.745 | TFLOPs: 31.47 | +7: iteration 38630/ 173500 | consumed samples: 9889280 | consumed tokens: 20253245440 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.066314E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.349 | TFLOPs: 31.55 | +7: iteration 38640/ 173500 | consumed samples: 9891840 | consumed tokens: 20258488320 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.057189E+00 | grad norm: 
0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.863 | TFLOPs: 31.68 | +7: iteration 38650/ 173500 | consumed samples: 9894400 | consumed tokens: 20263731200 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.061027E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.119 | TFLOPs: 31.59 | +7: iteration 38660/ 173500 | consumed samples: 9896960 | consumed tokens: 20268974080 | elapsed time per iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.055438E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.181 | TFLOPs: 31.60 | +7: iteration 38670/ 173500 | consumed samples: 9899520 | consumed tokens: 20274216960 | elapsed time per iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.065195E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.821 | TFLOPs: 31.16 | +7: iteration 38680/ 173500 | consumed samples: 9902080 | consumed tokens: 20279459840 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.043562E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.899 | TFLOPs: 31.90 | +7: iteration 38690/ 173500 | consumed samples: 9904640 | consumed tokens: 20284702720 | elapsed time per iteration (s): 0.44 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.067965E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.077 | TFLOPs: 30.70 | +7: iteration 38700/ 173500 | consumed samples: 9907200 | consumed tokens: 20289945600 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.068495E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.436 | TFLOPs: 31.45 | +7: iteration 38710/ 173500 | consumed samples: 9909760 | consumed tokens: 20295188480 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.062034E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.112 | TFLOPs: 31.85 | +7: iteration 38720/ 173500 | consumed samples: 9912320 | consumed tokens: 20300431360 | elapsed time per iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.053666E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.073 | TFLOPs: 31.33 | +7: iteration 38730/ 173500 | consumed samples: 9914880 | consumed tokens: 20305674240 | elapsed time per iteration (s): 0.44 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.055113E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.641 | TFLOPs: 30.57 | +7: iteration 38740/ 173500 | consumed samples: 9917440 | consumed tokens: 20310917120 | elapsed time per iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.054861E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.321 | TFLOPs: 31.55 | +7: iteration 38750/ 173500 | consumed samples: 9920000 | consumed tokens: 20316160000 | elapsed time per iteration (s): 0.44 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.054850E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.935 | TFLOPs: 30.85 | +7: 
iteration 38760/ 173500 | consumed samples: 9922560 | consumed tokens: 20321402880 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.055861E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.209 | TFLOPs: 31.81 | +7: iteration 38770/ 173500 | consumed samples: 9925120 | consumed tokens: 20326645760 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.060657E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.960 | TFLOPs: 31.43 | +7: iteration 38780/ 173500 | consumed samples: 9927680 | consumed tokens: 20331888640 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.068195E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.564 | TFLOPs: 30.99 | +7: iteration 38790/ 173500 | consumed samples: 9930240 | consumed tokens: 20337131520 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.066267E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.095 | TFLOPs: 31.70 | +7: iteration 38800/ 173500 | consumed samples: 9932800 | consumed tokens: 20342374400 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.040047E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.290 | TFLOPs: 31.13 | +7: iteration 38810/ 173500 | consumed samples: 9935360 | consumed tokens: 20347617280 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.046461E+00 | grad norm: 0.279 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.745 | TFLOPs: 31.57 | +7: iteration 38820/ 173500 | consumed samples: 9937920 | consumed tokens: 20352860160 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.057123E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.909 | TFLOPs: 31.58 | +7: iteration 38830/ 173500 | consumed samples: 9940480 | consumed tokens: 20358103040 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.051315E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.370 | TFLOPs: 31.66 | +7: iteration 38840/ 173500 | consumed samples: 9943040 | consumed tokens: 20363345920 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.167659E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.871 | TFLOPs: 31.79 | +7: iteration 38850/ 173500 | consumed samples: 9945600 | consumed tokens: 20368588800 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.055268E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.537 | TFLOPs: 32.03 | +7: iteration 38860/ 173500 | consumed samples: 9948160 | consumed tokens: 20373831680 | elapsed time per iteration (s): 0.43 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.079273E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.883 | TFLOPs: 31.42 | +7: iteration 38870/ 173500 | consumed samples: 9950720 | consumed tokens: 20379074560 | elapsed time per iteration (s): 0.42 | learning rate: 
1.800E-04 | global batch size: 256 | lm loss: 3.083746E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.095 | TFLOPs: 32.01 | +7: iteration 38880/ 173500 | consumed samples: 9953280 | consumed tokens: 20384317440 | elapsed time per iteration (s): 0.43 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.073256E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.202 | TFLOPs: 31.54 | +7: iteration 38890/ 173500 | consumed samples: 9955840 | consumed tokens: 20389560320 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.066936E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.577 | TFLOPs: 31.93 | +7: iteration 38900/ 173500 | consumed samples: 9958400 | consumed tokens: 20394803200 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.057701E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.081 | TFLOPs: 32.01 | +7: iteration 38910/ 173500 | consumed samples: 9960960 | consumed tokens: 20400046080 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.062218E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.574 | TFLOPs: 31.72 | +7: iteration 38920/ 173500 | consumed samples: 9963520 | consumed tokens: 20405288960 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.048916E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.168 | TFLOPs: 31.70 | +7: iteration 38930/ 173500 | consumed 
samples: 9966080 | consumed tokens: 20410531840 | elapsed time per iteration (s): 0.43 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.065863E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.995 | TFLOPs: 31.27 | +7: iteration 38940/ 173500 | consumed samples: 9968640 | consumed tokens: 20415774720 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.075274E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.686 | TFLOPs: 31.73 | +7: iteration 38950/ 173500 | consumed samples: 9971200 | consumed tokens: 20421017600 | elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.053879E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.222 | TFLOPs: 31.49 | +7: iteration 38960/ 173500 | consumed samples: 9973760 | consumed tokens: 20426260480 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.066867E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.564 | TFLOPs: 31.93 | +7: iteration 38970/ 173500 | consumed samples: 9976320 | consumed tokens: 20431503360 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.046002E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.558 | TFLOPs: 31.77 | +7: iteration 38980/ 173500 | consumed samples: 9978880 | consumed tokens: 20436746240 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.058563E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.572 | TFLOPs: 31.77 | +7: iteration 38990/ 173500 | consumed samples: 9981440 | consumed tokens: 20441989120 | elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.035224E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.985 | TFLOPs: 31.59 | +7: iteration 39000/ 173500 | consumed samples: 9984000 | consumed tokens: 20447232000 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.065340E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.375 | TFLOPs: 31.66 | +7: iteration 39010/ 173500 | consumed samples: 9986560 | consumed tokens: 20452474880 | elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.041547E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.211 | TFLOPs: 31.39 | +7: iteration 39020/ 173500 | consumed samples: 9989120 | consumed tokens: 20457717760 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.054468E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.779 | TFLOPs: 31.68 | +7: iteration 39030/ 173500 | consumed samples: 9991680 | consumed tokens: 20462960640 | elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.055614E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.716 | TFLOPs: 31.52 | +7: iteration 39040/ 173500 | consumed samples: 9994240 | consumed tokens: 20468203520 | elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm 
loss: 3.068602E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.202 | TFLOPs: 31.60 | +7: iteration 39050/ 173500 | consumed samples: 9996800 | consumed tokens: 20473446400 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.062916E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.767 | TFLOPs: 31.52 | +7: iteration 39060/ 173500 | consumed samples: 9999360 | consumed tokens: 20478689280 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.063990E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.155 | TFLOPs: 31.75 | +7: iteration 39070/ 173500 | consumed samples: 10001920 | consumed tokens: 20483932160 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.052660E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.412 | TFLOPs: 31.76 | +7: iteration 39080/ 173500 | consumed samples: 10004480 | consumed tokens: 20489175040 | elapsed time per iteration (s): 0.44 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.039218E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.328 | TFLOPs: 30.61 | +7: iteration 39090/ 173500 | consumed samples: 10007040 | consumed tokens: 20494417920 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.053407E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.426 | TFLOPs: 31.40 | +7: iteration 39100/ 173500 | consumed samples: 10009600 | consumed tokens: 
20499660800 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.052003E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.302 | TFLOPs: 31.34 | +7: iteration 39110/ 173500 | consumed samples: 10012160 | consumed tokens: 20504903680 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.052103E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.303 | TFLOPs: 31.23 | +7: iteration 39120/ 173500 | consumed samples: 10014720 | consumed tokens: 20510146560 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.045254E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.705 | TFLOPs: 31.10 | +7: iteration 39130/ 173500 | consumed samples: 10017280 | consumed tokens: 20515389440 | elapsed time per iteration (s): 0.43 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.062652E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.515 | TFLOPs: 31.19 | +7: iteration 39140/ 173500 | consumed samples: 10019840 | consumed tokens: 20520632320 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.035557E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.487 | TFLOPs: 31.61 | +7: iteration 39150/ 173500 | consumed samples: 10022400 | consumed tokens: 20525875200 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.044989E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 603.198 | TFLOPs: 31.65 | +7: iteration 39160/ 173500 | consumed samples: 10024960 | consumed tokens: 20531118080 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.067962E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.271 | TFLOPs: 32.02 | +7: iteration 39170/ 173500 | consumed samples: 10027520 | consumed tokens: 20536360960 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.070372E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.317 | TFLOPs: 31.55 | +7: iteration 39180/ 173500 | consumed samples: 10030080 | consumed tokens: 20541603840 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.064751E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.812 | TFLOPs: 31.05 | +7: iteration 39190/ 173500 | consumed samples: 10032640 | consumed tokens: 20546846720 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.074328E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.356 | TFLOPs: 31.39 | +7: iteration 39200/ 173500 | consumed samples: 10035200 | consumed tokens: 20552089600 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.056215E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.064 | TFLOPs: 32.06 | +7: iteration 39210/ 173500 | consumed samples: 10037760 | consumed tokens: 20557332480 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.043735E+00 | grad 
norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.711 | TFLOPs: 31.36 | +7: iteration 39220/ 173500 | consumed samples: 10040320 | consumed tokens: 20562575360 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.053914E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.946 | TFLOPs: 32.06 | +7: iteration 39230/ 173500 | consumed samples: 10042880 | consumed tokens: 20567818240 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.064204E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.663 | TFLOPs: 31.73 | +7: iteration 39240/ 173500 | consumed samples: 10045440 | consumed tokens: 20573061120 | elapsed time per iteration (s): 0.43 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.052339E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.523 | TFLOPs: 31.46 | +7: iteration 39250/ 173500 | consumed samples: 10048000 | consumed tokens: 20578304000 | elapsed time per iteration (s): 0.44 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.055667E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.466 | TFLOPs: 30.88 | +7: iteration 39260/ 173500 | consumed samples: 10050560 | consumed tokens: 20583546880 | elapsed time per iteration (s): 0.43 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.056566E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.935 | TFLOPs: 31.22 | +7: iteration 39270/ 173500 | consumed samples: 10053120 | consumed tokens: 20588789760 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.058334E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.304 | TFLOPs: 31.23 | +7: iteration 39280/ 173500 | consumed samples: 10055680 | consumed tokens: 20594032640 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.054121E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.592 | TFLOPs: 31.88 | +7: iteration 39290/ 173500 | consumed samples: 10058240 | consumed tokens: 20599275520 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.058530E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.679 | TFLOPs: 31.73 | +7: iteration 39300/ 173500 | consumed samples: 10060800 | consumed tokens: 20604518400 | elapsed time per iteration (s): 0.43 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.051991E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.615 | TFLOPs: 31.41 | +7: iteration 39310/ 173500 | consumed samples: 10063360 | consumed tokens: 20609761280 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.058522E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.828 | TFLOPs: 32.05 | +7: iteration 39320/ 173500 | consumed samples: 10065920 | consumed tokens: 20615004160 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.058073E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.828 | TFLOPs: 
31.73 | +7: iteration 39330/ 173500 | consumed samples: 10068480 | consumed tokens: 20620247040 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.058098E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.685 | TFLOPs: 31.62 | +7: iteration 39340/ 173500 | consumed samples: 10071040 | consumed tokens: 20625489920 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.056406E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.744 | TFLOPs: 31.52 | +7: iteration 39350/ 173500 | consumed samples: 10073600 | consumed tokens: 20630732800 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.053460E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.204 | TFLOPs: 31.39 | +7: iteration 39360/ 173500 | consumed samples: 10076160 | consumed tokens: 20635975680 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.041319E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.812 | TFLOPs: 31.37 | +7: iteration 39370/ 173500 | consumed samples: 10078720 | consumed tokens: 20641218560 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.056118E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.019 | TFLOPs: 31.59 | +7: iteration 39380/ 173500 | consumed samples: 10081280 | consumed tokens: 20646461440 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.050313E+00 | grad norm: 0.278 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.646 | TFLOPs: 31.20 | +7: iteration 39390/ 173500 | consumed samples: 10083840 | consumed tokens: 20651704320 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.040587E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.578 | TFLOPs: 31.67 | +7: iteration 39400/ 173500 | consumed samples: 10086400 | consumed tokens: 20656947200 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.047275E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.675 | TFLOPs: 31.94 | +7: iteration 39410/ 173500 | consumed samples: 10088960 | consumed tokens: 20662190080 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.052576E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.560 | TFLOPs: 31.30 | +7: iteration 39420/ 173500 | consumed samples: 10091520 | consumed tokens: 20667432960 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.063211E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.273 | TFLOPs: 31.71 | +7: iteration 39430/ 173500 | consumed samples: 10094080 | consumed tokens: 20672675840 | elapsed time per iteration (s): 0.44 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.053765E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.105 | TFLOPs: 30.49 | +7: iteration 39440/ 173500 | consumed samples: 10096640 | consumed tokens: 20677918720 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.057051E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.318 | TFLOPs: 31.50 | +7: iteration 39450/ 173500 | consumed samples: 10099200 | consumed tokens: 20683161600 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.040043E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.867 | TFLOPs: 31.79 | +7: iteration 39460/ 173500 | consumed samples: 10101760 | consumed tokens: 20688404480 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.048693E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.280 | TFLOPs: 31.39 | +7: iteration 39470/ 173500 | consumed samples: 10104320 | consumed tokens: 20693647360 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.043638E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.151 | TFLOPs: 31.33 | +7: iteration 39480/ 173500 | consumed samples: 10106880 | consumed tokens: 20698890240 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.060084E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.791 | TFLOPs: 31.84 | +7: iteration 39490/ 173500 | consumed samples: 10109440 | consumed tokens: 20704133120 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.067726E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.483 | TFLOPs: 31.82 | +7: iteration 39500/ 
173500 | consumed samples: 10112000 | consumed tokens: 20709376000 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.047858E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.714 | TFLOPs: 31.36 | +7: iteration 39510/ 173500 | consumed samples: 10114560 | consumed tokens: 20714618880 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.055682E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.859 | TFLOPs: 31.84 | +7: iteration 39520/ 173500 | consumed samples: 10117120 | consumed tokens: 20719861760 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.051255E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.130 | TFLOPs: 31.80 | +7: iteration 39530/ 173500 | consumed samples: 10119680 | consumed tokens: 20725104640 | elapsed time per iteration (s): 0.43 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.053514E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.776 | TFLOPs: 31.36 | +7: iteration 39540/ 173500 | consumed samples: 10122240 | consumed tokens: 20730347520 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.047455E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.501 | TFLOPs: 31.87 | +7: iteration 39550/ 173500 | consumed samples: 10124800 | consumed tokens: 20735590400 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.043283E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.048 | TFLOPs: 31.64 | +7: iteration 39560/ 173500 | consumed samples: 10127360 | consumed tokens: 20740833280 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.051490E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.837 | TFLOPs: 31.84 | +7: iteration 39570/ 173500 | consumed samples: 10129920 | consumed tokens: 20746076160 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.037256E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.518 | TFLOPs: 31.82 | +7: iteration 39580/ 173500 | consumed samples: 10132480 | consumed tokens: 20751319040 | elapsed time per iteration (s): 0.43 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.050589E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.679 | TFLOPs: 31.57 | +7: iteration 39590/ 173500 | consumed samples: 10135040 | consumed tokens: 20756561920 | elapsed time per iteration (s): 0.43 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.061372E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.890 | TFLOPs: 30.95 | +7: iteration 39600/ 173500 | consumed samples: 10137600 | consumed tokens: 20761804800 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.051273E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.802 | TFLOPs: 31.94 | +7: iteration 39610/ 173500 | consumed samples: 10140160 | consumed tokens: 20767047680 | elapsed time per iteration (s): 0.43 | learning rate: 
1.793E-04 | global batch size: 256 | lm loss: 3.065385E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.410 | TFLOPs: 31.24 | +7: iteration 39620/ 173500 | consumed samples: 10142720 | consumed tokens: 20772290560 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.056535E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.313 | TFLOPs: 31.60 | +7: iteration 39630/ 173500 | consumed samples: 10145280 | consumed tokens: 20777533440 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.060206E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.926 | TFLOPs: 31.90 | +7: iteration 39640/ 173500 | consumed samples: 10147840 | consumed tokens: 20782776320 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.052784E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.663 | TFLOPs: 31.62 | +7: iteration 39650/ 173500 | consumed samples: 10150400 | consumed tokens: 20788019200 | elapsed time per iteration (s): 0.44 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.047486E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.960 | TFLOPs: 30.85 | +7: iteration 39660/ 173500 | consumed samples: 10152960 | consumed tokens: 20793262080 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.056289E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.924 | TFLOPs: 31.69 | +7: iteration 39670/ 173500 | 
consumed samples: 10155520 | consumed tokens: 20798504960 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.057724E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.723 | TFLOPs: 31.47 | +7: iteration 39680/ 173500 | consumed samples: 10158080 | consumed tokens: 20803747840 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.047312E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.675 | TFLOPs: 31.78 | +7: iteration 39690/ 173500 | consumed samples: 10160640 | consumed tokens: 20808990720 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.060368E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.124 | TFLOPs: 31.64 | +7: iteration 39700/ 173500 | consumed samples: 10163200 | consumed tokens: 20814233600 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.047460E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.705 | TFLOPs: 31.57 | +7: iteration 39710/ 173500 | consumed samples: 10165760 | consumed tokens: 20819476480 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 3.054798E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.901 | TFLOPs: 31.53 | +7: iteration 39720/ 173500 | consumed samples: 10168320 | consumed tokens: 20824719360 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.051536E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.019 | TFLOPs: 31.74 | +7: iteration 39730/ 173500 | consumed samples: 10170880 | consumed tokens: 20829962240 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.048909E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.624 | TFLOPs: 31.04 | +7: iteration 39740/ 173500 | consumed samples: 10173440 | consumed tokens: 20835205120 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.061325E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.586 | TFLOPs: 31.04 | +7: iteration 39750/ 173500 | consumed samples: 10176000 | consumed tokens: 20840448000 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.050302E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.321 | TFLOPs: 31.45 | +7: iteration 39760/ 173500 | consumed samples: 10178560 | consumed tokens: 20845690880 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.046997E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.775 | TFLOPs: 31.63 | +7: iteration 39770/ 173500 | consumed samples: 10181120 | consumed tokens: 20850933760 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.059061E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.444 | TFLOPs: 31.98 | +7: iteration 39780/ 173500 | consumed samples: 10183680 | consumed tokens: 20856176640 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch 
size: 256 | lm loss: 3.044423E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.866 | TFLOPs: 31.68 | +7: iteration 39790/ 173500 | consumed samples: 10186240 | consumed tokens: 20861419520 | elapsed time per iteration (s): 0.44 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.045544E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.858 | TFLOPs: 30.84 | +7: iteration 39800/ 173500 | consumed samples: 10188800 | consumed tokens: 20866662400 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.045099E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.463 | TFLOPs: 31.93 | +7: iteration 39810/ 173500 | consumed samples: 10191360 | consumed tokens: 20871905280 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.062776E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.971 | TFLOPs: 31.95 | +7: iteration 39820/ 173500 | consumed samples: 10193920 | consumed tokens: 20877148160 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.038971E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.973 | TFLOPs: 31.48 | +7: iteration 39830/ 173500 | consumed samples: 10196480 | consumed tokens: 20882391040 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.055943E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.169 | TFLOPs: 32.07 | +7: iteration 39840/ 173500 | consumed samples: 10199040 | 
consumed tokens: 20887633920 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.045441E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.189 | TFLOPs: 32.07 | +7: iteration 39850/ 173500 | consumed samples: 10201600 | consumed tokens: 20892876800 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.043221E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.722 | TFLOPs: 31.31 | +7: iteration 39860/ 173500 | consumed samples: 10204160 | consumed tokens: 20898119680 | elapsed time per iteration (s): 0.44 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.062329E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.885 | TFLOPs: 30.69 | +7: iteration 39870/ 173500 | consumed samples: 10206720 | consumed tokens: 20903362560 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.059397E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.037 | TFLOPs: 31.59 | +7: iteration 39880/ 173500 | consumed samples: 10209280 | consumed tokens: 20908605440 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.051705E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.915 | TFLOPs: 31.06 | +7: iteration 39890/ 173500 | consumed samples: 10211840 | consumed tokens: 20913848320 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.051351E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.383 | TFLOPs: 31.66 | +7: iteration 39900/ 173500 | consumed samples: 10214400 | consumed tokens: 20919091200 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.049341E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.332 | TFLOPs: 32.08 | +7: iteration 39910/ 173500 | consumed samples: 10216960 | consumed tokens: 20924334080 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.057866E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.336 | TFLOPs: 31.87 | +7: iteration 39920/ 173500 | consumed samples: 10219520 | consumed tokens: 20929576960 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.074586E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.083 | TFLOPs: 31.80 | +7: iteration 39930/ 173500 | consumed samples: 10222080 | consumed tokens: 20934819840 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.040617E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.750 | TFLOPs: 31.78 | +7: iteration 39940/ 173500 | consumed samples: 10224640 | consumed tokens: 20940062720 | elapsed time per iteration (s): 0.44 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.055553E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.732 | TFLOPs: 30.68 | +7: iteration 39950/ 173500 | consumed samples: 10227200 | consumed tokens: 20945305600 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 
3.033859E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.139 | TFLOPs: 31.70 | +7: iteration 39960/ 173500 | consumed samples: 10229760 | consumed tokens: 20950548480 | elapsed time per iteration (s): 0.43 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.044089E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.365 | TFLOPs: 31.24 | +7: iteration 39970/ 173500 | consumed samples: 10232320 | consumed tokens: 20955791360 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.063917E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.095 | TFLOPs: 31.85 | +7: iteration 39980/ 173500 | consumed samples: 10234880 | consumed tokens: 20961034240 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.053400E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.290 | TFLOPs: 31.92 | +7: iteration 39990/ 173500 | consumed samples: 10237440 | consumed tokens: 20966277120 | elapsed time per iteration (s): 0.44 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.060201E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.774 | TFLOPs: 30.52 | +0: [2023-03-17 03:56:19,168] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=0, lr=[0.0001788435118675357, 0.0001788435118675357, 0.0001788435118675357], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 40000/ 173500 | consumed samples: 10240000 | consumed tokens: 20971520000 | elapsed time per iteration (s): 0.45 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.061768E+00 | grad norm: 
0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.282 | TFLOPs: 29.87 | +0: steps: 40000 loss: 3.0664 iter time (s): 0.424 samples/sec: 603.718 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 40000 | lm loss value: 3.297636E+00 | lm loss PPL: 2.704861E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 40000 to checkpoints_221m91b400m +0: [2023-03-17 03:56:19,331] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step40000 is begin to save! +0: [2023-03-17 03:56:19,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_01-model_00-model_states.pt... +0: [2023-03-17 03:56:19,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_01-model_00-model_states.pt. +0: [2023-03-17 03:56:19,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_03-model_00-model_states.pt... +0: [2023-03-17 03:56:19,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_03-model_00-model_states.pt. +0: [2023-03-17 03:56:19,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_04-model_00-model_states.pt... +0: [2023-03-17 03:56:19,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_04-model_00-model_states.pt. +0: [2023-03-17 03:56:19,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_05-model_00-model_states.pt... 
+0: [2023-03-17 03:56:19,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_05-model_00-model_states.pt. +0: [2023-03-17 03:56:19,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_06-model_00-model_states.pt... +0: [2023-03-17 03:56:19,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_06-model_00-model_states.pt. +0: [2023-03-17 03:56:19,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_07-model_00-model_states.pt... +0: [2023-03-17 03:56:19,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_07-model_00-model_states.pt. +0: [2023-03-17 03:56:19,584] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_08-model_00-model_states.pt... +0: [2023-03-17 03:56:19,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_08-model_00-model_states.pt. +0: [2023-03-17 03:56:19,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_09-model_00-model_states.pt... +0: [2023-03-17 03:56:19,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_09-model_00-model_states.pt. +0: [2023-03-17 03:56:19,633] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_10-model_00-model_states.pt... +0: [2023-03-17 03:56:19,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_10-model_00-model_states.pt. +0: [2023-03-17 03:56:19,657] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_11-model_00-model_states.pt... 
+0: [2023-03-17 03:56:19,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_11-model_00-model_states.pt. +0: [2023-03-17 03:56:19,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_12-model_00-model_states.pt... +0: [2023-03-17 03:56:19,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_12-model_00-model_states.pt. +0: [2023-03-17 03:56:19,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_13-model_00-model_states.pt... +0: [2023-03-17 03:56:19,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_13-model_00-model_states.pt. +0: [2023-03-17 03:56:19,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_14-model_00-model_states.pt... +0: [2023-03-17 03:56:19,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_14-model_00-model_states.pt. +0: [2023-03-17 03:56:19,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_15-model_00-model_states.pt... +0: [2023-03-17 03:56:19,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_15-model_00-model_states.pt. +0: [2023-03-17 03:56:19,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_16-model_00-model_states.pt... +0: [2023-03-17 03:56:19,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_16-model_00-model_states.pt. +0: [2023-03-17 03:56:19,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_17-model_00-model_states.pt... 
+0: [2023-03-17 03:56:19,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_17-model_00-model_states.pt. +0: [2023-03-17 03:56:19,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_18-model_00-model_states.pt... +0: [2023-03-17 03:56:19,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_18-model_00-model_states.pt. +0: [2023-03-17 03:56:19,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_19-model_00-model_states.pt... +0: [2023-03-17 03:56:19,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_19-model_00-model_states.pt. +0: [2023-03-17 03:56:19,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_20-model_00-model_states.pt... +0: [2023-03-17 03:56:19,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_20-model_00-model_states.pt. +0: [2023-03-17 03:56:19,901] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/layer_22-model_00-model_states.pt... +0: [2023-03-17 03:56:19,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/layer_22-model_00-model_states.pt. +0: [2023-03-17 03:56:19,907] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step40000/mp_rank_00_model_states.pt +0: [2023-03-17 03:56:19,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/mp_rank_00_model_states.pt... +0: [2023-03-17 03:56:19,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/mp_rank_00_model_states.pt. 
+0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +3: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-17 03:56:19,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +0: [2023-03-17 03:56:20,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 03:56:20,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +2: [2023-03-17 03:56:20,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +2: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 03:56:20,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +2: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +5: [2023-03-17 03:56:20,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 03:56:20,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 03:56:20,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +6: [2023-03-17 03:56:20,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +3: [2023-03-17 03:56:20,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+0: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+7: [2023-03-17 03:56:20,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 03:56:20,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+7: [2023-03-17 03:56:20,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 03:56:20,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 03:56:20,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +4: [2023-03-17 03:56:20,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! +0: [2023-03-17 03:56:20,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 03:56:20,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
+0: successfully saved checkpoint at iteration 40000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 746.01 +7: iteration 40010/ 173500 | consumed samples: 10242560 | consumed tokens: 20976762880 | elapsed time per iteration (s): 0.52 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.040602E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 496.599 | TFLOPs: 26.06 | +7: iteration 40020/ 173500 | consumed samples: 10245120 | consumed tokens: 20982005760 | elapsed time per iteration (s): 0.43 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.066241E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.067 | TFLOPs: 31.59 | +7: iteration 40030/ 173500 | consumed samples: 10247680 | consumed tokens: 20987248640 | elapsed time per iteration (s): 0.42 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.053435E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.890 | TFLOPs: 31.63 | +7: iteration 40040/ 173500 | consumed samples: 10250240 | consumed tokens: 20992491520 | elapsed time per iteration (s): 0.43 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.036250E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.254 | TFLOPs: 31.34 | +7: iteration 40050/ 173500 | consumed samples: 10252800 | consumed tokens: 20997734400 | elapsed time per iteration (s): 0.43 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.042148E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.557 | TFLOPs: 31.20 | +7: iteration 40060/ 173500 | consumed samples: 10255360 | consumed tokens: 21002977280 | elapsed time per iteration (s): 
0.43 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.046366E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.206 | TFLOPs: 31.39 | +7: iteration 40070/ 173500 | consumed samples: 10257920 | consumed tokens: 21008220160 | elapsed time per iteration (s): 0.42 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.047302E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.911 | TFLOPs: 31.63 | +7: iteration 40080/ 173500 | consumed samples: 10260480 | consumed tokens: 21013463040 | elapsed time per iteration (s): 0.43 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.058964E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.399 | TFLOPs: 31.29 | +7: iteration 40090/ 173500 | consumed samples: 10263040 | consumed tokens: 21018705920 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.042276E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.307 | TFLOPs: 31.55 | +7: iteration 40100/ 173500 | consumed samples: 10265600 | consumed tokens: 21023948800 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.060426E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.301 | TFLOPs: 31.18 | +7: iteration 40110/ 173500 | consumed samples: 10268160 | consumed tokens: 21029191680 | elapsed time per iteration (s): 0.42 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.048492E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.546 | TFLOPs: 32.09 | +7: iteration 
40120/ 173500 | consumed samples: 10270720 | consumed tokens: 21034434560 | elapsed time per iteration (s): 0.45 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.064765E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.283 | TFLOPs: 29.87 | +7: iteration 40130/ 173500 | consumed samples: 10273280 | consumed tokens: 21039677440 | elapsed time per iteration (s): 0.44 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.043181E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.656 | TFLOPs: 30.31 | +7: iteration 40140/ 173500 | consumed samples: 10275840 | consumed tokens: 21044920320 | elapsed time per iteration (s): 0.44 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.045099E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.530 | TFLOPs: 30.62 | +7: iteration 40150/ 173500 | consumed samples: 10278400 | consumed tokens: 21050163200 | elapsed time per iteration (s): 0.45 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.042066E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.315 | TFLOPs: 29.77 | +7: iteration 40160/ 173500 | consumed samples: 10280960 | consumed tokens: 21055406080 | elapsed time per iteration (s): 0.46 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.039830E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.065 | TFLOPs: 29.44 | +7: iteration 40170/ 173500 | consumed samples: 10283520 | consumed tokens: 21060648960 | elapsed time per iteration (s): 0.46 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.041073E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 560.514 | TFLOPs: 29.41 | +7: iteration 40180/ 173500 | consumed samples: 10286080 | consumed tokens: 21065891840 | elapsed time per iteration (s): 0.46 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 3.055547E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.851 | TFLOPs: 28.95 | +7: iteration 40190/ 173500 | consumed samples: 10288640 | consumed tokens: 21071134720 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.050929E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.457 | TFLOPs: 30.98 | +7: iteration 40200/ 173500 | consumed samples: 10291200 | consumed tokens: 21076377600 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.048373E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.231 | TFLOPs: 32.02 | +7: iteration 40210/ 173500 | consumed samples: 10293760 | consumed tokens: 21081620480 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.050772E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.295 | TFLOPs: 31.76 | +7: iteration 40220/ 173500 | consumed samples: 10296320 | consumed tokens: 21086863360 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.060480E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.570 | TFLOPs: 31.25 | +7: iteration 40230/ 173500 | consumed samples: 10298880 | consumed tokens: 21092106240 | elapsed time per iteration (s): 0.43 | learning rate: 
1.786E-04 | global batch size: 256 | lm loss: 3.056696E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.972 | TFLOPs: 31.53 | +7: iteration 40240/ 173500 | consumed samples: 10301440 | consumed tokens: 21097349120 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.052746E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.486 | TFLOPs: 31.40 | +7: iteration 40250/ 173500 | consumed samples: 10304000 | consumed tokens: 21102592000 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.045454E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.284 | TFLOPs: 31.29 | +7: iteration 40260/ 173500 | consumed samples: 10306560 | consumed tokens: 21107834880 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.049738E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.030 | TFLOPs: 31.90 | +7: iteration 40270/ 173500 | consumed samples: 10309120 | consumed tokens: 21113077760 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.042615E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.489 | TFLOPs: 31.87 | +7: iteration 40280/ 173500 | consumed samples: 10311680 | consumed tokens: 21118320640 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.053718E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.076 | TFLOPs: 31.59 | +7: iteration 40290/ 173500 | 
consumed samples: 10314240 | consumed tokens: 21123563520 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.051777E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.838 | TFLOPs: 31.79 | +7: iteration 40300/ 173500 | consumed samples: 10316800 | consumed tokens: 21128806400 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.036981E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.307 | TFLOPs: 31.65 | +7: iteration 40310/ 173500 | consumed samples: 10319360 | consumed tokens: 21134049280 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.056104E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.696 | TFLOPs: 31.36 | +7: iteration 40320/ 173500 | consumed samples: 10321920 | consumed tokens: 21139292160 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.036643E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.936 | TFLOPs: 31.43 | +7: iteration 40330/ 173500 | consumed samples: 10324480 | consumed tokens: 21144535040 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.062786E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.947 | TFLOPs: 31.85 | +7: iteration 40340/ 173500 | consumed samples: 10327040 | consumed tokens: 21149777920 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.059579E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 610.993 | TFLOPs: 32.06 | +7: iteration 40350/ 173500 | consumed samples: 10329600 | consumed tokens: 21155020800 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.057254E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.466 | TFLOPs: 31.19 | +7: iteration 40360/ 173500 | consumed samples: 10332160 | consumed tokens: 21160263680 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.051363E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.864 | TFLOPs: 32.05 | +7: iteration 40370/ 173500 | consumed samples: 10334720 | consumed tokens: 21165506560 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.058371E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.668 | TFLOPs: 32.09 | +7: iteration 40380/ 173500 | consumed samples: 10337280 | consumed tokens: 21170749440 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.043217E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.623 | TFLOPs: 31.88 | +7: iteration 40390/ 173500 | consumed samples: 10339840 | consumed tokens: 21175992320 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.055123E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.479 | TFLOPs: 32.08 | +7: iteration 40400/ 173500 | consumed samples: 10342400 | consumed tokens: 21181235200 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch 
size: 256 | lm loss: 3.044576E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.362 | TFLOPs: 31.29 | +7: iteration 40410/ 173500 | consumed samples: 10344960 | consumed tokens: 21186478080 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.065300E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.224 | TFLOPs: 31.76 | +7: iteration 40420/ 173500 | consumed samples: 10347520 | consumed tokens: 21191720960 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.059095E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.128 | TFLOPs: 31.80 | +7: iteration 40430/ 173500 | consumed samples: 10350080 | consumed tokens: 21196963840 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.039324E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.112 | TFLOPs: 32.06 | +7: iteration 40440/ 173500 | consumed samples: 10352640 | consumed tokens: 21202206720 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.038569E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.174 | TFLOPs: 32.07 | +7: iteration 40450/ 173500 | consumed samples: 10355200 | consumed tokens: 21207449600 | elapsed time per iteration (s): 0.44 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.046122E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.710 | TFLOPs: 30.42 | +7: iteration 40460/ 173500 | consumed samples: 10357760 | 
consumed tokens: 21212692480 | elapsed time per iteration (s): 0.42 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 3.054082E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.885 | TFLOPs: 31.74 | +7: iteration 40470/ 173500 | consumed samples: 10360320 | consumed tokens: 21217935360 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.048092E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.592 | TFLOPs: 31.62 | +7: iteration 40480/ 173500 | consumed samples: 10362880 | consumed tokens: 21223178240 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.044517E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.469 | TFLOPs: 31.87 | +7: iteration 40490/ 173500 | consumed samples: 10365440 | consumed tokens: 21228421120 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.044626E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.370 | TFLOPs: 31.66 | +7: iteration 40500/ 173500 | consumed samples: 10368000 | consumed tokens: 21233664000 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.044251E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.512 | TFLOPs: 31.09 | +7: iteration 40510/ 173500 | consumed samples: 10370560 | consumed tokens: 21238906880 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.060729E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 600.004 | TFLOPs: 31.48 | +7: iteration 40520/ 173500 | consumed samples: 10373120 | consumed tokens: 21244149760 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.047738E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.522 | TFLOPs: 31.88 | +7: iteration 40530/ 173500 | consumed samples: 10375680 | consumed tokens: 21249392640 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.046529E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.799 | TFLOPs: 31.73 | +7: iteration 40540/ 173500 | consumed samples: 10378240 | consumed tokens: 21254635520 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.046402E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.997 | TFLOPs: 31.59 | +7: iteration 40550/ 173500 | consumed samples: 10380800 | consumed tokens: 21259878400 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.052394E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.492 | TFLOPs: 31.87 | +7: iteration 40560/ 173500 | consumed samples: 10383360 | consumed tokens: 21265121280 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.058506E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.004 | TFLOPs: 31.85 | +7: iteration 40570/ 173500 | consumed samples: 10385920 | consumed tokens: 21270364160 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 
3.050426E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.971 | TFLOPs: 31.06 | +7: iteration 40580/ 173500 | consumed samples: 10388480 | consumed tokens: 21275607040 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.065562E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.355 | TFLOPs: 31.87 | +7: iteration 40590/ 173500 | consumed samples: 10391040 | consumed tokens: 21280849920 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.051371E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.082 | TFLOPs: 31.54 | +7: iteration 40600/ 173500 | consumed samples: 10393600 | consumed tokens: 21286092800 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.041887E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.784 | TFLOPs: 31.99 | +7: iteration 40610/ 173500 | consumed samples: 10396160 | consumed tokens: 21291335680 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.046971E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.238 | TFLOPs: 31.34 | +7: iteration 40620/ 173500 | consumed samples: 10398720 | consumed tokens: 21296578560 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.048107E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.930 | TFLOPs: 31.84 | +7: iteration 40630/ 173500 | consumed samples: 10401280 | consumed tokens: 
21301821440 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.053861E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.818 | TFLOPs: 31.79 | +7: iteration 40640/ 173500 | consumed samples: 10403840 | consumed tokens: 21307064320 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.053580E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.999 | TFLOPs: 31.74 | +7: iteration 40650/ 173500 | consumed samples: 10406400 | consumed tokens: 21312307200 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.051926E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.318 | TFLOPs: 31.81 | +7: iteration 40660/ 173500 | consumed samples: 10408960 | consumed tokens: 21317550080 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.062083E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.871 | TFLOPs: 31.42 | +7: iteration 40670/ 173500 | consumed samples: 10411520 | consumed tokens: 21322792960 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.033824E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.927 | TFLOPs: 31.95 | +7: iteration 40680/ 173500 | consumed samples: 10414080 | consumed tokens: 21328035840 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.048396E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.491 | TFLOPs: 31.72 | +7: iteration 40690/ 173500 | consumed samples: 10416640 | consumed tokens: 21333278720 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.047068E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.897 | TFLOPs: 31.58 | +7: iteration 40700/ 173500 | consumed samples: 10419200 | consumed tokens: 21338521600 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.040001E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.528 | TFLOPs: 31.82 | +7: iteration 40710/ 173500 | consumed samples: 10421760 | consumed tokens: 21343764480 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.045530E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.140 | TFLOPs: 31.38 | +7: iteration 40720/ 173500 | consumed samples: 10424320 | consumed tokens: 21349007360 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.040955E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.343 | TFLOPs: 31.87 | +7: iteration 40730/ 173500 | consumed samples: 10426880 | consumed tokens: 21354250240 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.054344E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.479 | TFLOPs: 31.82 | +7: iteration 40740/ 173500 | consumed samples: 10429440 | consumed tokens: 21359493120 | elapsed time per iteration (s): 0.42 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.054162E+00 | grad 
norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.016 | TFLOPs: 31.80 | +7: iteration 40750/ 173500 | consumed samples: 10432000 | consumed tokens: 21364736000 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.057284E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.766 | TFLOPs: 31.84 | +7: iteration 40760/ 173500 | consumed samples: 10434560 | consumed tokens: 21369978880 | elapsed time per iteration (s): 0.43 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.043633E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.547 | TFLOPs: 30.88 | +7: iteration 40770/ 173500 | consumed samples: 10437120 | consumed tokens: 21375221760 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.052304E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.543 | TFLOPs: 31.67 | +7: iteration 40780/ 173500 | consumed samples: 10439680 | consumed tokens: 21380464640 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.046933E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.443 | TFLOPs: 32.03 | +7: iteration 40790/ 173500 | consumed samples: 10442240 | consumed tokens: 21385707520 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.032228E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.171 | TFLOPs: 32.01 | +7: iteration 40800/ 173500 | consumed samples: 10444800 | consumed tokens: 21390950400 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.059672E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.046 | TFLOPs: 32.01 | +7: iteration 40810/ 173500 | consumed samples: 10447360 | consumed tokens: 21396193280 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.048586E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.803 | TFLOPs: 31.79 | +7: iteration 40820/ 173500 | consumed samples: 10449920 | consumed tokens: 21401436160 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.043913E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.989 | TFLOPs: 32.01 | +7: iteration 40830/ 173500 | consumed samples: 10452480 | consumed tokens: 21406679040 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.044016E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.764 | TFLOPs: 31.63 | +7: iteration 40840/ 173500 | consumed samples: 10455040 | consumed tokens: 21411921920 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.059758E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.143 | TFLOPs: 32.01 | +7: iteration 40850/ 173500 | consumed samples: 10457600 | consumed tokens: 21417164800 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.054466E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.994 | TFLOPs: 
31.80 | +7: iteration 40860/ 173500 | consumed samples: 10460160 | consumed tokens: 21422407680 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.049340E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.924 | TFLOPs: 30.95 | +7: iteration 40870/ 173500 | consumed samples: 10462720 | consumed tokens: 21427650560 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.045196E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.426 | TFLOPs: 31.82 | +7: iteration 40880/ 173500 | consumed samples: 10465280 | consumed tokens: 21432893440 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.055400E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.766 | TFLOPs: 31.57 | +7: iteration 40890/ 173500 | consumed samples: 10467840 | consumed tokens: 21438136320 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.050816E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.021 | TFLOPs: 31.80 | +7: iteration 40900/ 173500 | consumed samples: 10470400 | consumed tokens: 21443379200 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.047844E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.989 | TFLOPs: 31.85 | +7: iteration 40910/ 173500 | consumed samples: 10472960 | consumed tokens: 21448622080 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.046043E+00 | grad norm: 0.281 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.600 | TFLOPs: 31.77 | +7: iteration 40920/ 173500 | consumed samples: 10475520 | consumed tokens: 21453864960 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 3.035016E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.062 | TFLOPs: 31.48 | +7: iteration 40930/ 173500 | consumed samples: 10478080 | consumed tokens: 21459107840 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.051998E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.878 | TFLOPs: 32.00 | +7: iteration 40940/ 173500 | consumed samples: 10480640 | consumed tokens: 21464350720 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.043232E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.354 | TFLOPs: 32.02 | +7: iteration 40950/ 173500 | consumed samples: 10483200 | consumed tokens: 21469593600 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.047574E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.535 | TFLOPs: 31.46 | +7: iteration 40960/ 173500 | consumed samples: 10485760 | consumed tokens: 21474836480 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.042061E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.956 | TFLOPs: 31.32 | +7: iteration 40970/ 173500 | consumed samples: 10488320 | consumed tokens: 21480079360 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.050894E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.615 | TFLOPs: 31.88 | +7: iteration 40980/ 173500 | consumed samples: 10490880 | consumed tokens: 21485322240 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.063966E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.400 | TFLOPs: 31.45 | +7: iteration 40990/ 173500 | consumed samples: 10493440 | consumed tokens: 21490565120 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.047638E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.892 | TFLOPs: 32.00 | +7: iteration 41000/ 173500 | consumed samples: 10496000 | consumed tokens: 21495808000 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.038523E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.306 | TFLOPs: 31.60 | +7: iteration 41010/ 173500 | consumed samples: 10498560 | consumed tokens: 21501050880 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.033541E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.069 | TFLOPs: 31.80 | +7: iteration 41020/ 173500 | consumed samples: 10501120 | consumed tokens: 21506293760 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.045397E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.242 | TFLOPs: 31.49 | +7: iteration 41030/ 
173500 | consumed samples: 10503680 | consumed tokens: 21511536640 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.036605E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.294 | TFLOPs: 31.81 | +7: iteration 41040/ 173500 | consumed samples: 10506240 | consumed tokens: 21516779520 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.046413E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.016 | TFLOPs: 32.01 | +7: iteration 41050/ 173500 | consumed samples: 10508800 | consumed tokens: 21522022400 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.045146E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.469 | TFLOPs: 31.87 | +7: iteration 41060/ 173500 | consumed samples: 10511360 | consumed tokens: 21527265280 | elapsed time per iteration (s): 0.43 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.044756E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.042 | TFLOPs: 31.43 | +7: iteration 41070/ 173500 | consumed samples: 10513920 | consumed tokens: 21532508160 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.039619E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.328 | TFLOPs: 31.76 | +7: iteration 41080/ 173500 | consumed samples: 10516480 | consumed tokens: 21537751040 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.048736E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 610.296 | TFLOPs: 32.02 | +7: iteration 41090/ 173500 | consumed samples: 10519040 | consumed tokens: 21542993920 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.053243E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.630 | TFLOPs: 31.78 | +7: iteration 41100/ 173500 | consumed samples: 10521600 | consumed tokens: 21548236800 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.039883E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.915 | TFLOPs: 32.00 | +7: iteration 41110/ 173500 | consumed samples: 10524160 | consumed tokens: 21553479680 | elapsed time per iteration (s): 0.43 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.053394E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.010 | TFLOPs: 31.06 | +7: iteration 41120/ 173500 | consumed samples: 10526720 | consumed tokens: 21558722560 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.044786E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.593 | TFLOPs: 31.77 | +7: iteration 41130/ 173500 | consumed samples: 10529280 | consumed tokens: 21563965440 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.057214E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.861 | TFLOPs: 32.00 | +7: iteration 41140/ 173500 | consumed samples: 10531840 | consumed tokens: 21569208320 | elapsed time per iteration (s): 0.43 | learning rate: 
1.776E-04 | global batch size: 256 | lm loss: 3.046040E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.786 | TFLOPs: 31.52 | +7: iteration 41150/ 173500 | consumed samples: 10534400 | consumed tokens: 21574451200 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.042251E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.270 | TFLOPs: 31.81 | +7: iteration 41160/ 173500 | consumed samples: 10536960 | consumed tokens: 21579694080 | elapsed time per iteration (s): 0.43 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.036787E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.477 | TFLOPs: 31.45 | +7: iteration 41170/ 173500 | consumed samples: 10539520 | consumed tokens: 21584936960 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.057184E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.604 | TFLOPs: 31.62 | +7: iteration 41180/ 173500 | consumed samples: 10542080 | consumed tokens: 21590179840 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.056099E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.415 | TFLOPs: 32.03 | +7: iteration 41190/ 173500 | consumed samples: 10544640 | consumed tokens: 21595422720 | elapsed time per iteration (s): 0.43 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.036900E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.485 | TFLOPs: 31.51 | +7: iteration 41200/ 173500 | 
consumed samples: 10547200 | consumed tokens: 21600665600 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.044218E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.118 | TFLOPs: 32.01 | +7: iteration 41210/ 173500 | consumed samples: 10549760 | consumed tokens: 21605908480 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.039240E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.309 | TFLOPs: 31.71 | +7: iteration 41220/ 173500 | consumed samples: 10552320 | consumed tokens: 21611151360 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.040709E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.253 | TFLOPs: 31.23 | +7: iteration 41230/ 173500 | consumed samples: 10554880 | consumed tokens: 21616394240 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.056883E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.816 | TFLOPs: 32.00 | +7: iteration 41240/ 173500 | consumed samples: 10557440 | consumed tokens: 21621637120 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.053337E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.279 | TFLOPs: 31.81 | +7: iteration 41250/ 173500 | consumed samples: 10560000 | consumed tokens: 21626880000 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.037986E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.047 | TFLOPs: 31.96 | +7: iteration 41260/ 173500 | consumed samples: 10562560 | consumed tokens: 21632122880 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.050083E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.631 | TFLOPs: 31.57 | +7: iteration 41270/ 173500 | consumed samples: 10565120 | consumed tokens: 21637365760 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.050760E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.665 | TFLOPs: 31.99 | +7: iteration 41280/ 173500 | consumed samples: 10567680 | consumed tokens: 21642608640 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.052510E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.689 | TFLOPs: 31.99 | +7: iteration 41290/ 173500 | consumed samples: 10570240 | consumed tokens: 21647851520 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.042740E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.314 | TFLOPs: 31.55 | +7: iteration 41300/ 173500 | consumed samples: 10572800 | consumed tokens: 21653094400 | elapsed time per iteration (s): 0.43 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.034382E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.549 | TFLOPs: 30.88 | +7: iteration 41310/ 173500 | consumed samples: 10575360 | consumed tokens: 21658337280 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch 
size: 256 | lm loss: 3.055266E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.941 | TFLOPs: 31.85 | +7: iteration 41320/ 173500 | consumed samples: 10577920 | consumed tokens: 21663580160 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.047602E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.907 | TFLOPs: 32.00 | +7: iteration 41330/ 173500 | consumed samples: 10580480 | consumed tokens: 21668823040 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.042860E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.521 | TFLOPs: 31.98 | +7: iteration 41340/ 173500 | consumed samples: 10583040 | consumed tokens: 21674065920 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.042682E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.503 | TFLOPs: 31.98 | +7: iteration 41350/ 173500 | consumed samples: 10585600 | consumed tokens: 21679308800 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.045884E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.249 | TFLOPs: 31.97 | +7: iteration 41360/ 173500 | consumed samples: 10588160 | consumed tokens: 21684551680 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.048994E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.293 | TFLOPs: 31.97 | +7: iteration 41370/ 173500 | consumed samples: 10590720 | 
consumed tokens: 21689794560 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.046780E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.712 | TFLOPs: 31.99 | +7: iteration 41380/ 173500 | consumed samples: 10593280 | consumed tokens: 21695037440 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 3.054180E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.190 | TFLOPs: 31.70 | +7: iteration 41390/ 173500 | consumed samples: 10595840 | consumed tokens: 21700280320 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.052046E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.027 | TFLOPs: 32.01 | +7: iteration 41400/ 173500 | consumed samples: 10598400 | consumed tokens: 21705523200 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.048957E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.871 | TFLOPs: 31.79 | +7: iteration 41410/ 173500 | consumed samples: 10600960 | consumed tokens: 21710766080 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.057349E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.465 | TFLOPs: 31.61 | +7: iteration 41420/ 173500 | consumed samples: 10603520 | consumed tokens: 21716008960 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.051360E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 605.172 | TFLOPs: 31.75 | +7: iteration 41430/ 173500 | consumed samples: 10606080 | consumed tokens: 21721251840 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.030491E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.903 | TFLOPs: 32.00 | +7: iteration 41440/ 173500 | consumed samples: 10608640 | consumed tokens: 21726494720 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.037493E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.119 | TFLOPs: 32.01 | +7: iteration 41450/ 173500 | consumed samples: 10611200 | consumed tokens: 21731737600 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.048707E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.887 | TFLOPs: 32.00 | +7: iteration 41460/ 173500 | consumed samples: 10613760 | consumed tokens: 21736980480 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.054581E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.868 | TFLOPs: 31.74 | +7: iteration 41470/ 173500 | consumed samples: 10616320 | consumed tokens: 21742223360 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.052076E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.467 | TFLOPs: 31.98 | +7: iteration 41480/ 173500 | consumed samples: 10618880 | consumed tokens: 21747466240 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 
3.041731E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.772 | TFLOPs: 31.99 | +7: iteration 41490/ 173500 | consumed samples: 10621440 | consumed tokens: 21752709120 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.044957E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.975 | TFLOPs: 32.00 | +7: iteration 41500/ 173500 | consumed samples: 10624000 | consumed tokens: 21757952000 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.052980E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.951 | TFLOPs: 32.00 | +7: iteration 41510/ 173500 | consumed samples: 10626560 | consumed tokens: 21763194880 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.043730E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.984 | TFLOPs: 32.00 | +7: iteration 41520/ 173500 | consumed samples: 10629120 | consumed tokens: 21768437760 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.048092E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.839 | TFLOPs: 32.00 | +7: iteration 41530/ 173500 | consumed samples: 10631680 | consumed tokens: 21773680640 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.037062E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.672 | TFLOPs: 31.99 | +7: iteration 41540/ 173500 | consumed samples: 10634240 | consumed tokens: 
21778923520 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.044062E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 | +7: iteration 41550/ 173500 | consumed samples: 10636800 | consumed tokens: 21784166400 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.045501E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.112 | TFLOPs: 31.22 | +7: iteration 41560/ 173500 | consumed samples: 10639360 | consumed tokens: 21789409280 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.059968E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.930 | TFLOPs: 30.90 | +7: iteration 41570/ 173500 | consumed samples: 10641920 | consumed tokens: 21794652160 | elapsed time per iteration (s): 0.42 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.044500E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.716 | TFLOPs: 32.04 | +7: iteration 41580/ 173500 | consumed samples: 10644480 | consumed tokens: 21799895040 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.041662E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.884 | TFLOPs: 32.00 | +7: iteration 41590/ 173500 | consumed samples: 10647040 | consumed tokens: 21805137920 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.042311E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.358 | TFLOPs: 31.97 | +7: iteration 41600/ 173500 | consumed samples: 10649600 | consumed tokens: 21810380800 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.041268E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.127 | TFLOPs: 31.96 | +7: iteration 41610/ 173500 | consumed samples: 10652160 | consumed tokens: 21815623680 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.045370E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.507 | TFLOPs: 31.98 | +7: iteration 41620/ 173500 | consumed samples: 10654720 | consumed tokens: 21820866560 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.031988E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.573 | TFLOPs: 31.77 | +7: iteration 41630/ 173500 | consumed samples: 10657280 | consumed tokens: 21826109440 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.036896E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.382 | TFLOPs: 31.97 | +7: iteration 41640/ 173500 | consumed samples: 10659840 | consumed tokens: 21831352320 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.040325E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.420 | TFLOPs: 31.98 | +7: iteration 41650/ 173500 | consumed samples: 10662400 | consumed tokens: 21836595200 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.043718E+00 | grad 
norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.165 | TFLOPs: 31.96 | +7: iteration 41660/ 173500 | consumed samples: 10664960 | consumed tokens: 21841838080 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 3.041129E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.366 | TFLOPs: 31.97 | +7: iteration 41670/ 173500 | consumed samples: 10667520 | consumed tokens: 21847080960 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.050434E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.479 | TFLOPs: 31.98 | +7: iteration 41680/ 173500 | consumed samples: 10670080 | consumed tokens: 21852323840 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.047943E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.376 | TFLOPs: 31.97 | +7: iteration 41690/ 173500 | consumed samples: 10672640 | consumed tokens: 21857566720 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.062769E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.449 | TFLOPs: 31.71 | +7: iteration 41700/ 173500 | consumed samples: 10675200 | consumed tokens: 21862809600 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.041172E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.993 | TFLOPs: 31.85 | +7: iteration 41710/ 173500 | consumed samples: 10677760 | consumed tokens: 21868052480 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.035760E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.569 | TFLOPs: 31.98 | +7: iteration 41720/ 173500 | consumed samples: 10680320 | consumed tokens: 21873295360 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.036894E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.240 | TFLOPs: 31.97 | +7: iteration 41730/ 173500 | consumed samples: 10682880 | consumed tokens: 21878538240 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.041746E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.425 | TFLOPs: 31.98 | +7: iteration 41740/ 173500 | consumed samples: 10685440 | consumed tokens: 21883781120 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.038793E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.472 | TFLOPs: 31.98 | +7: iteration 41750/ 173500 | consumed samples: 10688000 | consumed tokens: 21889024000 | elapsed time per iteration (s): 0.42 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.041421E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.988 | TFLOPs: 31.95 | +7: iteration 41760/ 173500 | consumed samples: 10690560 | consumed tokens: 21894266880 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.054809E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.858 | TFLOPs: 
31.95 | +7: iteration 41770/ 173500 | consumed samples: 10693120 | consumed tokens: 21899509760 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.049758E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.838 | TFLOPs: 31.94 | +7: iteration 41780/ 173500 | consumed samples: 10695680 | consumed tokens: 21904752640 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.038669E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.190 | TFLOPs: 31.70 | +7: iteration 41790/ 173500 | consumed samples: 10698240 | consumed tokens: 21909995520 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.057763E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.263 | TFLOPs: 31.97 | +7: iteration 41800/ 173500 | consumed samples: 10700800 | consumed tokens: 21915238400 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.029845E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.674 | TFLOPs: 31.83 | +7: iteration 41810/ 173500 | consumed samples: 10703360 | consumed tokens: 21920481280 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.046432E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.375 | TFLOPs: 31.97 | +7: iteration 41820/ 173500 | consumed samples: 10705920 | consumed tokens: 21925724160 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.048480E+00 | grad norm: 0.309 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.451 | TFLOPs: 31.98 | +7: iteration 41830/ 173500 | consumed samples: 10708480 | consumed tokens: 21930967040 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.045572E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.407 | TFLOPs: 31.97 | +7: iteration 41840/ 173500 | consumed samples: 10711040 | consumed tokens: 21936209920 | elapsed time per iteration (s): 0.42 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 3.040576E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.098 | TFLOPs: 31.96 | +7: iteration 41850/ 173500 | consumed samples: 10713600 | consumed tokens: 21941452800 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.056840E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.038 | TFLOPs: 31.90 | +7: iteration 41860/ 173500 | consumed samples: 10716160 | consumed tokens: 21946695680 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.035646E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.915 | TFLOPs: 31.90 | +7: iteration 41870/ 173500 | consumed samples: 10718720 | consumed tokens: 21951938560 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.038697E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.751 | TFLOPs: 31.68 | +7: iteration 41880/ 173500 | consumed samples: 10721280 | consumed tokens: 21957181440 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.049130E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.772 | TFLOPs: 31.94 | +7: iteration 41890/ 173500 | consumed samples: 10723840 | consumed tokens: 21962424320 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.048166E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.364 | TFLOPs: 31.97 | +7: iteration 41900/ 173500 | consumed samples: 10726400 | consumed tokens: 21967667200 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.045145E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.487 | TFLOPs: 31.98 | +7: iteration 41910/ 173500 | consumed samples: 10728960 | consumed tokens: 21972910080 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.065628E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.461 | TFLOPs: 31.77 | +7: iteration 41920/ 173500 | consumed samples: 10731520 | consumed tokens: 21978152960 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.041127E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.582 | TFLOPs: 31.67 | +7: iteration 41930/ 173500 | consumed samples: 10734080 | consumed tokens: 21983395840 | elapsed time per iteration (s): 0.43 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.052935E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.589 | TFLOPs: 31.46 | +7: iteration 41940/ 
173500 | consumed samples: 10736640 | consumed tokens: 21988638720 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.047281E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.243 | TFLOPs: 32.02 | +7: iteration 41950/ 173500 | consumed samples: 10739200 | consumed tokens: 21993881600 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.050170E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.452 | TFLOPs: 31.98 | +7: iteration 41960/ 173500 | consumed samples: 10741760 | consumed tokens: 21999124480 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.050097E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.722 | TFLOPs: 31.99 | +7: iteration 41970/ 173500 | consumed samples: 10744320 | consumed tokens: 22004367360 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.043641E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.291 | TFLOPs: 31.97 | +7: iteration 41980/ 173500 | consumed samples: 10746880 | consumed tokens: 22009610240 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.059355E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.628 | TFLOPs: 31.99 | +7: iteration 41990/ 173500 | consumed samples: 10749440 | consumed tokens: 22014853120 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.052669E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.402 | TFLOPs: 31.97 | +0: [2023-03-17 04:10:28,134] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=0, lr=[0.00017667737143212697, 0.00017667737143212697, 0.00017667737143212697], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 42000/ 173500 | consumed samples: 10752000 | consumed tokens: 22020096000 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.043455E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.822 | TFLOPs: 32.00 | +0: steps: 42000 loss: 3.0279 iter time (s): 0.422 samples/sec: 606.748 +7: iteration 42010/ 173500 | consumed samples: 10754560 | consumed tokens: 22025338880 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.049450E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.865 | TFLOPs: 31.89 | +7: iteration 42020/ 173500 | consumed samples: 10757120 | consumed tokens: 22030581760 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.049535E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.329 | TFLOPs: 31.97 | +7: iteration 42030/ 173500 | consumed samples: 10759680 | consumed tokens: 22035824640 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.050141E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.370 | TFLOPs: 31.97 | +7: iteration 42040/ 173500 | consumed samples: 10762240 | consumed tokens: 22041067520 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.042726E+00 | grad 
norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.433 | TFLOPs: 31.98 | +7: iteration 42050/ 173500 | consumed samples: 10764800 | consumed tokens: 22046310400 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.048915E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.646 | TFLOPs: 31.99 | +7: iteration 42060/ 173500 | consumed samples: 10767360 | consumed tokens: 22051553280 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.052312E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.471 | TFLOPs: 31.98 | +7: iteration 42070/ 173500 | consumed samples: 10769920 | consumed tokens: 22056796160 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.031445E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.392 | TFLOPs: 31.97 | +7: iteration 42080/ 173500 | consumed samples: 10772480 | consumed tokens: 22062039040 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.040054E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.607 | TFLOPs: 31.99 | +7: iteration 42090/ 173500 | consumed samples: 10775040 | consumed tokens: 22067281920 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.041878E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.511 | TFLOPs: 31.98 | +7: iteration 42100/ 173500 | consumed samples: 10777600 | consumed tokens: 22072524800 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.047017E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.128 | TFLOPs: 31.96 | +7: iteration 42110/ 173500 | consumed samples: 10780160 | consumed tokens: 22077767680 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.032914E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.535 | TFLOPs: 31.98 | +7: iteration 42120/ 173500 | consumed samples: 10782720 | consumed tokens: 22083010560 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.045241E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.620 | TFLOPs: 31.99 | +7: iteration 42130/ 173500 | consumed samples: 10785280 | consumed tokens: 22088253440 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.039321E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.410 | TFLOPs: 31.97 | +7: iteration 42140/ 173500 | consumed samples: 10787840 | consumed tokens: 22093496320 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.043159E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.541 | TFLOPs: 31.98 | +7: iteration 42150/ 173500 | consumed samples: 10790400 | consumed tokens: 22098739200 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.045976E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.304 | TFLOPs: 
31.97 | +7: iteration 42160/ 173500 | consumed samples: 10792960 | consumed tokens: 22103982080 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.040516E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.566 | TFLOPs: 31.88 | +7: iteration 42170/ 173500 | consumed samples: 10795520 | consumed tokens: 22109224960 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.037086E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.481 | TFLOPs: 31.61 | +7: iteration 42180/ 173500 | consumed samples: 10798080 | consumed tokens: 22114467840 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.045688E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.266 | TFLOPs: 32.02 | +7: iteration 42190/ 173500 | consumed samples: 10800640 | consumed tokens: 22119710720 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.048657E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.685 | TFLOPs: 31.99 | +7: iteration 42200/ 173500 | consumed samples: 10803200 | consumed tokens: 22124953600 | elapsed time per iteration (s): 0.42 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.042628E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.738 | TFLOPs: 31.99 | +7: iteration 42210/ 173500 | consumed samples: 10805760 | consumed tokens: 22130196480 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.038918E+00 | grad norm: 0.289 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.575 | TFLOPs: 31.98 | +7: iteration 42220/ 173500 | consumed samples: 10808320 | consumed tokens: 22135439360 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.053885E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.690 | TFLOPs: 31.88 | +7: iteration 42230/ 173500 | consumed samples: 10810880 | consumed tokens: 22140682240 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.046984E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.567 | TFLOPs: 31.62 | +7: iteration 42240/ 173500 | consumed samples: 10813440 | consumed tokens: 22145925120 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.052023E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.542 | TFLOPs: 31.98 | +7: iteration 42250/ 173500 | consumed samples: 10816000 | consumed tokens: 22151168000 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.043891E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.186 | TFLOPs: 31.96 | +7: iteration 42260/ 173500 | consumed samples: 10818560 | consumed tokens: 22156410880 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.042306E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.411 | TFLOPs: 31.71 | +7: iteration 42270/ 173500 | consumed samples: 10821120 | consumed tokens: 22161653760 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.046453E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.573 | TFLOPs: 31.98 | +7: iteration 42280/ 173500 | consumed samples: 10823680 | consumed tokens: 22166896640 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.035013E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.419 | TFLOPs: 31.98 | +7: iteration 42290/ 173500 | consumed samples: 10826240 | consumed tokens: 22172139520 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 3.039244E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.433 | TFLOPs: 31.77 | +7: iteration 42300/ 173500 | consumed samples: 10828800 | consumed tokens: 22177382400 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.046098E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.835 | TFLOPs: 31.73 | +7: iteration 42310/ 173500 | consumed samples: 10831360 | consumed tokens: 22182625280 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.040693E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.269 | TFLOPs: 31.97 | +7: iteration 42320/ 173500 | consumed samples: 10833920 | consumed tokens: 22187868160 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.040052E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.809 | TFLOPs: 31.79 | +7: iteration 42330/ 
173500 | consumed samples: 10836480 | consumed tokens: 22193111040 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.035244E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.450 | TFLOPs: 31.98 | +7: iteration 42340/ 173500 | consumed samples: 10839040 | consumed tokens: 22198353920 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.041314E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.336 | TFLOPs: 31.97 | +7: iteration 42350/ 173500 | consumed samples: 10841600 | consumed tokens: 22203596800 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.033527E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.106 | TFLOPs: 31.96 | +7: iteration 42360/ 173500 | consumed samples: 10844160 | consumed tokens: 22208839680 | elapsed time per iteration (s): 0.43 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.036363E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.371 | TFLOPs: 31.08 | +7: iteration 42370/ 173500 | consumed samples: 10846720 | consumed tokens: 22214082560 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.046813E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.621 | TFLOPs: 31.99 | +7: iteration 42380/ 173500 | consumed samples: 10849280 | consumed tokens: 22219325440 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.036650E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.704 | TFLOPs: 31.99 | +7: iteration 42390/ 173500 | consumed samples: 10851840 | consumed tokens: 22224568320 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.048335E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.252 | TFLOPs: 31.76 | +7: iteration 42400/ 173500 | consumed samples: 10854400 | consumed tokens: 22229811200 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.045736E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.182 | TFLOPs: 31.96 | +7: iteration 42410/ 173500 | consumed samples: 10856960 | consumed tokens: 22235054080 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.041670E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.045 | TFLOPs: 31.96 | +7: iteration 42420/ 173500 | consumed samples: 10859520 | consumed tokens: 22240296960 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.029780E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.828 | TFLOPs: 31.94 | +7: iteration 42430/ 173500 | consumed samples: 10862080 | consumed tokens: 22245539840 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.040036E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.899 | TFLOPs: 31.95 | +7: iteration 42440/ 173500 | consumed samples: 10864640 | consumed tokens: 22250782720 | elapsed time per iteration (s): 0.42 | learning rate: 
1.762E-04 | global batch size: 256 | lm loss: 3.052096E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.977 | TFLOPs: 31.95 | +7: iteration 42450/ 173500 | consumed samples: 10867200 | consumed tokens: 22256025600 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.052989E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.698 | TFLOPs: 31.94 | +7: iteration 42460/ 173500 | consumed samples: 10869760 | consumed tokens: 22261268480 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.037450E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.128 | TFLOPs: 31.96 | +7: iteration 42470/ 173500 | consumed samples: 10872320 | consumed tokens: 22266511360 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.030888E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.518 | TFLOPs: 31.88 | +7: iteration 42480/ 173500 | consumed samples: 10874880 | consumed tokens: 22271754240 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.034389E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.178 | TFLOPs: 31.96 | +7: iteration 42490/ 173500 | consumed samples: 10877440 | consumed tokens: 22276997120 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.049461E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.508 | TFLOPs: 31.98 | +7: iteration 42500/ 173500 | 
consumed samples: 10880000 | consumed tokens: 22282240000 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.033307E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.486 | TFLOPs: 31.98 |
+7: iteration 42510/ 173500 | consumed samples: 10882560 | consumed tokens: 22287482880 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.040242E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.048 | TFLOPs: 31.96 |
+7: iteration 42520/ 173500 | consumed samples: 10885120 | consumed tokens: 22292725760 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.033581E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.917 | TFLOPs: 31.95 |
+7: iteration 42530/ 173500 | consumed samples: 10887680 | consumed tokens: 22297968640 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.039683E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.169 | TFLOPs: 31.96 |
+7: iteration 42540/ 173500 | consumed samples: 10890240 | consumed tokens: 22303211520 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.037907E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.774 | TFLOPs: 31.94 |
+7: iteration 42550/ 173500 | consumed samples: 10892800 | consumed tokens: 22308454400 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.051423E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.333 | TFLOPs: 31.97 |
+7: iteration 42560/ 173500 | consumed samples: 10895360 | consumed tokens: 22313697280 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.050927E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.329 | TFLOPs: 31.97 |
+7: iteration 42570/ 173500 | consumed samples: 10897920 | consumed tokens: 22318940160 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.043533E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.789 | TFLOPs: 31.94 |
+7: iteration 42580/ 173500 | consumed samples: 10900480 | consumed tokens: 22324183040 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.054940E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.073 | TFLOPs: 31.96 |
+7: iteration 42590/ 173500 | consumed samples: 10903040 | consumed tokens: 22329425920 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.057169E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.198 | TFLOPs: 31.96 |
+7: iteration 42600/ 173500 | consumed samples: 10905600 | consumed tokens: 22334668800 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.041457E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.170 | TFLOPs: 31.96 |
+7: iteration 42610/ 173500 | consumed samples: 10908160 | consumed tokens: 22339911680 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.047379E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.435 | TFLOPs: 31.98 |
+7: iteration 42620/ 173500 | consumed samples: 10910720 | consumed tokens: 22345154560 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.041338E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.876 | TFLOPs: 32.00 |
+7: iteration 42630/ 173500 | consumed samples: 10913280 | consumed tokens: 22350397440 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.034194E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.518 | TFLOPs: 31.98 |
+7: iteration 42640/ 173500 | consumed samples: 10915840 | consumed tokens: 22355640320 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.034880E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.783 | TFLOPs: 31.99 |
+7: iteration 42650/ 173500 | consumed samples: 10918400 | consumed tokens: 22360883200 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.035653E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.648 | TFLOPs: 31.99 |
+7: iteration 42660/ 173500 | consumed samples: 10920960 | consumed tokens: 22366126080 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.034217E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.342 | TFLOPs: 31.97 |
+7: iteration 42670/ 173500 | consumed samples: 10923520 | consumed tokens: 22371368960 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.052497E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 |
+7: iteration 42680/ 173500 | consumed samples: 10926080 | consumed tokens: 22376611840 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.021869E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.701 | TFLOPs: 31.73 |
+7: iteration 42690/ 173500 | consumed samples: 10928640 | consumed tokens: 22381854720 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.043246E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.381 | TFLOPs: 31.97 |
+7: iteration 42700/ 173500 | consumed samples: 10931200 | consumed tokens: 22387097600 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.032570E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.343 | TFLOPs: 31.97 |
+7: iteration 42710/ 173500 | consumed samples: 10933760 | consumed tokens: 22392340480 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.038873E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.311 | TFLOPs: 31.55 |
+7: iteration 42720/ 173500 | consumed samples: 10936320 | consumed tokens: 22397583360 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.035505E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.835 | TFLOPs: 32.00 |
+7: iteration 42730/ 173500 | consumed samples: 10938880 | consumed tokens: 22402826240 | elapsed time per iteration (s): 0.42 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.046202E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.460 | TFLOPs: 31.98 |
+7: iteration 42740/ 173500 | consumed samples: 10941440 | consumed tokens: 22408069120 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 3.051493E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.229 | TFLOPs: 31.44 |
+7: iteration 42750/ 173500 | consumed samples: 10944000 | consumed tokens: 22413312000 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.049898E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.186 | TFLOPs: 31.54 |
+7: iteration 42760/ 173500 | consumed samples: 10946560 | consumed tokens: 22418554880 | elapsed time per iteration (s): 0.42 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.049983E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.936 | TFLOPs: 32.00 |
+7: iteration 42770/ 173500 | consumed samples: 10949120 | consumed tokens: 22423797760 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.054018E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.684 | TFLOPs: 31.57 |
+7: iteration 42780/ 173500 | consumed samples: 10951680 | consumed tokens: 22429040640 | elapsed time per iteration (s): 0.42 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.028497E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.886 | TFLOPs: 32.00 |
+7: iteration 42790/ 173500 | consumed samples: 10954240 | consumed tokens: 22434283520 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.045684E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.048 | TFLOPs: 31.59 |
+7: iteration 42800/ 173500 | consumed samples: 10956800 | consumed tokens: 22439526400 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.045829E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.235 | TFLOPs: 31.60 |
+7: iteration 42810/ 173500 | consumed samples: 10959360 | consumed tokens: 22444769280 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.046980E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.700 | TFLOPs: 31.57 |
+7: iteration 42820/ 173500 | consumed samples: 10961920 | consumed tokens: 22450012160 | elapsed time per iteration (s): 0.42 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.023706E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.867 | TFLOPs: 32.00 |
+7: iteration 42830/ 173500 | consumed samples: 10964480 | consumed tokens: 22455255040 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.029742E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.046 | TFLOPs: 31.12 |
+7: iteration 42840/ 173500 | consumed samples: 10967040 | consumed tokens: 22460497920 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.050790E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.862 | TFLOPs: 30.74 |
+7: iteration 42850/ 173500 | consumed samples: 10969600 | consumed tokens: 22465740800 | elapsed time per iteration (s): 0.43 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.035986E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.165 | TFLOPs: 31.28 |
+7: iteration 42860/ 173500 | consumed samples: 10972160 | consumed tokens: 22470983680 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.048646E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.271 | TFLOPs: 30.66 |
+7: iteration 42870/ 173500 | consumed samples: 10974720 | consumed tokens: 22476226560 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.046232E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.870 | TFLOPs: 30.79 |
+7: iteration 42880/ 173500 | consumed samples: 10977280 | consumed tokens: 22481469440 | elapsed time per iteration (s): 0.45 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.044139E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.828 | TFLOPs: 29.79 |
+7: iteration 42890/ 173500 | consumed samples: 10979840 | consumed tokens: 22486712320 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.039268E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.277 | TFLOPs: 30.29 |
+7: iteration 42900/ 173500 | consumed samples: 10982400 | consumed tokens: 22491955200 | elapsed time per iteration (s): 0.45 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.040042E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.700 | TFLOPs: 29.94 |
+7: iteration 42910/ 173500 | consumed samples: 10984960 | consumed tokens: 22497198080 | elapsed time per iteration (s): 0.45 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.045928E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.968 | TFLOPs: 29.91 |
+7: iteration 42920/ 173500 | consumed samples: 10987520 | consumed tokens: 22502440960 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 3.051865E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.098 | TFLOPs: 30.49 |
+7: iteration 42930/ 173500 | consumed samples: 10990080 | consumed tokens: 22507683840 | elapsed time per iteration (s): 0.44 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.034890E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.242 | TFLOPs: 30.76 |
+7: iteration 42940/ 173500 | consumed samples: 10992640 | consumed tokens: 22512926720 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.045723E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.433 | TFLOPs: 32.13 |
+7: iteration 42950/ 173500 | consumed samples: 10995200 | consumed tokens: 22518169600 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.028320E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.819 | TFLOPs: 32.10 |
+7: iteration 42960/ 173500 | consumed samples: 10997760 | consumed tokens: 22523412480 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.041837E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.103 | TFLOPs: 32.06 |
+7: iteration 42970/ 173500 | consumed samples: 11000320 | consumed tokens: 22528655360 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.035989E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.518 | TFLOPs: 32.03 |
+7: iteration 42980/ 173500 | consumed samples: 11002880 | consumed tokens: 22533898240 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.042737E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.627 | TFLOPs: 32.04 |
+7: iteration 42990/ 173500 | consumed samples: 11005440 | consumed tokens: 22539141120 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.038667E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.899 | TFLOPs: 32.05 |
+7: iteration 43000/ 173500 | consumed samples: 11008000 | consumed tokens: 22544384000 | elapsed time per iteration (s): 0.42 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.027581E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.726 | TFLOPs: 32.04 |
+7: iteration 43010/ 173500 | consumed samples: 11010560 | consumed tokens: 22549626880 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.043274E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.376 | TFLOPs: 32.03 |
+7: iteration 43020/ 173500 | consumed samples: 11013120 | consumed tokens: 22554869760 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.037526E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.290 | TFLOPs: 32.02 |
+7: iteration 43030/ 173500 | consumed samples: 11015680 | consumed tokens: 22560112640 | elapsed time per iteration (s): 0.43 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.043626E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.897 | TFLOPs: 31.58 |
+7: iteration 43040/ 173500 | consumed samples: 11018240 | consumed tokens: 22565355520 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.040263E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.227 | TFLOPs: 32.02 |
+7: iteration 43050/ 173500 | consumed samples: 11020800 | consumed tokens: 22570598400 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.032510E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.415 | TFLOPs: 32.03 |
+7: iteration 43060/ 173500 | consumed samples: 11023360 | consumed tokens: 22575841280 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.023881E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.153 | TFLOPs: 32.01 |
+7: iteration 43070/ 173500 | consumed samples: 11025920 | consumed tokens: 22581084160 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.044331E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.548 | TFLOPs: 31.93 |
+7: iteration 43080/ 173500 | consumed samples: 11028480 | consumed tokens: 22586327040 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.039796E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.987 | TFLOPs: 32.01 |
+7: iteration 43090/ 173500 | consumed samples: 11031040 | consumed tokens: 22591569920 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.057264E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.901 | TFLOPs: 32.00 |
+7: iteration 43100/ 173500 | consumed samples: 11033600 | consumed tokens: 22596812800 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.032388E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.148 | TFLOPs: 32.01 |
+7: iteration 43110/ 173500 | consumed samples: 11036160 | consumed tokens: 22602055680 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.039880E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.001 | TFLOPs: 32.01 |
+7: iteration 43120/ 173500 | consumed samples: 11038720 | consumed tokens: 22607298560 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.049185E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.950 | TFLOPs: 32.00 |
+7: iteration 43130/ 173500 | consumed samples: 11041280 | consumed tokens: 22612541440 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.037390E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.319 | TFLOPs: 31.71 |
+7: iteration 43140/ 173500 | consumed samples: 11043840 | consumed tokens: 22617784320 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.038702E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.515 | TFLOPs: 31.77 |
+7: iteration 43150/ 173500 | consumed samples: 11046400 | consumed tokens: 22623027200 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.033265E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.173 | TFLOPs: 32.01 |
+7: iteration 43160/ 173500 | consumed samples: 11048960 | consumed tokens: 22628270080 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.046419E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.348 | TFLOPs: 32.02 |
+7: iteration 43170/ 173500 | consumed samples: 11051520 | consumed tokens: 22633512960 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.042679E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.947 | TFLOPs: 31.74 |
+7: iteration 43180/ 173500 | consumed samples: 11054080 | consumed tokens: 22638755840 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.054372E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.378 | TFLOPs: 32.03 |
+7: iteration 43190/ 173500 | consumed samples: 11056640 | consumed tokens: 22643998720 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.035985E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.762 | TFLOPs: 31.99 |
+7: iteration 43200/ 173500 | consumed samples: 11059200 | consumed tokens: 22649241600 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.041137E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.276 | TFLOPs: 32.02 |
+7: iteration 43210/ 173500 | consumed samples: 11061760 | consumed tokens: 22654484480 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.044738E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.326 | TFLOPs: 32.02 |
+7: iteration 43220/ 173500 | consumed samples: 11064320 | consumed tokens: 22659727360 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.041411E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.200 | TFLOPs: 32.02 |
+7: iteration 43230/ 173500 | consumed samples: 11066880 | consumed tokens: 22664970240 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.034214E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.008 | TFLOPs: 32.01 |
+7: iteration 43240/ 173500 | consumed samples: 11069440 | consumed tokens: 22670213120 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.036504E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.352 | TFLOPs: 32.02 |
+7: iteration 43250/ 173500 | consumed samples: 11072000 | consumed tokens: 22675456000 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.037106E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.178 | TFLOPs: 32.02 |
+7: iteration 43260/ 173500 | consumed samples: 11074560 | consumed tokens: 22680698880 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.044171E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.849 | TFLOPs: 32.00 |
+7: iteration 43270/ 173500 | consumed samples: 11077120 | consumed tokens: 22685941760 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.049439E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.817 | TFLOPs: 32.00 |
+7: iteration 43280/ 173500 | consumed samples: 11079680 | consumed tokens: 22691184640 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.033466E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.605 | TFLOPs: 31.98 |
+7: iteration 43290/ 173500 | consumed samples: 11082240 | consumed tokens: 22696427520 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.028083E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.633 | TFLOPs: 31.99 |
+7: iteration 43300/ 173500 | consumed samples: 11084800 | consumed tokens: 22701670400 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.040579E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.438 | TFLOPs: 31.98 |
+7: iteration 43310/ 173500 | consumed samples: 11087360 | consumed tokens: 22706913280 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.040851E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.387 | TFLOPs: 31.97 |
+7: iteration 43320/ 173500 | consumed samples: 11089920 | consumed tokens: 22712156160 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.038815E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.477 | TFLOPs: 31.98 |
+7: iteration 43330/ 173500 | consumed samples: 11092480 | consumed tokens: 22717399040 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.036380E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.524 | TFLOPs: 31.98 |
+7: iteration 43340/ 173500 | consumed samples: 11095040 | consumed tokens: 22722641920 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.059691E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.543 | TFLOPs: 31.98 |
+7: iteration 43350/ 173500 | consumed samples: 11097600 | consumed tokens: 22727884800 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.054740E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.293 | TFLOPs: 31.97 |
+7: iteration 43360/ 173500 | consumed samples: 11100160 | consumed tokens: 22733127680 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 3.036645E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.555 | TFLOPs: 31.98 |
+7: iteration 43370/ 173500 | consumed samples: 11102720 | consumed tokens: 22738370560 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.032370E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.531 | TFLOPs: 31.98 |
+7: iteration 43380/ 173500 | consumed samples: 11105280 | consumed tokens: 22743613440 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.036302E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.463 | TFLOPs: 31.98 |
+7: iteration 43390/ 173500 | consumed samples: 11107840 | consumed tokens: 22748856320 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.048114E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.444 | TFLOPs: 31.98 |
+7: iteration 43400/ 173500 | consumed samples: 11110400 | consumed tokens: 22754099200 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.049895E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.639 | TFLOPs: 31.99 |
+7: iteration 43410/ 173500 | consumed samples: 11112960 | consumed tokens: 22759342080 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.040798E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.467 | TFLOPs: 31.98 |
+7: iteration 43420/ 173500 | consumed samples: 11115520 | consumed tokens: 22764584960 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.036065E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.650 | TFLOPs: 31.99 |
+7: iteration 43430/ 173500 | consumed samples: 11118080 | consumed tokens: 22769827840 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.039543E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.011 | TFLOPs: 31.95 |
+7: iteration 43440/ 173500 | consumed samples: 11120640 | consumed tokens: 22775070720 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.050909E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.125 | TFLOPs: 31.96 |
+7: iteration 43450/ 173500 | consumed samples: 11123200 | consumed tokens: 22780313600 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.044327E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.418 | TFLOPs: 31.98 |
+7: iteration 43460/ 173500 | consumed samples: 11125760 | consumed tokens: 22785556480 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.034770E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.350 | TFLOPs: 31.97 |
+7: iteration 43470/ 173500 | consumed samples: 11128320 | consumed tokens: 22790799360 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.038878E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.320 | TFLOPs: 31.97 |
+7: iteration 43480/ 173500 | consumed samples: 11130880 | consumed tokens: 22796042240 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.047155E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.842 | TFLOPs: 31.94 |
+7: iteration 43490/ 173500 | consumed samples: 11133440 | consumed tokens: 22801285120 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.057314E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.829 | TFLOPs: 31.94 |
+7: iteration 43500/ 173500 | consumed samples: 11136000 | consumed tokens: 22806528000 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.029322E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.249 | TFLOPs: 31.97 |
+7: iteration 43510/ 173500 | consumed samples: 11138560 | consumed tokens: 22811770880 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.051244E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.921 | TFLOPs: 31.95 |
+7: iteration 43520/ 173500 | consumed samples: 11141120 | consumed tokens: 22817013760 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.064695E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.327 | TFLOPs: 31.97 |
+7: iteration 43530/ 173500 | consumed samples: 11143680 | consumed tokens: 22822256640 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.032178E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.148 | TFLOPs: 31.80 |
+7: iteration 43540/ 173500 | consumed samples: 11146240 | consumed tokens: 22827499520 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.038479E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.941 | TFLOPs: 31.95 |
+7: iteration 43550/ 173500 | consumed samples: 11148800 | consumed tokens: 22832742400 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.035906E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.601 | TFLOPs: 31.88 |
+7: iteration 43560/ 173500 | consumed samples: 11151360 | consumed tokens: 22837985280 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.026907E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.213 | TFLOPs: 31.96 |
+7: iteration 43570/ 173500 | consumed samples: 11153920 | consumed tokens: 22843228160 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.028503E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.285 | TFLOPs: 31.97 |
+7: iteration 43580/ 173500 | consumed samples: 11156480 | consumed tokens: 22848471040 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.048325E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.191 | TFLOPs: 31.96 |
+7: iteration 43590/ 173500 | consumed samples: 11159040 | consumed tokens: 22853713920 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.027598E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.139 | TFLOPs: 31.96 |
+7: iteration 43600/ 173500 | consumed samples: 11161600 | consumed tokens: 22858956800 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.025809E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.368 | TFLOPs: 31.97 |
+7: iteration 43610/ 173500 | consumed samples: 11164160 | consumed tokens: 22864199680 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.043123E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.374 | TFLOPs: 31.97 |
+7: iteration 43620/ 173500 | consumed samples: 11166720 | consumed tokens: 22869442560 | elapsed time per iteration (s): 0.42 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.019473E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.144 | TFLOPs: 31.96 |
+7: iteration 43630/ 173500 | consumed samples: 11169280 | consumed tokens: 22874685440 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.028013E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0
| samples per second: 609.119 | TFLOPs: 31.96 | +7: iteration 43640/ 173500 | consumed samples: 11171840 | consumed tokens: 22879928320 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.041298E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.097 | TFLOPs: 31.96 | +7: iteration 43650/ 173500 | consumed samples: 11174400 | consumed tokens: 22885171200 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.038061E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.118 | TFLOPs: 31.96 | +7: iteration 43660/ 173500 | consumed samples: 11176960 | consumed tokens: 22890414080 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.027838E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.284 | TFLOPs: 31.97 | +7: iteration 43670/ 173500 | consumed samples: 11179520 | consumed tokens: 22895656960 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.033574E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.464 | TFLOPs: 31.93 | +7: iteration 43680/ 173500 | consumed samples: 11182080 | consumed tokens: 22900899840 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.034813E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.347 | TFLOPs: 31.92 | +7: iteration 43690/ 173500 | consumed samples: 11184640 | consumed tokens: 22906142720 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 
3.035529E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.280 | TFLOPs: 31.92 | +7: iteration 43700/ 173500 | consumed samples: 11187200 | consumed tokens: 22911385600 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.040379E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.309 | TFLOPs: 31.92 | +7: iteration 43710/ 173500 | consumed samples: 11189760 | consumed tokens: 22916628480 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.040684E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.831 | TFLOPs: 31.94 | +7: iteration 43720/ 173500 | consumed samples: 11192320 | consumed tokens: 22921871360 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.044212E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.688 | TFLOPs: 31.83 | +7: iteration 43730/ 173500 | consumed samples: 11194880 | consumed tokens: 22927114240 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.031350E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.774 | TFLOPs: 31.94 | +7: iteration 43740/ 173500 | consumed samples: 11197440 | consumed tokens: 22932357120 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.032407E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.444 | TFLOPs: 31.92 | +7: iteration 43750/ 173500 | consumed samples: 11200000 | consumed tokens: 
22937600000 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.035173E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.541 | TFLOPs: 31.98 | +7: iteration 43760/ 173500 | consumed samples: 11202560 | consumed tokens: 22942842880 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.042220E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.435 | TFLOPs: 31.98 | +7: iteration 43770/ 173500 | consumed samples: 11205120 | consumed tokens: 22948085760 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.024794E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.291 | TFLOPs: 31.97 | +7: iteration 43780/ 173500 | consumed samples: 11207680 | consumed tokens: 22953328640 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.029314E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.123 | TFLOPs: 31.96 | +7: iteration 43790/ 173500 | consumed samples: 11210240 | consumed tokens: 22958571520 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.041309E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.272 | TFLOPs: 31.97 | +7: iteration 43800/ 173500 | consumed samples: 11212800 | consumed tokens: 22963814400 | elapsed time per iteration (s): 0.42 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.040708E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.238 | TFLOPs: 31.97 | +7: iteration 43810/ 173500 | consumed samples: 11215360 | consumed tokens: 22969057280 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.033731E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.224 | TFLOPs: 31.96 | +7: iteration 43820/ 173500 | consumed samples: 11217920 | consumed tokens: 22974300160 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.044940E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.347 | TFLOPs: 31.97 | +7: iteration 43830/ 173500 | consumed samples: 11220480 | consumed tokens: 22979543040 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.031052E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.977 | TFLOPs: 31.95 | +7: iteration 43840/ 173500 | consumed samples: 11223040 | consumed tokens: 22984785920 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.028133E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.184 | TFLOPs: 31.96 | +7: iteration 43850/ 173500 | consumed samples: 11225600 | consumed tokens: 22990028800 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.031890E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.987 | TFLOPs: 31.95 | +7: iteration 43860/ 173500 | consumed samples: 11228160 | consumed tokens: 22995271680 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.048453E+00 | grad 
norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.062 | TFLOPs: 31.96 | +7: iteration 43870/ 173500 | consumed samples: 11230720 | consumed tokens: 23000514560 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.037830E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.022 | TFLOPs: 31.95 | +7: iteration 43880/ 173500 | consumed samples: 11233280 | consumed tokens: 23005757440 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.038671E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.010 | TFLOPs: 31.95 | +7: iteration 43890/ 173500 | consumed samples: 11235840 | consumed tokens: 23011000320 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.032304E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.188 | TFLOPs: 31.96 | +7: iteration 43900/ 173500 | consumed samples: 11238400 | consumed tokens: 23016243200 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.036805E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.856 | TFLOPs: 31.95 | +7: iteration 43910/ 173500 | consumed samples: 11240960 | consumed tokens: 23021486080 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.033914E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.356 | TFLOPs: 31.97 | +7: iteration 43920/ 173500 | consumed samples: 11243520 | consumed tokens: 23026728960 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.018134E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.690 | TFLOPs: 31.99 | +7: iteration 43930/ 173500 | consumed samples: 11246080 | consumed tokens: 23031971840 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.037713E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.585 | TFLOPs: 31.93 | +7: iteration 43940/ 173500 | consumed samples: 11248640 | consumed tokens: 23037214720 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.046189E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.350 | TFLOPs: 31.97 | +7: iteration 43950/ 173500 | consumed samples: 11251200 | consumed tokens: 23042457600 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.030619E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.616 | TFLOPs: 31.99 | +7: iteration 43960/ 173500 | consumed samples: 11253760 | consumed tokens: 23047700480 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.060077E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.964 | TFLOPs: 32.00 | +7: iteration 43970/ 173500 | consumed samples: 11256320 | consumed tokens: 23052943360 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 3.032788E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.266 | TFLOPs: 
31.97 | +7: iteration 43980/ 173500 | consumed samples: 11258880 | consumed tokens: 23058186240 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.025288E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.909 | TFLOPs: 32.00 | +7: iteration 43990/ 173500 | consumed samples: 11261440 | consumed tokens: 23063429120 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.029980E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.275 | TFLOPs: 31.97 | +0: [2023-03-17 04:24:31,459] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=0, lr=[0.00017442202015704406, 0.00017442202015704406, 0.00017442202015704406], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 44000/ 173500 | consumed samples: 11264000 | consumed tokens: 23068672000 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.033641E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.538 | TFLOPs: 31.98 | +0: steps: 44000 loss: 2.9915 iter time (s): 0.419 samples/sec: 610.425 +7: iteration 44010/ 173500 | consumed samples: 11266560 | consumed tokens: 23073914880 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.021718E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.683 | TFLOPs: 31.88 | +7: iteration 44020/ 173500 | consumed samples: 11269120 | consumed tokens: 23079157760 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.041967E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 609.338 | TFLOPs: 31.97 | +7: iteration 44030/ 173500 | consumed samples: 11271680 | consumed tokens: 23084400640 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.024979E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.333 | TFLOPs: 31.97 | +7: iteration 44040/ 173500 | consumed samples: 11274240 | consumed tokens: 23089643520 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.030201E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.512 | TFLOPs: 31.93 | +7: iteration 44050/ 173500 | consumed samples: 11276800 | consumed tokens: 23094886400 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.041077E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.040 | TFLOPs: 31.96 | +7: iteration 44060/ 173500 | consumed samples: 11279360 | consumed tokens: 23100129280 | elapsed time per iteration (s): 0.42 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.025255E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.226 | TFLOPs: 31.97 | +7: iteration 44070/ 173500 | consumed samples: 11281920 | consumed tokens: 23105372160 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.037067E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.617 | TFLOPs: 31.93 | +7: iteration 44080/ 173500 | consumed samples: 11284480 | consumed tokens: 23110615040 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | 
lm loss: 3.024091E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.588 | TFLOPs: 31.93 | +7: iteration 44090/ 173500 | consumed samples: 11287040 | consumed tokens: 23115857920 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.053888E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.924 | TFLOPs: 31.95 | +7: iteration 44100/ 173500 | consumed samples: 11289600 | consumed tokens: 23121100800 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.041793E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.326 | TFLOPs: 31.97 | +7: iteration 44110/ 173500 | consumed samples: 11292160 | consumed tokens: 23126343680 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.037296E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.000 | TFLOPs: 32.01 | +7: iteration 44120/ 173500 | consumed samples: 11294720 | consumed tokens: 23131586560 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.026894E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.981 | TFLOPs: 32.00 | +7: iteration 44130/ 173500 | consumed samples: 11297280 | consumed tokens: 23136829440 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.040345E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.982 | TFLOPs: 32.00 | +7: iteration 44140/ 173500 | consumed samples: 11299840 | consumed 
tokens: 23142072320 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.030352E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.653 | TFLOPs: 31.99 | +7: iteration 44150/ 173500 | consumed samples: 11302400 | consumed tokens: 23147315200 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.034228E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.769 | TFLOPs: 31.99 | +7: iteration 44160/ 173500 | consumed samples: 11304960 | consumed tokens: 23152558080 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.035343E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.495 | TFLOPs: 31.66 | +7: iteration 44170/ 173500 | consumed samples: 11307520 | consumed tokens: 23157800960 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.037500E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.052 | TFLOPs: 32.01 | +7: iteration 44180/ 173500 | consumed samples: 11310080 | consumed tokens: 23163043840 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.029822E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.161 | TFLOPs: 32.01 | +7: iteration 44190/ 173500 | consumed samples: 11312640 | consumed tokens: 23168286720 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.045443E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 609.685 | TFLOPs: 31.99 | +7: iteration 44200/ 173500 | consumed samples: 11315200 | consumed tokens: 23173529600 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.026295E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.521 | TFLOPs: 31.98 | +7: iteration 44210/ 173500 | consumed samples: 11317760 | consumed tokens: 23178772480 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.040376E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.812 | TFLOPs: 32.00 | +7: iteration 44220/ 173500 | consumed samples: 11320320 | consumed tokens: 23184015360 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.037431E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.486 | TFLOPs: 31.98 | +7: iteration 44230/ 173500 | consumed samples: 11322880 | consumed tokens: 23189258240 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.036862E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.682 | TFLOPs: 31.99 | +7: iteration 44240/ 173500 | consumed samples: 11325440 | consumed tokens: 23194501120 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.038658E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.661 | TFLOPs: 31.99 | +7: iteration 44250/ 173500 | consumed samples: 11328000 | consumed tokens: 23199744000 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.038168E+00 | 
grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.490 | TFLOPs: 31.98 | +7: iteration 44260/ 173500 | consumed samples: 11330560 | consumed tokens: 23204986880 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.037078E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.548 | TFLOPs: 31.98 | +7: iteration 44270/ 173500 | consumed samples: 11333120 | consumed tokens: 23210229760 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.047556E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.141 | TFLOPs: 31.96 | +7: iteration 44280/ 173500 | consumed samples: 11335680 | consumed tokens: 23215472640 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.030738E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.634 | TFLOPs: 31.99 | +7: iteration 44290/ 173500 | consumed samples: 11338240 | consumed tokens: 23220715520 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.020218E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.707 | TFLOPs: 31.99 | +7: iteration 44300/ 173500 | consumed samples: 11340800 | consumed tokens: 23225958400 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.043313E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.498 | TFLOPs: 31.98 | +7: iteration 44310/ 173500 | consumed samples: 11343360 | consumed tokens: 23231201280 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.027178E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.486 | TFLOPs: 31.98 | +7: iteration 44320/ 173500 | consumed samples: 11345920 | consumed tokens: 23236444160 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.048405E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.900 | TFLOPs: 31.95 | +7: iteration 44330/ 173500 | consumed samples: 11348480 | consumed tokens: 23241687040 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.044666E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.651 | TFLOPs: 31.99 | +7: iteration 44340/ 173500 | consumed samples: 11351040 | consumed tokens: 23246929920 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.035415E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.673 | TFLOPs: 31.99 | +7: iteration 44350/ 173500 | consumed samples: 11353600 | consumed tokens: 23252172800 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.036442E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.663 | TFLOPs: 31.99 | +7: iteration 44360/ 173500 | consumed samples: 11356160 | consumed tokens: 23257415680 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.039024E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.639 | TFLOPs: 
31.99 | +7: iteration 44370/ 173500 | consumed samples: 11358720 | consumed tokens: 23262658560 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.034384E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.404 | TFLOPs: 31.97 | +7: iteration 44380/ 173500 | consumed samples: 11361280 | consumed tokens: 23267901440 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.030795E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.409 | TFLOPs: 31.97 | +7: iteration 44390/ 173500 | consumed samples: 11363840 | consumed tokens: 23273144320 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.034285E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.713 | TFLOPs: 31.99 | +7: iteration 44400/ 173500 | consumed samples: 11366400 | consumed tokens: 23278387200 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.032585E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.617 | TFLOPs: 31.99 | +7: iteration 44410/ 173500 | consumed samples: 11368960 | consumed tokens: 23283630080 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.024651E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.622 | TFLOPs: 31.99 | +7: iteration 44420/ 173500 | consumed samples: 11371520 | consumed tokens: 23288872960 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.036156E+00 | grad norm: 0.298 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.955 | TFLOPs: 31.95 | +7: iteration 44430/ 173500 | consumed samples: 11374080 | consumed tokens: 23294115840 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.032992E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.490 | TFLOPs: 31.98 | +7: iteration 44440/ 173500 | consumed samples: 11376640 | consumed tokens: 23299358720 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.042719E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.425 | TFLOPs: 31.98 | +7: iteration 44450/ 173500 | consumed samples: 11379200 | consumed tokens: 23304601600 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.030412E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.930 | TFLOPs: 32.00 | +7: iteration 44460/ 173500 | consumed samples: 11381760 | consumed tokens: 23309844480 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.045038E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.862 | TFLOPs: 32.00 | +7: iteration 44470/ 173500 | consumed samples: 11384320 | consumed tokens: 23315087360 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.036842E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.682 | TFLOPs: 31.99 | +7: iteration 44480/ 173500 | consumed samples: 11386880 | consumed tokens: 23320330240 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.037015E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.433 | TFLOPs: 31.98 | +7: iteration 44490/ 173500 | consumed samples: 11389440 | consumed tokens: 23325573120 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.042810E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.088 | TFLOPs: 31.96 | +7: iteration 44500/ 173500 | consumed samples: 11392000 | consumed tokens: 23330816000 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.025455E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.537 | TFLOPs: 31.93 | +7: iteration 44510/ 173500 | consumed samples: 11394560 | consumed tokens: 23336058880 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.049658E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.385 | TFLOPs: 31.97 | +7: iteration 44520/ 173500 | consumed samples: 11397120 | consumed tokens: 23341301760 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.012350E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.392 | TFLOPs: 31.97 | +7: iteration 44530/ 173500 | consumed samples: 11399680 | consumed tokens: 23346544640 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.039302E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.616 | TFLOPs: 31.99 | +7: iteration 44540/ 
173500 | consumed samples: 11402240 | consumed tokens: 23351787520 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.017992E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.722 | TFLOPs: 31.99 | +7: iteration 44550/ 173500 | consumed samples: 11404800 | consumed tokens: 23357030400 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.032403E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.656 | TFLOPs: 31.99 | +7: iteration 44560/ 173500 | consumed samples: 11407360 | consumed tokens: 23362273280 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.041181E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.345 | TFLOPs: 31.97 | +7: iteration 44570/ 173500 | consumed samples: 11409920 | consumed tokens: 23367516160 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.018443E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.758 | TFLOPs: 31.99 | +7: iteration 44580/ 173500 | consumed samples: 11412480 | consumed tokens: 23372759040 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.036284E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.544 | TFLOPs: 31.98 | +7: iteration 44590/ 173500 | consumed samples: 11415040 | consumed tokens: 23378001920 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.037876E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.868 | TFLOPs: 32.00 | +7: iteration 44600/ 173500 | consumed samples: 11417600 | consumed tokens: 23383244800 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.029538E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.712 | TFLOPs: 31.99 | +7: iteration 44610/ 173500 | consumed samples: 11420160 | consumed tokens: 23388487680 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.045276E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.476 | TFLOPs: 31.98 | +7: iteration 44620/ 173500 | consumed samples: 11422720 | consumed tokens: 23393730560 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.042356E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.547 | TFLOPs: 31.98 | +7: iteration 44630/ 173500 | consumed samples: 11425280 | consumed tokens: 23398973440 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.043480E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.842 | TFLOPs: 32.00 | +7: iteration 44640/ 173500 | consumed samples: 11427840 | consumed tokens: 23404216320 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.031226E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.830 | TFLOPs: 32.00 | +7: iteration 44650/ 173500 | consumed samples: 11430400 | consumed tokens: 23409459200 | elapsed time per iteration (s): 0.42 | learning rate: 
1.737E-04 | global batch size: 256 | lm loss: 3.036721E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.911 | TFLOPs: 32.00 | +7: iteration 44660/ 173500 | consumed samples: 11432960 | consumed tokens: 23414702080 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.020561E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.431 | TFLOPs: 31.98 | +7: iteration 44670/ 173500 | consumed samples: 11435520 | consumed tokens: 23419944960 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.031644E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.848 | TFLOPs: 32.00 | +7: iteration 44680/ 173500 | consumed samples: 11438080 | consumed tokens: 23425187840 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.032347E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.741 | TFLOPs: 31.99 | +7: iteration 44690/ 173500 | consumed samples: 11440640 | consumed tokens: 23430430720 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.041751E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.903 | TFLOPs: 32.00 | +7: iteration 44700/ 173500 | consumed samples: 11443200 | consumed tokens: 23435673600 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.034903E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.907 | TFLOPs: 32.00 | +7: iteration 44710/ 173500 | 
consumed samples: 11445760 | consumed tokens: 23440916480 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.029233E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.499 | TFLOPs: 31.98 | +7: iteration 44720/ 173500 | consumed samples: 11448320 | consumed tokens: 23446159360 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.032334E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.795 | TFLOPs: 31.99 | +7: iteration 44730/ 173500 | consumed samples: 11450880 | consumed tokens: 23451402240 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.026532E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.834 | TFLOPs: 31.94 | +7: iteration 44740/ 173500 | consumed samples: 11453440 | consumed tokens: 23456645120 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.043398E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.770 | TFLOPs: 31.94 | +7: iteration 44750/ 173500 | consumed samples: 11456000 | consumed tokens: 23461888000 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.029846E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.566 | TFLOPs: 31.93 | +7: iteration 44760/ 173500 | consumed samples: 11458560 | consumed tokens: 23467130880 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.030296E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.149 | TFLOPs: 31.91 | +7: iteration 44770/ 173500 | consumed samples: 11461120 | consumed tokens: 23472373760 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.038226E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.572 | TFLOPs: 31.93 | +7: iteration 44780/ 173500 | consumed samples: 11463680 | consumed tokens: 23477616640 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.026339E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.004 | TFLOPs: 31.95 | +7: iteration 44790/ 173500 | consumed samples: 11466240 | consumed tokens: 23482859520 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.042435E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.874 | TFLOPs: 31.95 | +7: iteration 44800/ 173500 | consumed samples: 11468800 | consumed tokens: 23488102400 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.036751E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.912 | TFLOPs: 31.95 | +7: iteration 44810/ 173500 | consumed samples: 11471360 | consumed tokens: 23493345280 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.039718E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.096 | TFLOPs: 31.91 | +7: iteration 44820/ 173500 | consumed samples: 11473920 | consumed tokens: 23498588160 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch 
size: 256 | lm loss: 3.037275E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.052 | TFLOPs: 31.96 | +7: iteration 44830/ 173500 | consumed samples: 11476480 | consumed tokens: 23503831040 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.033388E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.623 | TFLOPs: 31.93 | +7: iteration 44840/ 173500 | consumed samples: 11479040 | consumed tokens: 23509073920 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.025977E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.899 | TFLOPs: 31.95 | +7: iteration 44850/ 173500 | consumed samples: 11481600 | consumed tokens: 23514316800 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.034032E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.524 | TFLOPs: 31.93 | +7: iteration 44860/ 173500 | consumed samples: 11484160 | consumed tokens: 23519559680 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.035332E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.658 | TFLOPs: 31.94 | +7: iteration 44870/ 173500 | consumed samples: 11486720 | consumed tokens: 23524802560 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.040533E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.094 | TFLOPs: 31.96 | +7: iteration 44880/ 173500 | consumed samples: 11489280 | 
consumed tokens: 23530045440 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.027085E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.116 | TFLOPs: 31.96 | +7: iteration 44890/ 173500 | consumed samples: 11491840 | consumed tokens: 23535288320 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.025256E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.811 | TFLOPs: 31.94 | +7: iteration 44900/ 173500 | consumed samples: 11494400 | consumed tokens: 23540531200 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.026749E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.658 | TFLOPs: 31.94 | +7: iteration 44910/ 173500 | consumed samples: 11496960 | consumed tokens: 23545774080 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.049749E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.155 | TFLOPs: 31.96 | +7: iteration 44920/ 173500 | consumed samples: 11499520 | consumed tokens: 23551016960 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.028305E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.857 | TFLOPs: 31.95 | +7: iteration 44930/ 173500 | consumed samples: 11502080 | consumed tokens: 23556259840 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.016640E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 608.545 | TFLOPs: 31.93 | +7: iteration 44940/ 173500 | consumed samples: 11504640 | consumed tokens: 23561502720 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.042530E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.866 | TFLOPs: 31.95 | +7: iteration 44950/ 173500 | consumed samples: 11507200 | consumed tokens: 23566745600 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.027967E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.676 | TFLOPs: 31.94 | +7: iteration 44960/ 173500 | consumed samples: 11509760 | consumed tokens: 23571988480 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.032789E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.513 | TFLOPs: 31.93 | +7: iteration 44970/ 173500 | consumed samples: 11512320 | consumed tokens: 23577231360 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.031199E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.459 | TFLOPs: 31.92 | +7: iteration 44980/ 173500 | consumed samples: 11514880 | consumed tokens: 23582474240 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.036980E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.491 | TFLOPs: 31.87 | +7: iteration 44990/ 173500 | consumed samples: 11517440 | consumed tokens: 23587717120 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 
3.031967E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.803 | TFLOPs: 31.94 | +7: iteration 45000/ 173500 | consumed samples: 11520000 | consumed tokens: 23592960000 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.041166E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.565 | TFLOPs: 31.93 | +7: iteration 45010/ 173500 | consumed samples: 11522560 | consumed tokens: 23598202880 | elapsed time per iteration (s): 0.42 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.028786E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.165 | TFLOPs: 31.96 | +7: iteration 45020/ 173500 | consumed samples: 11525120 | consumed tokens: 23603445760 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.038615E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.258 | TFLOPs: 31.97 | +7: iteration 45030/ 173500 | consumed samples: 11527680 | consumed tokens: 23608688640 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.035782E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.256 | TFLOPs: 31.97 | +7: iteration 45040/ 173500 | consumed samples: 11530240 | consumed tokens: 23613931520 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.029908E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.174 | TFLOPs: 31.96 | +7: iteration 45050/ 173500 | consumed samples: 11532800 | consumed tokens: 
23619174400 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.040468E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.975 | TFLOPs: 31.95 | +7: iteration 45060/ 173500 | consumed samples: 11535360 | consumed tokens: 23624417280 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.036808E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.939 | TFLOPs: 31.95 | +7: iteration 45070/ 173500 | consumed samples: 11537920 | consumed tokens: 23629660160 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.032573E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.812 | TFLOPs: 31.94 | +7: iteration 45080/ 173500 | consumed samples: 11540480 | consumed tokens: 23634903040 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.031070E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.695 | TFLOPs: 31.94 | +7: iteration 45090/ 173500 | consumed samples: 11543040 | consumed tokens: 23640145920 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.034755E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.570 | TFLOPs: 31.93 | +7: iteration 45100/ 173500 | consumed samples: 11545600 | consumed tokens: 23645388800 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.030555E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.711 | TFLOPs: 31.94 | +7: iteration 45110/ 173500 | consumed samples: 11548160 | consumed tokens: 23650631680 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.046778E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.230 | TFLOPs: 31.97 | +7: iteration 45120/ 173500 | consumed samples: 11550720 | consumed tokens: 23655874560 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.031484E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.866 | TFLOPs: 31.95 | +7: iteration 45130/ 173500 | consumed samples: 11553280 | consumed tokens: 23661117440 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.031591E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.007 | TFLOPs: 31.95 | +7: iteration 45140/ 173500 | consumed samples: 11555840 | consumed tokens: 23666360320 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.050947E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.499 | TFLOPs: 31.98 | +7: iteration 45150/ 173500 | consumed samples: 11558400 | consumed tokens: 23671603200 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.020515E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.578 | TFLOPs: 31.98 | +7: iteration 45160/ 173500 | consumed samples: 11560960 | consumed tokens: 23676846080 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.033418E+00 | grad 
norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.017 | TFLOPs: 31.95 | +7: iteration 45170/ 173500 | consumed samples: 11563520 | consumed tokens: 23682088960 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.027835E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.232 | TFLOPs: 31.97 | +7: iteration 45180/ 173500 | consumed samples: 11566080 | consumed tokens: 23687331840 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.036653E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.118 | TFLOPs: 31.96 | +7: iteration 45190/ 173500 | consumed samples: 11568640 | consumed tokens: 23692574720 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.036411E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.168 | TFLOPs: 31.96 | +7: iteration 45200/ 173500 | consumed samples: 11571200 | consumed tokens: 23697817600 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.035357E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.749 | TFLOPs: 31.94 | +7: iteration 45210/ 173500 | consumed samples: 11573760 | consumed tokens: 23703060480 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.031534E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.528 | TFLOPs: 31.93 | +7: iteration 45220/ 173500 | consumed samples: 11576320 | consumed tokens: 23708303360 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.044385E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.905 | TFLOPs: 31.95 | +7: iteration 45230/ 173500 | consumed samples: 11578880 | consumed tokens: 23713546240 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.040190E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.790 | TFLOPs: 31.94 | +7: iteration 45240/ 173500 | consumed samples: 11581440 | consumed tokens: 23718789120 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.047963E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.884 | TFLOPs: 31.95 | +7: iteration 45250/ 173500 | consumed samples: 11584000 | consumed tokens: 23724032000 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.026142E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.897 | TFLOPs: 31.95 | +7: iteration 45260/ 173500 | consumed samples: 11586560 | consumed tokens: 23729274880 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.040901E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.705 | TFLOPs: 31.94 | +7: iteration 45270/ 173500 | consumed samples: 11589120 | consumed tokens: 23734517760 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.032171E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.814 | TFLOPs: 
31.94 | +7: iteration 45280/ 173500 | consumed samples: 11591680 | consumed tokens: 23739760640 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.036630E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.367 | TFLOPs: 31.92 | +7: iteration 45290/ 173500 | consumed samples: 11594240 | consumed tokens: 23745003520 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.033504E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.133 | TFLOPs: 31.96 | +7: iteration 45300/ 173500 | consumed samples: 11596800 | consumed tokens: 23750246400 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.033980E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.955 | TFLOPs: 31.95 | +7: iteration 45310/ 173500 | consumed samples: 11599360 | consumed tokens: 23755489280 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.036705E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.061 | TFLOPs: 31.96 | +7: iteration 45320/ 173500 | consumed samples: 11601920 | consumed tokens: 23760732160 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.031810E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.144 | TFLOPs: 31.96 | +7: iteration 45330/ 173500 | consumed samples: 11604480 | consumed tokens: 23765975040 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.040881E+00 | grad norm: 0.299 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.073 | TFLOPs: 31.96 | +7: iteration 45340/ 173500 | consumed samples: 11607040 | consumed tokens: 23771217920 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.034563E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.042 | TFLOPs: 31.96 | +7: iteration 45350/ 173500 | consumed samples: 11609600 | consumed tokens: 23776460800 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.035444E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.428 | TFLOPs: 31.98 | +7: iteration 45360/ 173500 | consumed samples: 11612160 | consumed tokens: 23781703680 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.037609E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.929 | TFLOPs: 31.95 | +7: iteration 45370/ 173500 | consumed samples: 11614720 | consumed tokens: 23786946560 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.029708E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.198 | TFLOPs: 31.96 | +7: iteration 45380/ 173500 | consumed samples: 11617280 | consumed tokens: 23792189440 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.038130E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.419 | TFLOPs: 31.92 | +7: iteration 45390/ 173500 | consumed samples: 11619840 | consumed tokens: 23797432320 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.038467E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.722 | TFLOPs: 31.94 | +7: iteration 45400/ 173500 | consumed samples: 11622400 | consumed tokens: 23802675200 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.020585E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.924 | TFLOPs: 31.95 | +7: iteration 45410/ 173500 | consumed samples: 11624960 | consumed tokens: 23807918080 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.027697E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.303 | TFLOPs: 31.92 | +7: iteration 45420/ 173500 | consumed samples: 11627520 | consumed tokens: 23813160960 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.035508E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.882 | TFLOPs: 31.95 | +7: iteration 45430/ 173500 | consumed samples: 11630080 | consumed tokens: 23818403840 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.022128E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.520 | TFLOPs: 31.93 | +7: iteration 45440/ 173500 | consumed samples: 11632640 | consumed tokens: 23823646720 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.020785E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.448 | TFLOPs: 31.92 | +7: iteration 45450/ 
173500 | consumed samples: 11635200 | consumed tokens: 23828889600 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.033745E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.448 | TFLOPs: 31.87 | +7: iteration 45460/ 173500 | consumed samples: 11637760 | consumed tokens: 23834132480 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.044635E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.067 | TFLOPs: 31.96 | +7: iteration 45470/ 173500 | consumed samples: 11640320 | consumed tokens: 23839375360 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.033568E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.234 | TFLOPs: 31.91 | +7: iteration 45480/ 173500 | consumed samples: 11642880 | consumed tokens: 23844618240 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.042956E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.558 | TFLOPs: 31.93 | +7: iteration 45490/ 173500 | consumed samples: 11645440 | consumed tokens: 23849861120 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.037610E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.989 | TFLOPs: 31.38 | +7: iteration 45500/ 173500 | consumed samples: 11648000 | consumed tokens: 23855104000 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.018620E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 600.618 | TFLOPs: 31.51 | +7: iteration 45510/ 173500 | consumed samples: 11650560 | consumed tokens: 23860346880 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.033488E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.749 | TFLOPs: 31.94 | +7: iteration 45520/ 173500 | consumed samples: 11653120 | consumed tokens: 23865589760 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.018414E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.641 | TFLOPs: 31.93 | +7: iteration 45530/ 173500 | consumed samples: 11655680 | consumed tokens: 23870832640 | elapsed time per iteration (s): 0.42 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.039864E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.501 | TFLOPs: 31.87 | +7: iteration 45540/ 173500 | consumed samples: 11658240 | consumed tokens: 23876075520 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.038412E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.021 | TFLOPs: 31.06 | +7: iteration 45550/ 173500 | consumed samples: 11660800 | consumed tokens: 23881318400 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.033052E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.074 | TFLOPs: 31.17 | +7: iteration 45560/ 173500 | consumed samples: 11663360 | consumed tokens: 23886561280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.726E-04 | global batch size: 256 | lm loss: 3.024797E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.467 | TFLOPs: 31.98 | +7: iteration 45570/ 173500 | consumed samples: 11665920 | consumed tokens: 23891804160 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.015835E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.475 | TFLOPs: 31.40 | +7: iteration 45580/ 173500 | consumed samples: 11668480 | consumed tokens: 23897047040 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.022084E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.951 | TFLOPs: 31.53 | +7: iteration 45590/ 173500 | consumed samples: 11671040 | consumed tokens: 23902289920 | elapsed time per iteration (s): 0.42 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.031033E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.442 | TFLOPs: 31.98 | +7: iteration 45600/ 173500 | consumed samples: 11673600 | consumed tokens: 23907532800 | elapsed time per iteration (s): 0.45 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.039748E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.341 | TFLOPs: 29.87 | +7: iteration 45610/ 173500 | consumed samples: 11676160 | consumed tokens: 23912775680 | elapsed time per iteration (s): 0.44 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.026946E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.928 | TFLOPs: 30.69 | +7: iteration 45620/ 173500 | 
consumed samples: 11678720 | consumed tokens: 23918018560 | elapsed time per iteration (s): 0.43 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.034584E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.056 | TFLOPs: 31.17 | +7: iteration 45630/ 173500 | consumed samples: 11681280 | consumed tokens: 23923261440 | elapsed time per iteration (s): 0.46 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.050420E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.495 | TFLOPs: 29.09 | +7: iteration 45640/ 173500 | consumed samples: 11683840 | consumed tokens: 23928504320 | elapsed time per iteration (s): 0.45 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.025619E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.663 | TFLOPs: 29.89 | +7: iteration 45650/ 173500 | consumed samples: 11686400 | consumed tokens: 23933747200 | elapsed time per iteration (s): 0.46 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.037487E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.118 | TFLOPs: 29.34 | +7: iteration 45660/ 173500 | consumed samples: 11688960 | consumed tokens: 23938990080 | elapsed time per iteration (s): 0.46 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.030014E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.369 | TFLOPs: 29.51 | +7: iteration 45670/ 173500 | consumed samples: 11691520 | consumed tokens: 23944232960 | elapsed time per iteration (s): 0.45 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.027221E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 567.404 | TFLOPs: 29.77 | +7: iteration 45680/ 173500 | consumed samples: 11694080 | consumed tokens: 23949475840 | elapsed time per iteration (s): 0.46 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.023704E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.357 | TFLOPs: 29.14 | +7: iteration 45690/ 173500 | consumed samples: 11696640 | consumed tokens: 23954718720 | elapsed time per iteration (s): 0.45 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.030212E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.747 | TFLOPs: 30.05 | +7: iteration 45700/ 173500 | consumed samples: 11699200 | consumed tokens: 23959961600 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.029758E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.008 | TFLOPs: 31.32 | +7: iteration 45710/ 173500 | consumed samples: 11701760 | consumed tokens: 23965204480 | elapsed time per iteration (s): 0.42 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.023277E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.334 | TFLOPs: 31.71 | +7: iteration 45720/ 173500 | consumed samples: 11704320 | consumed tokens: 23970447360 | elapsed time per iteration (s): 0.45 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.037152E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.465 | TFLOPs: 29.72 | +7: iteration 45730/ 173500 | consumed samples: 11706880 | consumed tokens: 23975690240 | elapsed time per iteration (s): 0.45 | learning rate: 1.724E-04 | global batch 
size: 256 | lm loss: 3.034234E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.445 | TFLOPs: 29.93 | +7: iteration 45740/ 173500 | consumed samples: 11709440 | consumed tokens: 23980933120 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.022474E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.545 | TFLOPs: 30.93 | +7: iteration 45750/ 173500 | consumed samples: 11712000 | consumed tokens: 23986176000 | elapsed time per iteration (s): 0.44 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.032970E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.672 | TFLOPs: 30.68 | +7: iteration 45760/ 173500 | consumed samples: 11714560 | consumed tokens: 23991418880 | elapsed time per iteration (s): 0.45 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.033575E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.912 | TFLOPs: 29.74 | +7: iteration 45770/ 173500 | consumed samples: 11717120 | consumed tokens: 23996661760 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.044150E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.255 | TFLOPs: 31.39 | +7: iteration 45780/ 173500 | consumed samples: 11719680 | consumed tokens: 24001904640 | elapsed time per iteration (s): 0.44 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.031539E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.212 | TFLOPs: 30.60 | +7: iteration 45790/ 173500 | consumed samples: 11722240 | 
consumed tokens: 24007147520 | elapsed time per iteration (s): 0.44 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.023028E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.949 | TFLOPs: 30.64 | +7: iteration 45800/ 173500 | consumed samples: 11724800 | consumed tokens: 24012390400 | elapsed time per iteration (s): 0.43 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.035986E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.633 | TFLOPs: 30.88 | +7: iteration 45810/ 173500 | consumed samples: 11727360 | consumed tokens: 24017633280 | elapsed time per iteration (s): 0.44 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.034841E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.921 | TFLOPs: 30.79 | +7: iteration 45820/ 173500 | consumed samples: 11729920 | consumed tokens: 24022876160 | elapsed time per iteration (s): 0.44 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.036466E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.297 | TFLOPs: 30.71 | +7: iteration 45830/ 173500 | consumed samples: 11732480 | consumed tokens: 24028119040 | elapsed time per iteration (s): 0.43 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.039328E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.004 | TFLOPs: 31.32 | +7: iteration 45840/ 173500 | consumed samples: 11735040 | consumed tokens: 24033361920 | elapsed time per iteration (s): 0.42 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.034179E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.603 | TFLOPs: 31.67 | +7: iteration 45850/ 173500 | consumed samples: 11737600 | consumed tokens: 24038604800 | elapsed time per iteration (s): 0.44 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.016945E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.064 | TFLOPs: 30.33 | +7: iteration 45860/ 173500 | consumed samples: 11740160 | consumed tokens: 24043847680 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.034142E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.292 | TFLOPs: 31.23 | +7: iteration 45870/ 173500 | consumed samples: 11742720 | consumed tokens: 24049090560 | elapsed time per iteration (s): 0.44 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.038640E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.569 | TFLOPs: 30.41 | +7: iteration 45880/ 173500 | consumed samples: 11745280 | consumed tokens: 24054333440 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.029574E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.280 | TFLOPs: 31.23 | +7: iteration 45890/ 173500 | consumed samples: 11747840 | consumed tokens: 24059576320 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.028545E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.088 | TFLOPs: 31.01 | +7: iteration 45900/ 173500 | consumed samples: 11750400 | consumed tokens: 24064819200 | elapsed time per iteration (s): 0.44 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 
3.038340E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.818 | TFLOPs: 30.79 | +7: iteration 45910/ 173500 | consumed samples: 11752960 | consumed tokens: 24070062080 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.033337E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.776 | TFLOPs: 30.89 | +7: iteration 45920/ 173500 | consumed samples: 11755520 | consumed tokens: 24075304960 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.019746E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.166 | TFLOPs: 31.12 | +7: iteration 45930/ 173500 | consumed samples: 11758080 | consumed tokens: 24080547840 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.036164E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.500 | TFLOPs: 31.30 | +7: iteration 45940/ 173500 | consumed samples: 11760640 | consumed tokens: 24085790720 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.028281E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.133 | TFLOPs: 31.28 | +7: iteration 45950/ 173500 | consumed samples: 11763200 | consumed tokens: 24091033600 | elapsed time per iteration (s): 0.42 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.029421E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.847 | TFLOPs: 31.68 | +7: iteration 45960/ 173500 | consumed samples: 11765760 | consumed tokens: 
24096276480 | elapsed time per iteration (s): 0.44 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.031055E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.313 | TFLOPs: 30.55 | +7: iteration 45970/ 173500 | consumed samples: 11768320 | consumed tokens: 24101519360 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.021023E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.648 | TFLOPs: 31.10 | +7: iteration 45980/ 173500 | consumed samples: 11770880 | consumed tokens: 24106762240 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.026436E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.308 | TFLOPs: 31.29 | +7: iteration 45990/ 173500 | consumed samples: 11773440 | consumed tokens: 24112005120 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.032937E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.500 | TFLOPs: 31.51 | +0: [2023-03-17 04:38:39,859] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=0, lr=[0.00017208047558447097, 0.00017208047558447097, 0.00017208047558447097], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 46000/ 173500 | consumed samples: 11776000 | consumed tokens: 24117248000 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.021849E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.935 | TFLOPs: 31.37 | +0: steps: 46000 loss: 3.0077 iter time (s): 0.422 samples/sec: 607.030 +7: iteration 46010/ 173500 | 
consumed samples: 11778560 | consumed tokens: 24122490880 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.030062E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.707 | TFLOPs: 31.15 | +7: iteration 46020/ 173500 | consumed samples: 11781120 | consumed tokens: 24127733760 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 3.023639E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.253 | TFLOPs: 31.60 | +7: iteration 46030/ 173500 | consumed samples: 11783680 | consumed tokens: 24132976640 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.029805E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.440 | TFLOPs: 31.35 | +7: iteration 46040/ 173500 | consumed samples: 11786240 | consumed tokens: 24138219520 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.033716E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.382 | TFLOPs: 31.55 | +7: iteration 46050/ 173500 | consumed samples: 11788800 | consumed tokens: 24143462400 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.025466E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.970 | TFLOPs: 31.22 | +7: iteration 46060/ 173500 | consumed samples: 11791360 | consumed tokens: 24148705280 | elapsed time per iteration (s): 0.42 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.038898E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 611.391 | TFLOPs: 32.08 | +7: iteration 46070/ 173500 | consumed samples: 11793920 | consumed tokens: 24153948160 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.020415E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.003 | TFLOPs: 31.48 | +7: iteration 46080/ 173500 | consumed samples: 11796480 | consumed tokens: 24159191040 | elapsed time per iteration (s): 0.44 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.030231E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.747 | TFLOPs: 30.79 | +7: iteration 46090/ 173500 | consumed samples: 11799040 | consumed tokens: 24164433920 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.037976E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.947 | TFLOPs: 31.58 | +7: iteration 46100/ 173500 | consumed samples: 11801600 | consumed tokens: 24169676800 | elapsed time per iteration (s): 0.44 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.037027E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.125 | TFLOPs: 30.65 | +7: iteration 46110/ 173500 | consumed samples: 11804160 | consumed tokens: 24174919680 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.045284E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.779 | TFLOPs: 31.26 | +7: iteration 46120/ 173500 | consumed samples: 11806720 | consumed tokens: 24180162560 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch 
size: 256 | lm loss: 3.029145E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.789 | TFLOPs: 31.10 | +7: iteration 46130/ 173500 | consumed samples: 11809280 | consumed tokens: 24185405440 | elapsed time per iteration (s): 0.42 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.030100E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.830 | TFLOPs: 31.79 | +7: iteration 46140/ 173500 | consumed samples: 11811840 | consumed tokens: 24190648320 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.025023E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.412 | TFLOPs: 31.50 | +7: iteration 46150/ 173500 | consumed samples: 11814400 | consumed tokens: 24195891200 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.025316E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.438 | TFLOPs: 31.29 | +7: iteration 46160/ 173500 | consumed samples: 11816960 | consumed tokens: 24201134080 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.026418E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.697 | TFLOPs: 31.47 | +7: iteration 46170/ 173500 | consumed samples: 11819520 | consumed tokens: 24206376960 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.050616E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.940 | TFLOPs: 31.27 | +7: iteration 46180/ 173500 | consumed samples: 11822080 | 
consumed tokens: 24211619840 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.026031E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.807 | TFLOPs: 31.47 | +7: iteration 46190/ 173500 | consumed samples: 11824640 | consumed tokens: 24216862720 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.038067E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.376 | TFLOPs: 31.40 | +7: iteration 46200/ 173500 | consumed samples: 11827200 | consumed tokens: 24222105600 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.036476E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.681 | TFLOPs: 31.36 | +7: iteration 46210/ 173500 | consumed samples: 11829760 | consumed tokens: 24227348480 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.029055E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.414 | TFLOPs: 31.50 | +7: iteration 46220/ 173500 | consumed samples: 11832320 | consumed tokens: 24232591360 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.028786E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.539 | TFLOPs: 31.09 | +7: iteration 46230/ 173500 | consumed samples: 11834880 | consumed tokens: 24237834240 | elapsed time per iteration (s): 0.44 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.029080E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 581.628 | TFLOPs: 30.52 | +7: iteration 46240/ 173500 | consumed samples: 11837440 | consumed tokens: 24243077120 | elapsed time per iteration (s): 0.42 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.026095E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.574 | TFLOPs: 31.67 | +7: iteration 46250/ 173500 | consumed samples: 11840000 | consumed tokens: 24248320000 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.045910E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.505 | TFLOPs: 31.46 | +7: iteration 46260/ 173500 | consumed samples: 11842560 | consumed tokens: 24253562880 | elapsed time per iteration (s): 0.44 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.037321E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.283 | TFLOPs: 30.60 | +7: iteration 46270/ 173500 | consumed samples: 11845120 | consumed tokens: 24258805760 | elapsed time per iteration (s): 0.42 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.023928E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.013 | TFLOPs: 31.95 | +7: iteration 46280/ 173500 | consumed samples: 11847680 | consumed tokens: 24264048640 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.030944E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.484 | TFLOPs: 30.93 | +7: iteration 46290/ 173500 | consumed samples: 11850240 | consumed tokens: 24269291520 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 
3.034284E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.374 | TFLOPs: 31.45 | +7: iteration 46300/ 173500 | consumed samples: 11852800 | consumed tokens: 24274534400 | elapsed time per iteration (s): 0.44 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.036627E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.303 | TFLOPs: 30.45 | +7: iteration 46310/ 173500 | consumed samples: 11855360 | consumed tokens: 24279777280 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.029609E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.897 | TFLOPs: 31.58 | +7: iteration 46320/ 173500 | consumed samples: 11857920 | consumed tokens: 24285020160 | elapsed time per iteration (s): 0.44 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.021246E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.816 | TFLOPs: 30.63 | +7: iteration 46330/ 173500 | consumed samples: 11860480 | consumed tokens: 24290263040 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.043232E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.348 | TFLOPs: 31.45 | +7: iteration 46340/ 173500 | consumed samples: 11863040 | consumed tokens: 24295505920 | elapsed time per iteration (s): 0.44 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.022829E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.361 | TFLOPs: 30.56 | +7: iteration 46350/ 173500 | consumed samples: 11865600 | consumed tokens: 
24300748800 | elapsed time per iteration (s): 0.44 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.022815E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.086 | TFLOPs: 30.38 | +7: iteration 46360/ 173500 | consumed samples: 11868160 | consumed tokens: 24305991680 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.031499E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.194 | TFLOPs: 31.07 | +7: iteration 46370/ 173500 | consumed samples: 11870720 | consumed tokens: 24311234560 | elapsed time per iteration (s): 0.42 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.030344E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.194 | TFLOPs: 31.91 | +7: iteration 46380/ 173500 | consumed samples: 11873280 | consumed tokens: 24316477440 | elapsed time per iteration (s): 0.42 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.020809E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.185 | TFLOPs: 31.70 | +7: iteration 46390/ 173500 | consumed samples: 11875840 | consumed tokens: 24321720320 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.018779E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.159 | TFLOPs: 31.07 | +7: iteration 46400/ 173500 | consumed samples: 11878400 | consumed tokens: 24326963200 | elapsed time per iteration (s): 0.42 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.043348E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 603.010 | TFLOPs: 31.64 | +7: iteration 46410/ 173500 | consumed samples: 11880960 | consumed tokens: 24332206080 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.027347E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.054 | TFLOPs: 31.06 | +7: iteration 46420/ 173500 | consumed samples: 11883520 | consumed tokens: 24337448960 | elapsed time per iteration (s): 0.42 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.034115E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.208 | TFLOPs: 31.96 | +7: iteration 46430/ 173500 | consumed samples: 11886080 | consumed tokens: 24342691840 | elapsed time per iteration (s): 0.44 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.033164E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.981 | TFLOPs: 30.43 | +7: iteration 46440/ 173500 | consumed samples: 11888640 | consumed tokens: 24347934720 | elapsed time per iteration (s): 0.44 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.036465E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.389 | TFLOPs: 30.66 | +7: iteration 46450/ 173500 | consumed samples: 11891200 | consumed tokens: 24353177600 | elapsed time per iteration (s): 0.43 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.035172E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.738 | TFLOPs: 31.52 | +7: iteration 46460/ 173500 | consumed samples: 11893760 | consumed tokens: 24358420480 | elapsed time per iteration (s): 0.44 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.044606E+00 | grad 
norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.149 | TFLOPs: 30.70 | +7: iteration 46470/ 173500 | consumed samples: 11896320 | consumed tokens: 24363663360 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.039621E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.906 | TFLOPs: 31.79 | +7: iteration 46480/ 173500 | consumed samples: 11898880 | consumed tokens: 24368906240 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.022679E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.822 | TFLOPs: 31.84 | +7: iteration 46490/ 173500 | consumed samples: 11901440 | consumed tokens: 24374149120 | elapsed time per iteration (s): 0.43 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.027719E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.442 | TFLOPs: 31.29 | +7: iteration 46500/ 173500 | consumed samples: 11904000 | consumed tokens: 24379392000 | elapsed time per iteration (s): 0.44 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.027807E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.638 | TFLOPs: 30.83 | +7: iteration 46510/ 173500 | consumed samples: 11906560 | consumed tokens: 24384634880 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.025508E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.340 | TFLOPs: 31.92 | +7: iteration 46520/ 173500 | consumed samples: 11909120 | consumed tokens: 24389877760 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.026590E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.813 | TFLOPs: 31.10 | +7: iteration 46530/ 173500 | consumed samples: 11911680 | consumed tokens: 24395120640 | elapsed time per iteration (s): 0.44 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.033995E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.619 | TFLOPs: 30.62 | +7: iteration 46540/ 173500 | consumed samples: 11914240 | consumed tokens: 24400363520 | elapsed time per iteration (s): 0.44 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.022394E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.430 | TFLOPs: 30.66 | +7: iteration 46550/ 173500 | consumed samples: 11916800 | consumed tokens: 24405606400 | elapsed time per iteration (s): 0.44 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.033700E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.725 | TFLOPs: 30.63 | +7: iteration 46560/ 173500 | consumed samples: 11919360 | consumed tokens: 24410849280 | elapsed time per iteration (s): 0.44 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.030902E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.040 | TFLOPs: 30.22 | +7: iteration 46570/ 173500 | consumed samples: 11921920 | consumed tokens: 24416092160 | elapsed time per iteration (s): 0.43 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.037016E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.864 | TFLOPs: 
31.00 | +7: iteration 46580/ 173500 | consumed samples: 11924480 | consumed tokens: 24421335040 | elapsed time per iteration (s): 0.46 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.021742E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.405 | TFLOPs: 29.30 | +7: iteration 46590/ 173500 | consumed samples: 11927040 | consumed tokens: 24426577920 | elapsed time per iteration (s): 0.42 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.027353E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.894 | TFLOPs: 31.69 | +7: iteration 46600/ 173500 | consumed samples: 11929600 | consumed tokens: 24431820800 | elapsed time per iteration (s): 0.43 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.040225E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.570 | TFLOPs: 31.30 | +7: iteration 46610/ 173500 | consumed samples: 11932160 | consumed tokens: 24437063680 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.020204E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.576 | TFLOPs: 31.46 | +7: iteration 46620/ 173500 | consumed samples: 11934720 | consumed tokens: 24442306560 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.039077E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.613 | TFLOPs: 31.51 | +7: iteration 46630/ 173500 | consumed samples: 11937280 | consumed tokens: 24447549440 | elapsed time per iteration (s): 0.44 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.027432E+00 | grad norm: 0.284 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.079 | TFLOPs: 30.70 | +7: iteration 46640/ 173500 | consumed samples: 11939840 | consumed tokens: 24452792320 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.026095E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.000 | TFLOPs: 31.53 | +7: iteration 46650/ 173500 | consumed samples: 11942400 | consumed tokens: 24458035200 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.039499E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.833 | TFLOPs: 31.16 | +7: iteration 46660/ 173500 | consumed samples: 11944960 | consumed tokens: 24463278080 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.033775E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.903 | TFLOPs: 30.95 | +7: iteration 46670/ 173500 | consumed samples: 11947520 | consumed tokens: 24468520960 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.030305E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.350 | TFLOPs: 31.45 | +7: iteration 46680/ 173500 | consumed samples: 11950080 | consumed tokens: 24473763840 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.022762E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.284 | TFLOPs: 31.39 | +7: iteration 46690/ 173500 | consumed samples: 11952640 | consumed tokens: 24479006720 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.032377E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.775 | TFLOPs: 31.63 | +7: iteration 46700/ 173500 | consumed samples: 11955200 | consumed tokens: 24484249600 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.038316E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.809 | TFLOPs: 30.95 | +7: iteration 46710/ 173500 | consumed samples: 11957760 | consumed tokens: 24489492480 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.033573E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.950 | TFLOPs: 31.06 | +7: iteration 46720/ 173500 | consumed samples: 11960320 | consumed tokens: 24494735360 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.025058E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.062 | TFLOPs: 31.48 | +7: iteration 46730/ 173500 | consumed samples: 11962880 | consumed tokens: 24499978240 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.029163E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.159 | TFLOPs: 31.54 | +7: iteration 46740/ 173500 | consumed samples: 11965440 | consumed tokens: 24505221120 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.026064E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.669 | TFLOPs: 31.67 | +7: iteration 46750/ 
173500 | consumed samples: 11968000 | consumed tokens: 24510464000 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.027773E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.639 | TFLOPs: 31.93 | +7: iteration 46760/ 173500 | consumed samples: 11970560 | consumed tokens: 24515706880 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.025104E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.096 | TFLOPs: 32.06 | +7: iteration 46770/ 173500 | consumed samples: 11973120 | consumed tokens: 24520949760 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 3.032022E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.313 | TFLOPs: 31.71 | +7: iteration 46780/ 173500 | consumed samples: 11975680 | consumed tokens: 24526192640 | elapsed time per iteration (s): 0.43 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.029428E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.570 | TFLOPs: 31.46 | +7: iteration 46790/ 173500 | consumed samples: 11978240 | consumed tokens: 24531435520 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.032386E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.965 | TFLOPs: 32.06 | +7: iteration 46800/ 173500 | consumed samples: 11980800 | consumed tokens: 24536678400 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.033681E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.506 | TFLOPs: 31.87 | +7: iteration 46810/ 173500 | consumed samples: 11983360 | consumed tokens: 24541921280 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.039380E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.076 | TFLOPs: 31.85 | +7: iteration 46820/ 173500 | consumed samples: 11985920 | consumed tokens: 24547164160 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.034029E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.916 | TFLOPs: 31.79 | +7: iteration 46830/ 173500 | consumed samples: 11988480 | consumed tokens: 24552407040 | elapsed time per iteration (s): 0.43 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.007509E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.450 | TFLOPs: 31.19 | +7: iteration 46840/ 173500 | consumed samples: 11991040 | consumed tokens: 24557649920 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.038732E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.414 | TFLOPs: 31.66 | +7: iteration 46850/ 173500 | consumed samples: 11993600 | consumed tokens: 24562892800 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.037008E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.795 | TFLOPs: 31.84 | +7: iteration 46860/ 173500 | consumed samples: 11996160 | consumed tokens: 24568135680 | elapsed time per iteration (s): 0.43 | learning rate: 
1.710E-04 | global batch size: 256 | lm loss: 3.039215E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.691 | TFLOPs: 31.46 | +7: iteration 46870/ 173500 | consumed samples: 11998720 | consumed tokens: 24573378560 | elapsed time per iteration (s): 0.43 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.025633E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.973 | TFLOPs: 31.43 | +7: iteration 46880/ 173500 | consumed samples: 12001280 | consumed tokens: 24578621440 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.041476E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.932 | TFLOPs: 31.63 | +7: iteration 46890/ 173500 | consumed samples: 12003840 | consumed tokens: 24583864320 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.023976E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.974 | TFLOPs: 31.85 | +7: iteration 46900/ 173500 | consumed samples: 12006400 | consumed tokens: 24589107200 | elapsed time per iteration (s): 0.43 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.037052E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.433 | TFLOPs: 31.45 | +7: iteration 46910/ 173500 | consumed samples: 12008960 | consumed tokens: 24594350080 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.035147E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.906 | TFLOPs: 32.05 | +7: iteration 46920/ 173500 | 
consumed samples: 12011520 | consumed tokens: 24599592960 | elapsed time per iteration (s): 0.43 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.019080E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.101 | TFLOPs: 31.33 | +7: iteration 46930/ 173500 | consumed samples: 12014080 | consumed tokens: 24604835840 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.024561E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.198 | TFLOPs: 31.65 | +7: iteration 46940/ 173500 | consumed samples: 12016640 | consumed tokens: 24610078720 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 3.027767E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.918 | TFLOPs: 32.05 | +7: iteration 46950/ 173500 | consumed samples: 12019200 | consumed tokens: 24615321600 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.030376E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.710 | TFLOPs: 31.73 | +7: iteration 46960/ 173500 | consumed samples: 12021760 | consumed tokens: 24620564480 | elapsed time per iteration (s): 0.43 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.021511E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.435 | TFLOPs: 31.35 | +7: iteration 46970/ 173500 | consumed samples: 12024320 | consumed tokens: 24625807360 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.045245E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.313 | TFLOPs: 31.76 | +7: iteration 46980/ 173500 | consumed samples: 12026880 | consumed tokens: 24631050240 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.027976E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.724 | TFLOPs: 31.78 | +7: iteration 46990/ 173500 | consumed samples: 12029440 | consumed tokens: 24636293120 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.025286E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.124 | TFLOPs: 31.64 | +7: iteration 47000/ 173500 | consumed samples: 12032000 | consumed tokens: 24641536000 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.028388E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.127 | TFLOPs: 31.80 | +7: iteration 47010/ 173500 | consumed samples: 12034560 | consumed tokens: 24646778880 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.051744E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.244 | TFLOPs: 31.65 | +7: iteration 47020/ 173500 | consumed samples: 12037120 | consumed tokens: 24652021760 | elapsed time per iteration (s): 0.43 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.040875E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.010 | TFLOPs: 31.01 | +7: iteration 47030/ 173500 | consumed samples: 12039680 | consumed tokens: 24657264640 | elapsed time per iteration (s): 0.43 | learning rate: 1.708E-04 | global batch 
size: 256 | lm loss: 3.029034E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.551 | TFLOPs: 31.35 | +7: iteration 47040/ 173500 | consumed samples: 12042240 | consumed tokens: 24662507520 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.017033E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.123 | TFLOPs: 31.91 | +7: iteration 47050/ 173500 | consumed samples: 12044800 | consumed tokens: 24667750400 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.034307E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.623 | TFLOPs: 32.04 | +7: iteration 47060/ 173500 | consumed samples: 12047360 | consumed tokens: 24672993280 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.031444E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.617 | TFLOPs: 31.88 | +7: iteration 47070/ 173500 | consumed samples: 12049920 | consumed tokens: 24678236160 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.016074E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.161 | TFLOPs: 31.75 | +7: iteration 47080/ 173500 | consumed samples: 12052480 | consumed tokens: 24683479040 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.024384E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.521 | TFLOPs: 32.03 | +7: iteration 47090/ 173500 | consumed samples: 12055040 | 
consumed tokens: 24688721920 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.018585E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.612 | TFLOPs: 31.83 | +7: iteration 47100/ 173500 | consumed samples: 12057600 | consumed tokens: 24693964800 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 3.033722E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.081 | TFLOPs: 31.64 | +7: iteration 47110/ 173500 | consumed samples: 12060160 | consumed tokens: 24699207680 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.028474E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.067 | TFLOPs: 31.69 | +7: iteration 47120/ 173500 | consumed samples: 12062720 | consumed tokens: 24704450560 | elapsed time per iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.030654E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.069 | TFLOPs: 30.96 | +7: iteration 47130/ 173500 | consumed samples: 12065280 | consumed tokens: 24709693440 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.039065E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.198 | TFLOPs: 31.65 | +7: iteration 47140/ 173500 | consumed samples: 12067840 | consumed tokens: 24714936320 | elapsed time per iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.028334E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 601.055 | TFLOPs: 31.54 | +7: iteration 47150/ 173500 | consumed samples: 12070400 | consumed tokens: 24720179200 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.013674E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.239 | TFLOPs: 31.76 | +7: iteration 47160/ 173500 | consumed samples: 12072960 | consumed tokens: 24725422080 | elapsed time per iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.023472E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.390 | TFLOPs: 31.08 | +7: iteration 47170/ 173500 | consumed samples: 12075520 | consumed tokens: 24730664960 | elapsed time per iteration (s): 0.44 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.023510E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.990 | TFLOPs: 30.85 | +7: iteration 47180/ 173500 | consumed samples: 12078080 | consumed tokens: 24735907840 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.016240E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.944 | TFLOPs: 31.69 | +7: iteration 47190/ 173500 | consumed samples: 12080640 | consumed tokens: 24741150720 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.011509E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.723 | TFLOPs: 31.47 | +7: iteration 47200/ 173500 | consumed samples: 12083200 | consumed tokens: 24746393600 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 
3.028036E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.812 | TFLOPs: 31.58 | +7: iteration 47210/ 173500 | consumed samples: 12085760 | consumed tokens: 24751636480 | elapsed time per iteration (s): 0.42 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.039909E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.605 | TFLOPs: 31.83 | +7: iteration 47220/ 173500 | consumed samples: 12088320 | consumed tokens: 24756879360 | elapsed time per iteration (s): 0.42 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.021149E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.488 | TFLOPs: 31.61 | +7: iteration 47230/ 173500 | consumed samples: 12090880 | consumed tokens: 24762122240 | elapsed time per iteration (s): 0.42 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.016763E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.457 | TFLOPs: 32.03 | +7: iteration 47240/ 173500 | consumed samples: 12093440 | consumed tokens: 24767365120 | elapsed time per iteration (s): 0.42 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.020992E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.337 | TFLOPs: 31.76 | +7: iteration 47250/ 173500 | consumed samples: 12096000 | consumed tokens: 24772608000 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.033590E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.197 | TFLOPs: 31.33 | +7: iteration 47260/ 173500 | consumed samples: 12098560 | consumed tokens: 
24777850880 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.031010E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.233 | TFLOPs: 31.02 | +7: iteration 47270/ 173500 | consumed samples: 12101120 | consumed tokens: 24783093760 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.029907E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.737 | TFLOPs: 31.41 | +7: iteration 47280/ 173500 | consumed samples: 12103680 | consumed tokens: 24788336640 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.034672E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.195 | TFLOPs: 30.97 | +7: iteration 47290/ 173500 | consumed samples: 12106240 | consumed tokens: 24793579520 | elapsed time per iteration (s): 0.42 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.026407E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.555 | TFLOPs: 31.67 | +7: iteration 47300/ 173500 | consumed samples: 12108800 | consumed tokens: 24798822400 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.030207E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.708 | TFLOPs: 31.57 | +7: iteration 47310/ 173500 | consumed samples: 12111360 | consumed tokens: 24804065280 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.013937E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.292 | TFLOPs: 30.92 | +7: iteration 47320/ 173500 | consumed samples: 12113920 | consumed tokens: 24809308160 | elapsed time per iteration (s): 0.45 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.033827E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.310 | TFLOPs: 30.08 | +7: iteration 47330/ 173500 | consumed samples: 12116480 | consumed tokens: 24814551040 | elapsed time per iteration (s): 0.44 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.031490E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.323 | TFLOPs: 30.19 | +7: iteration 47340/ 173500 | consumed samples: 12119040 | consumed tokens: 24819793920 | elapsed time per iteration (s): 0.44 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.024718E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.336 | TFLOPs: 30.66 | +7: iteration 47350/ 173500 | consumed samples: 12121600 | consumed tokens: 24825036800 | elapsed time per iteration (s): 0.44 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.037831E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.001 | TFLOPs: 30.54 | +7: iteration 47360/ 173500 | consumed samples: 12124160 | consumed tokens: 24830279680 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.023389E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.789 | TFLOPs: 31.47 | +7: iteration 47370/ 173500 | consumed samples: 12126720 | consumed tokens: 24835522560 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.018731E+00 | grad 
norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.155 | TFLOPs: 31.44 | +7: iteration 47380/ 173500 | consumed samples: 12129280 | consumed tokens: 24840765440 | elapsed time per iteration (s): 0.42 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.034570E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.194 | TFLOPs: 31.86 | +7: iteration 47390/ 173500 | consumed samples: 12131840 | consumed tokens: 24846008320 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.042646E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.843 | TFLOPs: 31.42 | +7: iteration 47400/ 173500 | consumed samples: 12134400 | consumed tokens: 24851251200 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.027098E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.248 | TFLOPs: 31.49 | +7: iteration 47410/ 173500 | consumed samples: 12136960 | consumed tokens: 24856494080 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.038123E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.045 | TFLOPs: 31.54 | +7: iteration 47420/ 173500 | consumed samples: 12139520 | consumed tokens: 24861736960 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.015294E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.988 | TFLOPs: 31.32 | +7: iteration 47430/ 173500 | consumed samples: 12142080 | consumed tokens: 24866979840 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.005303E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.446 | TFLOPs: 31.14 | +7: iteration 47440/ 173500 | consumed samples: 12144640 | consumed tokens: 24872222720 | elapsed time per iteration (s): 0.42 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.038411E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.196 | TFLOPs: 32.07 | +7: iteration 47450/ 173500 | consumed samples: 12147200 | consumed tokens: 24877465600 | elapsed time per iteration (s): 0.42 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.018010E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.194 | TFLOPs: 31.91 | +7: iteration 47460/ 173500 | consumed samples: 12149760 | consumed tokens: 24882708480 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.027872E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.844 | TFLOPs: 31.37 | +7: iteration 47470/ 173500 | consumed samples: 12152320 | consumed tokens: 24887951360 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.024743E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.148 | TFLOPs: 31.38 | +7: iteration 47480/ 173500 | consumed samples: 12154880 | consumed tokens: 24893194240 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.038945E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.274 | TFLOPs: 
31.34 | +7: iteration 47490/ 173500 | consumed samples: 12157440 | consumed tokens: 24898437120 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.010890E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.048 | TFLOPs: 31.59 | +7: iteration 47500/ 173500 | consumed samples: 12160000 | consumed tokens: 24903680000 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.037109E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.261 | TFLOPs: 31.34 | +7: iteration 47510/ 173500 | consumed samples: 12162560 | consumed tokens: 24908922880 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.038479E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.471 | TFLOPs: 31.40 | +7: iteration 47520/ 173500 | consumed samples: 12165120 | consumed tokens: 24914165760 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.020351E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.191 | TFLOPs: 31.23 | +7: iteration 47530/ 173500 | consumed samples: 12167680 | consumed tokens: 24919408640 | elapsed time per iteration (s): 0.42 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.017486E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.136 | TFLOPs: 31.86 | +7: iteration 47540/ 173500 | consumed samples: 12170240 | consumed tokens: 24924651520 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.033145E+00 | grad norm: 0.304 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.571 | TFLOPs: 31.41 | +7: iteration 47550/ 173500 | consumed samples: 12172800 | consumed tokens: 24929894400 | elapsed time per iteration (s): 0.45 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.025434E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.508 | TFLOPs: 30.04 | +7: iteration 47560/ 173500 | consumed samples: 12175360 | consumed tokens: 24935137280 | elapsed time per iteration (s): 0.45 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.026037E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.729 | TFLOPs: 29.74 | +7: iteration 47570/ 173500 | consumed samples: 12177920 | consumed tokens: 24940380160 | elapsed time per iteration (s): 0.42 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.024134E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.317 | TFLOPs: 32.13 | +7: iteration 47580/ 173500 | consumed samples: 12180480 | consumed tokens: 24945623040 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.025102E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.996 | TFLOPs: 31.59 | +7: iteration 47590/ 173500 | consumed samples: 12183040 | consumed tokens: 24950865920 | elapsed time per iteration (s): 0.42 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.022093E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.019 | TFLOPs: 31.90 | +7: iteration 47600/ 173500 | consumed samples: 12185600 | consumed tokens: 24956108800 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.030175E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.226 | TFLOPs: 31.65 | +7: iteration 47610/ 173500 | consumed samples: 12188160 | consumed tokens: 24961351680 | elapsed time per iteration (s): 0.42 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.021337E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.135 | TFLOPs: 31.86 | +7: iteration 47620/ 173500 | consumed samples: 12190720 | consumed tokens: 24966594560 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.020902E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.585 | TFLOPs: 30.99 | +7: iteration 47630/ 173500 | consumed samples: 12193280 | consumed tokens: 24971837440 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.021087E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.502 | TFLOPs: 31.30 | +7: iteration 47640/ 173500 | consumed samples: 12195840 | consumed tokens: 24977080320 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.028657E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.841 | TFLOPs: 31.47 | +7: iteration 47650/ 173500 | consumed samples: 12198400 | consumed tokens: 24982323200 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.022832E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.655 | TFLOPs: 31.46 | +7: iteration 47660/ 
173500 | consumed samples: 12200960 | consumed tokens: 24987566080 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.029527E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.071 | TFLOPs: 31.43 | +7: iteration 47670/ 173500 | consumed samples: 12203520 | consumed tokens: 24992808960 | elapsed time per iteration (s): 0.42 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 3.013461E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.505 | TFLOPs: 32.08 | +7: iteration 47680/ 173500 | consumed samples: 12206080 | consumed tokens: 24998051840 | elapsed time per iteration (s): 0.42 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.028827E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.050 | TFLOPs: 31.85 | +7: iteration 47690/ 173500 | consumed samples: 12208640 | consumed tokens: 25003294720 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.034723E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.725 | TFLOPs: 31.41 | +7: iteration 47700/ 173500 | consumed samples: 12211200 | consumed tokens: 25008537600 | elapsed time per iteration (s): 0.42 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.043150E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.410 | TFLOPs: 32.03 | +7: iteration 47710/ 173500 | consumed samples: 12213760 | consumed tokens: 25013780480 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.012927E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.377 | TFLOPs: 31.45 | +7: iteration 47720/ 173500 | consumed samples: 12216320 | consumed tokens: 25019023360 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.016308E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.016 | TFLOPs: 30.90 | +7: iteration 47730/ 173500 | consumed samples: 12218880 | consumed tokens: 25024266240 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.031645E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.898 | TFLOPs: 31.48 | +7: iteration 47740/ 173500 | consumed samples: 12221440 | consumed tokens: 25029509120 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.021231E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.388 | TFLOPs: 30.98 | +7: iteration 47750/ 173500 | consumed samples: 12224000 | consumed tokens: 25034752000 | elapsed time per iteration (s): 0.42 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.030041E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.119 | TFLOPs: 31.80 | +7: iteration 47760/ 173500 | consumed samples: 12226560 | consumed tokens: 25039994880 | elapsed time per iteration (s): 0.42 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.013962E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.143 | TFLOPs: 32.01 | +7: iteration 47770/ 173500 | consumed samples: 12229120 | consumed tokens: 25045237760 | elapsed time per iteration (s): 0.43 | learning rate: 
1.699E-04 | global batch size: 256 | lm loss: 3.025899E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.552 | TFLOPs: 31.56 | +7: iteration 47780/ 173500 | consumed samples: 12231680 | consumed tokens: 25050480640 | elapsed time per iteration (s): 0.43 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.024449E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.908 | TFLOPs: 31.53 | +7: iteration 47790/ 173500 | consumed samples: 12234240 | consumed tokens: 25055723520 | elapsed time per iteration (s): 0.43 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.034824E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.423 | TFLOPs: 31.45 | +7: iteration 47800/ 173500 | consumed samples: 12236800 | consumed tokens: 25060966400 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.008104E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.982 | TFLOPs: 31.85 | +7: iteration 47810/ 173500 | consumed samples: 12239360 | consumed tokens: 25066209280 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.021879E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.731 | TFLOPs: 31.68 | +7: iteration 47820/ 173500 | consumed samples: 12241920 | consumed tokens: 25071452160 | elapsed time per iteration (s): 0.43 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.024707E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.135 | TFLOPs: 31.54 | +7: iteration 47830/ 173500 | 
consumed samples: 12244480 | consumed tokens: 25076695040 | elapsed time per iteration (s): 0.43 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.019860E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.363 | TFLOPs: 31.45 | +7: iteration 47840/ 173500 | consumed samples: 12247040 | consumed tokens: 25081937920 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 3.021842E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.000 | TFLOPs: 31.80 | +7: iteration 47850/ 173500 | consumed samples: 12249600 | consumed tokens: 25087180800 | elapsed time per iteration (s): 0.43 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.023792E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.914 | TFLOPs: 31.11 | +7: iteration 47860/ 173500 | consumed samples: 12252160 | consumed tokens: 25092423680 | elapsed time per iteration (s): 0.44 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.030090E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.414 | TFLOPs: 30.40 | +7: iteration 47870/ 173500 | consumed samples: 12254720 | consumed tokens: 25097666560 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.036674E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.733 | TFLOPs: 31.73 | +7: iteration 47880/ 173500 | consumed samples: 12257280 | consumed tokens: 25102909440 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.018842E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.571 | TFLOPs: 31.62 | +7: iteration 47890/ 173500 | consumed samples: 12259840 | consumed tokens: 25108152320 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.035068E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.818 | TFLOPs: 32.00 | +7: iteration 47900/ 173500 | consumed samples: 12262400 | consumed tokens: 25113395200 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.022386E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.419 | TFLOPs: 31.66 | +7: iteration 47910/ 173500 | consumed samples: 12264960 | consumed tokens: 25118638080 | elapsed time per iteration (s): 0.43 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.021962E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.413 | TFLOPs: 30.93 | +7: iteration 47920/ 173500 | consumed samples: 12267520 | consumed tokens: 25123880960 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.026727E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.016 | TFLOPs: 31.74 | +7: iteration 47930/ 173500 | consumed samples: 12270080 | consumed tokens: 25129123840 | elapsed time per iteration (s): 0.43 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.009509E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.672 | TFLOPs: 31.25 | +7: iteration 47940/ 173500 | consumed samples: 12272640 | consumed tokens: 25134366720 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch 
size: 256 | lm loss: 3.030441E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.935 | TFLOPs: 31.69 | +7: iteration 47950/ 173500 | consumed samples: 12275200 | consumed tokens: 25139609600 | elapsed time per iteration (s): 0.43 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.015065E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.413 | TFLOPs: 31.45 | +7: iteration 47960/ 173500 | consumed samples: 12277760 | consumed tokens: 25144852480 | elapsed time per iteration (s): 0.43 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.031420E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.975 | TFLOPs: 31.53 | +7: iteration 47970/ 173500 | consumed samples: 12280320 | consumed tokens: 25150095360 | elapsed time per iteration (s): 0.44 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.019355E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.913 | TFLOPs: 30.37 | +7: iteration 47980/ 173500 | consumed samples: 12282880 | consumed tokens: 25155338240 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.026513E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.883 | TFLOPs: 31.63 | +7: iteration 47990/ 173500 | consumed samples: 12285440 | consumed tokens: 25160581120 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.028067E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.385 | TFLOPs: 31.61 | +0: [2023-03-17 04:52:55,802] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=48000, skipped=0, lr=[0.00016965587057872074, 0.00016965587057872074, 0.00016965587057872074], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 48000/ 173500 | consumed samples: 12288000 | consumed tokens: 25165824000 | elapsed time per iteration (s): 0.43 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 3.011541E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.009 | TFLOPs: 31.53 | +0: steps: 48000 loss: 2.9954 iter time (s): 0.426 samples/sec: 601.354 +7: iteration 48010/ 173500 | consumed samples: 12290560 | consumed tokens: 25171066880 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.006409E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.478 | TFLOPs: 31.72 | +7: iteration 48020/ 173500 | consumed samples: 12293120 | consumed tokens: 25176309760 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.027094E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.316 | TFLOPs: 32.07 | +7: iteration 48030/ 173500 | consumed samples: 12295680 | consumed tokens: 25181552640 | elapsed time per iteration (s): 0.43 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.022703E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.843 | TFLOPs: 31.53 | +7: iteration 48040/ 173500 | consumed samples: 12298240 | consumed tokens: 25186795520 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.024535E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.856 | TFLOPs: 31.74 | +7: iteration 
48050/ 173500 | consumed samples: 12300800 | consumed tokens: 25192038400 | elapsed time per iteration (s): 0.43 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.021649E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.690 | TFLOPs: 31.41 | +7: iteration 48060/ 173500 | consumed samples: 12303360 | consumed tokens: 25197281280 | elapsed time per iteration (s): 0.43 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.020951E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.588 | TFLOPs: 31.35 | +7: iteration 48070/ 173500 | consumed samples: 12305920 | consumed tokens: 25202524160 | elapsed time per iteration (s): 0.43 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.016024E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.906 | TFLOPs: 31.58 | +7: iteration 48080/ 173500 | consumed samples: 12308480 | consumed tokens: 25207767040 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.040378E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.229 | TFLOPs: 31.81 | +7: iteration 48090/ 173500 | consumed samples: 12311040 | consumed tokens: 25213009920 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.029065E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.632 | TFLOPs: 31.25 | +7: iteration 48100/ 173500 | consumed samples: 12313600 | consumed tokens: 25218252800 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.047458E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.378 | TFLOPs: 31.66 | +7: iteration 48110/ 173500 | consumed samples: 12316160 | consumed tokens: 25223495680 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.033406E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.694 | TFLOPs: 30.89 | +7: iteration 48120/ 173500 | consumed samples: 12318720 | consumed tokens: 25228738560 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.015096E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.683 | TFLOPs: 31.62 | +7: iteration 48130/ 173500 | consumed samples: 12321280 | consumed tokens: 25233981440 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.052608E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.837 | TFLOPs: 31.11 | +7: iteration 48140/ 173500 | consumed samples: 12323840 | consumed tokens: 25239224320 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.042340E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.674 | TFLOPs: 31.78 | +7: iteration 48150/ 173500 | consumed samples: 12326400 | consumed tokens: 25244467200 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 3.012942E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.404 | TFLOPs: 31.40 | +7: iteration 48160/ 173500 | consumed samples: 12328960 | consumed tokens: 25249710080 | elapsed time per iteration (s): 0.45 | learning rate: 
1.695E-04 | global batch size: 256 | lm loss: 3.026514E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.561 | TFLOPs: 29.88 | +7: iteration 48170/ 173500 | consumed samples: 12331520 | consumed tokens: 25254952960 | elapsed time per iteration (s): 0.43 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.038719E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.328 | TFLOPs: 30.97 | +7: iteration 48180/ 173500 | consumed samples: 12334080 | consumed tokens: 25260195840 | elapsed time per iteration (s): 0.43 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.022004E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.935 | TFLOPs: 31.22 | +7: iteration 48190/ 173500 | consumed samples: 12336640 | consumed tokens: 25265438720 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.024201E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.297 | TFLOPs: 31.92 | +7: iteration 48200/ 173500 | consumed samples: 12339200 | consumed tokens: 25270681600 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.022209E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.256 | TFLOPs: 31.91 | +7: iteration 48210/ 173500 | consumed samples: 12341760 | consumed tokens: 25275924480 | elapsed time per iteration (s): 0.44 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.009083E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.038 | TFLOPs: 30.85 | +7: iteration 48220/ 173500 | 
consumed samples: 12344320 | consumed tokens: 25281167360 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.017754E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.101 | TFLOPs: 31.85 | +7: iteration 48230/ 173500 | consumed samples: 12346880 | consumed tokens: 25286410240 | elapsed time per iteration (s): 0.43 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.027697E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.154 | TFLOPs: 31.33 | +7: iteration 48240/ 173500 | consumed samples: 12349440 | consumed tokens: 25291653120 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.031381E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.647 | TFLOPs: 31.67 | +7: iteration 48250/ 173500 | consumed samples: 12352000 | consumed tokens: 25296896000 | elapsed time per iteration (s): 0.42 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.017998E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.600 | TFLOPs: 31.93 | +7: iteration 48260/ 173500 | consumed samples: 12354560 | consumed tokens: 25302138880 | elapsed time per iteration (s): 0.43 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.019709E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.434 | TFLOPs: 31.50 | +7: iteration 48270/ 173500 | consumed samples: 12357120 | consumed tokens: 25307381760 | elapsed time per iteration (s): 0.42 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.021094E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 610.810 | TFLOPs: 32.05 | +7: iteration 48280/ 173500 | consumed samples: 12359680 | consumed tokens: 25312624640 | elapsed time per iteration (s): 0.44 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.023847E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.549 | TFLOPs: 30.78 | +7: iteration 48290/ 173500 | consumed samples: 12362240 | consumed tokens: 25317867520 | elapsed time per iteration (s): 0.43 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.029745E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.508 | TFLOPs: 31.35 | +7: iteration 48300/ 173500 | consumed samples: 12364800 | consumed tokens: 25323110400 | elapsed time per iteration (s): 0.43 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.027585E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.839 | TFLOPs: 30.95 | +7: iteration 48310/ 173500 | consumed samples: 12367360 | consumed tokens: 25328353280 | elapsed time per iteration (s): 0.42 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.013996E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.288 | TFLOPs: 31.76 | +7: iteration 48320/ 173500 | consumed samples: 12369920 | consumed tokens: 25333596160 | elapsed time per iteration (s): 0.43 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 3.029384E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.119 | TFLOPs: 31.12 | +7: iteration 48330/ 173500 | consumed samples: 12372480 | consumed tokens: 25338839040 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch 
size: 256 | lm loss: 3.017092E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.496 | TFLOPs: 31.24 | +7: iteration 48340/ 173500 | consumed samples: 12375040 | consumed tokens: 25344081920 | elapsed time per iteration (s): 0.45 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.019996E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.373 | TFLOPs: 29.87 | +7: iteration 48350/ 173500 | consumed samples: 12377600 | consumed tokens: 25349324800 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.012923E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.936 | TFLOPs: 30.95 | +7: iteration 48360/ 173500 | consumed samples: 12380160 | consumed tokens: 25354567680 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.021007E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.982 | TFLOPs: 31.22 | +7: iteration 48370/ 173500 | consumed samples: 12382720 | consumed tokens: 25359810560 | elapsed time per iteration (s): 0.45 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.018380E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.972 | TFLOPs: 30.17 | +7: iteration 48380/ 173500 | consumed samples: 12385280 | consumed tokens: 25365053440 | elapsed time per iteration (s): 0.42 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.010254E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.347 | TFLOPs: 31.66 | +7: iteration 48390/ 173500 | consumed samples: 12387840 | 
consumed tokens: 25370296320 | elapsed time per iteration (s): 0.42 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.035504E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.651 | TFLOPs: 31.93 | +7: iteration 48400/ 173500 | consumed samples: 12390400 | consumed tokens: 25375539200 | elapsed time per iteration (s): 0.44 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.018012E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.717 | TFLOPs: 30.68 | +7: iteration 48410/ 173500 | consumed samples: 12392960 | consumed tokens: 25380782080 | elapsed time per iteration (s): 0.44 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.023949E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.344 | TFLOPs: 30.61 | +7: iteration 48420/ 173500 | consumed samples: 12395520 | consumed tokens: 25386024960 | elapsed time per iteration (s): 0.44 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.020013E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.721 | TFLOPs: 30.36 | +7: iteration 48430/ 173500 | consumed samples: 12398080 | consumed tokens: 25391267840 | elapsed time per iteration (s): 0.43 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.003673E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.297 | TFLOPs: 31.55 | +7: iteration 48440/ 173500 | consumed samples: 12400640 | consumed tokens: 25396510720 | elapsed time per iteration (s): 0.43 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.031924E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 593.608 | TFLOPs: 31.15 | +7: iteration 48450/ 173500 | consumed samples: 12403200 | consumed tokens: 25401753600 | elapsed time per iteration (s): 0.44 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.023417E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.353 | TFLOPs: 30.19 | +7: iteration 48460/ 173500 | consumed samples: 12405760 | consumed tokens: 25406996480 | elapsed time per iteration (s): 0.45 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.019681E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.146 | TFLOPs: 29.97 | +7: iteration 48470/ 173500 | consumed samples: 12408320 | consumed tokens: 25412239360 | elapsed time per iteration (s): 0.45 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.023182E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.123 | TFLOPs: 29.65 | +7: iteration 48480/ 173500 | consumed samples: 12410880 | consumed tokens: 25417482240 | elapsed time per iteration (s): 0.45 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 3.018501E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.420 | TFLOPs: 29.98 | +7: iteration 48490/ 173500 | consumed samples: 12413440 | consumed tokens: 25422725120 | elapsed time per iteration (s): 0.44 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.021519E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.669 | TFLOPs: 30.36 | +7: iteration 48500/ 173500 | consumed samples: 12416000 | consumed tokens: 25427968000 | elapsed time per iteration (s): 0.46 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 
3.023508E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.442 | TFLOPs: 29.20 | +7: iteration 48510/ 173500 | consumed samples: 12418560 | consumed tokens: 25433210880 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.013713E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.883 | TFLOPs: 31.95 | +7: iteration 48520/ 173500 | consumed samples: 12421120 | consumed tokens: 25438453760 | elapsed time per iteration (s): 0.44 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.021026E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.976 | TFLOPs: 30.54 | +7: iteration 48530/ 173500 | consumed samples: 12423680 | consumed tokens: 25443696640 | elapsed time per iteration (s): 0.44 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.008548E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.512 | TFLOPs: 30.83 | +7: iteration 48540/ 173500 | consumed samples: 12426240 | consumed tokens: 25448939520 | elapsed time per iteration (s): 0.44 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.022012E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.050 | TFLOPs: 30.64 | +7: iteration 48550/ 173500 | consumed samples: 12428800 | consumed tokens: 25454182400 | elapsed time per iteration (s): 0.43 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.023120E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.292 | TFLOPs: 31.29 | +7: iteration 48560/ 173500 | consumed samples: 12431360 | consumed tokens: 
25459425280 | elapsed time per iteration (s): 0.43 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.016292E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.317 | TFLOPs: 31.45 | +7: iteration 48570/ 173500 | consumed samples: 12433920 | consumed tokens: 25464668160 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.021491E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.118 | TFLOPs: 31.64 | +7: iteration 48580/ 173500 | consumed samples: 12436480 | consumed tokens: 25469911040 | elapsed time per iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.033904E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.761 | TFLOPs: 31.68 | +7: iteration 48590/ 173500 | consumed samples: 12439040 | consumed tokens: 25475153920 | elapsed time per iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.021136E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.130 | TFLOPs: 31.80 | +7: iteration 48600/ 173500 | consumed samples: 12441600 | consumed tokens: 25480396800 | elapsed time per iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.029972E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.564 | TFLOPs: 31.88 | +7: iteration 48610/ 173500 | consumed samples: 12444160 | consumed tokens: 25485639680 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.007236E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 594.476 | TFLOPs: 31.19 | +7: iteration 48620/ 173500 | consumed samples: 12446720 | consumed tokens: 25490882560 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.013028E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.155 | TFLOPs: 31.49 | +7: iteration 48630/ 173500 | consumed samples: 12449280 | consumed tokens: 25496125440 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.018607E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.653 | TFLOPs: 31.46 | +7: iteration 48640/ 173500 | consumed samples: 12451840 | consumed tokens: 25501368320 | elapsed time per iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.022774E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.000 | TFLOPs: 31.69 | +7: iteration 48650/ 173500 | consumed samples: 12454400 | consumed tokens: 25506611200 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.023068E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.363 | TFLOPs: 31.55 | +7: iteration 48660/ 173500 | consumed samples: 12456960 | consumed tokens: 25511854080 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.018247E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.627 | TFLOPs: 30.94 | +7: iteration 48670/ 173500 | consumed samples: 12459520 | consumed tokens: 25517096960 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.021360E+00 | grad 
norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.667 | TFLOPs: 31.31 | +7: iteration 48680/ 173500 | consumed samples: 12462080 | consumed tokens: 25522339840 | elapsed time per iteration (s): 0.42 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.003517E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.337 | TFLOPs: 31.76 | +7: iteration 48690/ 173500 | consumed samples: 12464640 | consumed tokens: 25527582720 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.028328E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.157 | TFLOPs: 31.54 | +7: iteration 48700/ 173500 | consumed samples: 12467200 | consumed tokens: 25532825600 | elapsed time per iteration (s): 0.42 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.032425E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.020 | TFLOPs: 32.01 | +7: iteration 48710/ 173500 | consumed samples: 12469760 | consumed tokens: 25538068480 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.017746E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.013 | TFLOPs: 31.59 | +7: iteration 48720/ 173500 | consumed samples: 12472320 | consumed tokens: 25543311360 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.021996E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.095 | TFLOPs: 31.54 | +7: iteration 48730/ 173500 | consumed samples: 12474880 | consumed tokens: 25548554240 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.023138E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | +7: iteration 48740/ 173500 | consumed samples: 12477440 | consumed tokens: 25553797120 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.020717E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.783 | TFLOPs: 31.89 | +7: iteration 48750/ 173500 | consumed samples: 12480000 | consumed tokens: 25559040000 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.024627E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.684 | TFLOPs: 31.46 | +7: iteration 48760/ 173500 | consumed samples: 12482560 | consumed tokens: 25564282880 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.009762E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.110 | TFLOPs: 31.54 | +7: iteration 48770/ 173500 | consumed samples: 12485120 | consumed tokens: 25569525760 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.024444E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.363 | TFLOPs: 31.03 | +7: iteration 48780/ 173500 | consumed samples: 12487680 | consumed tokens: 25574768640 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.010878E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.940 | TFLOPs: 
31.79 | +7: iteration 48790/ 173500 | consumed samples: 12490240 | consumed tokens: 25580011520 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.032340E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.166 | TFLOPs: 32.07 | +7: iteration 48800/ 173500 | consumed samples: 12492800 | consumed tokens: 25585254400 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.012608E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.788 | TFLOPs: 31.57 | +7: iteration 48810/ 173500 | consumed samples: 12495360 | consumed tokens: 25590497280 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.027346E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.706 | TFLOPs: 30.99 | +7: iteration 48820/ 173500 | consumed samples: 12497920 | consumed tokens: 25595740160 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.009888E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.455 | TFLOPs: 31.50 | +7: iteration 48830/ 173500 | consumed samples: 12500480 | consumed tokens: 25600983040 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.036087E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.191 | TFLOPs: 31.28 | +7: iteration 48840/ 173500 | consumed samples: 12503040 | consumed tokens: 25606225920 | elapsed time per iteration (s): 0.42 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.009998E+00 | grad norm: 0.292 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.007 | TFLOPs: 31.80 | +7: iteration 48850/ 173500 | consumed samples: 12505600 | consumed tokens: 25611468800 | elapsed time per iteration (s): 0.42 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.017968E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.570 | TFLOPs: 31.83 | +7: iteration 48860/ 173500 | consumed samples: 12508160 | consumed tokens: 25616711680 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.030365E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.434 | TFLOPs: 31.40 | +7: iteration 48870/ 173500 | consumed samples: 12510720 | consumed tokens: 25621954560 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.017764E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.997 | TFLOPs: 31.27 | +7: iteration 48880/ 173500 | consumed samples: 12513280 | consumed tokens: 25627197440 | elapsed time per iteration (s): 0.44 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.020008E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.581 | TFLOPs: 30.83 | +7: iteration 48890/ 173500 | consumed samples: 12515840 | consumed tokens: 25632440320 | elapsed time per iteration (s): 0.46 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 3.014802E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.561 | TFLOPs: 29.04 | +7: iteration 48900/ 173500 | consumed samples: 12518400 | consumed tokens: 25637683200 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.028534E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.800 | TFLOPs: 30.63 | +7: iteration 48910/ 173500 | consumed samples: 12520960 | consumed tokens: 25642926080 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.016146E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.000 | TFLOPs: 31.64 | +7: iteration 48920/ 173500 | consumed samples: 12523520 | consumed tokens: 25648168960 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.030093E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.203 | TFLOPs: 31.65 | +7: iteration 48930/ 173500 | consumed samples: 12526080 | consumed tokens: 25653411840 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.024009E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.323 | TFLOPs: 31.81 | +7: iteration 48940/ 173500 | consumed samples: 12528640 | consumed tokens: 25658654720 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.034658E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.420 | TFLOPs: 31.66 | +7: iteration 48950/ 173500 | consumed samples: 12531200 | consumed tokens: 25663897600 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.019554E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.622 | TFLOPs: 31.93 | +7: iteration 48960/ 
173500 | consumed samples: 12533760 | consumed tokens: 25669140480 | elapsed time per iteration (s): 0.43 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.024191E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.003 | TFLOPs: 31.43 | +7: iteration 48970/ 173500 | consumed samples: 12536320 | consumed tokens: 25674383360 | elapsed time per iteration (s): 0.43 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.021864E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.519 | TFLOPs: 31.40 | +7: iteration 48980/ 173500 | consumed samples: 12538880 | consumed tokens: 25679626240 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.027321E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.363 | TFLOPs: 31.87 | +7: iteration 48990/ 173500 | consumed samples: 12541440 | consumed tokens: 25684869120 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.023801E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.564 | TFLOPs: 31.72 | +7: iteration 49000/ 173500 | consumed samples: 12544000 | consumed tokens: 25690112000 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.032337E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.580 | TFLOPs: 31.88 | +7: iteration 49010/ 173500 | consumed samples: 12546560 | consumed tokens: 25695354880 | elapsed time per iteration (s): 0.43 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.013371E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.015 | TFLOPs: 31.53 | +7: iteration 49020/ 173500 | consumed samples: 12549120 | consumed tokens: 25700597760 | elapsed time per iteration (s): 0.43 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.026590E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.419 | TFLOPs: 31.40 | +7: iteration 49030/ 173500 | consumed samples: 12551680 | consumed tokens: 25705840640 | elapsed time per iteration (s): 0.43 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.031051E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.131 | TFLOPs: 31.44 | +7: iteration 49040/ 173500 | consumed samples: 12554240 | consumed tokens: 25711083520 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.022599E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.113 | TFLOPs: 31.85 | +7: iteration 49050/ 173500 | consumed samples: 12556800 | consumed tokens: 25716326400 | elapsed time per iteration (s): 0.43 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 3.028139E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.005 | TFLOPs: 31.59 | +7: iteration 49060/ 173500 | consumed samples: 12559360 | consumed tokens: 25721569280 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.014961E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.173 | TFLOPs: 31.54 | +7: iteration 49070/ 173500 | consumed samples: 12561920 | consumed tokens: 25726812160 | elapsed time per iteration (s): 0.43 | learning rate: 
1.683E-04 | global batch size: 256 | lm loss: 3.014004E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.777 | TFLOPs: 31.52 | +7: iteration 49080/ 173500 | consumed samples: 12564480 | consumed tokens: 25732055040 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.017831E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.350 | TFLOPs: 31.81 | +7: iteration 49090/ 173500 | consumed samples: 12567040 | consumed tokens: 25737297920 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.018005E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.229 | TFLOPs: 31.60 | +7: iteration 49100/ 173500 | consumed samples: 12569600 | consumed tokens: 25742540800 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.031117E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.183 | TFLOPs: 31.44 | +7: iteration 49110/ 173500 | consumed samples: 12572160 | consumed tokens: 25747783680 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.026291E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.915 | TFLOPs: 31.74 | +7: iteration 49120/ 173500 | consumed samples: 12574720 | consumed tokens: 25753026560 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.019578E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.399 | TFLOPs: 31.82 | +7: iteration 49130/ 173500 | 
consumed samples: 12577280 | consumed tokens: 25758269440 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.012885E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.489 | TFLOPs: 31.77 | +7: iteration 49140/ 173500 | consumed samples: 12579840 | consumed tokens: 25763512320 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.029804E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.880 | TFLOPs: 31.47 | +7: iteration 49150/ 173500 | consumed samples: 12582400 | consumed tokens: 25768755200 | elapsed time per iteration (s): 0.42 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.014799E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.857 | TFLOPs: 31.95 | +7: iteration 49160/ 173500 | consumed samples: 12584960 | consumed tokens: 25773998080 | elapsed time per iteration (s): 0.42 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.013021E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.204 | TFLOPs: 32.02 | +7: iteration 49170/ 173500 | consumed samples: 12587520 | consumed tokens: 25779240960 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.005610E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.318 | TFLOPs: 31.50 | +7: iteration 49180/ 173500 | consumed samples: 12590080 | consumed tokens: 25784483840 | elapsed time per iteration (s): 0.42 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.032517E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.034 | TFLOPs: 31.69 | +7: iteration 49190/ 173500 | consumed samples: 12592640 | consumed tokens: 25789726720 | elapsed time per iteration (s): 0.46 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.032611E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.401 | TFLOPs: 28.93 | +7: iteration 49200/ 173500 | consumed samples: 12595200 | consumed tokens: 25794969600 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.028491E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.932 | TFLOPs: 31.58 | +7: iteration 49210/ 173500 | consumed samples: 12597760 | consumed tokens: 25800212480 | elapsed time per iteration (s): 0.42 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 3.036540E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.529 | TFLOPs: 31.88 | +7: iteration 49220/ 173500 | consumed samples: 12600320 | consumed tokens: 25805455360 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.003750E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.347 | TFLOPs: 31.87 | +7: iteration 49230/ 173500 | consumed samples: 12602880 | consumed tokens: 25810698240 | elapsed time per iteration (s): 0.43 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.016557E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.996 | TFLOPs: 31.48 | +7: iteration 49240/ 173500 | consumed samples: 12605440 | consumed tokens: 25815941120 | elapsed time per iteration (s): 0.44 | learning rate: 1.681E-04 | global batch 
size: 256 | lm loss: 3.014019E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.892 | TFLOPs: 30.37 | +7: iteration 49250/ 173500 | consumed samples: 12608000 | consumed tokens: 25821184000 | elapsed time per iteration (s): 0.43 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.027263E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.598 | TFLOPs: 31.35 | +7: iteration 49260/ 173500 | consumed samples: 12610560 | consumed tokens: 25826426880 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.014697E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.301 | TFLOPs: 31.97 | +7: iteration 49270/ 173500 | consumed samples: 12613120 | consumed tokens: 25831669760 | elapsed time per iteration (s): 0.43 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.019108E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.462 | TFLOPs: 30.98 | +7: iteration 49280/ 173500 | consumed samples: 12615680 | consumed tokens: 25836912640 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.027864E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.472 | TFLOPs: 31.87 | +7: iteration 49290/ 173500 | consumed samples: 12618240 | consumed tokens: 25842155520 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.025810E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.144 | TFLOPs: 30.96 | +7: iteration 49300/ 173500 | consumed samples: 12620800 | 
consumed tokens: 25847398400 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.024467E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.566 | TFLOPs: 31.88 | +7: iteration 49310/ 173500 | consumed samples: 12623360 | consumed tokens: 25852641280 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.036546E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.459 | TFLOPs: 32.08 | +7: iteration 49320/ 173500 | consumed samples: 12625920 | consumed tokens: 25857884160 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.030283E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.836 | TFLOPs: 31.32 | +7: iteration 49330/ 173500 | consumed samples: 12628480 | consumed tokens: 25863127040 | elapsed time per iteration (s): 0.44 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.016771E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.225 | TFLOPs: 30.44 | +7: iteration 49340/ 173500 | consumed samples: 12631040 | consumed tokens: 25868369920 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.029369E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.634 | TFLOPs: 31.78 | +7: iteration 49350/ 173500 | consumed samples: 12633600 | consumed tokens: 25873612800 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.024759E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 611.533 | TFLOPs: 32.09 | +7: iteration 49360/ 173500 | consumed samples: 12636160 | consumed tokens: 25878855680 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 3.009741E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.391 | TFLOPs: 31.34 | +7: iteration 49370/ 173500 | consumed samples: 12638720 | consumed tokens: 25884098560 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.035255E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.580 | TFLOPs: 31.46 | +7: iteration 49380/ 173500 | consumed samples: 12641280 | consumed tokens: 25889341440 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.007882E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.141 | TFLOPs: 32.07 | +7: iteration 49390/ 173500 | consumed samples: 12643840 | consumed tokens: 25894584320 | elapsed time per iteration (s): 0.44 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.008207E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.416 | TFLOPs: 30.82 | +7: iteration 49400/ 173500 | consumed samples: 12646400 | consumed tokens: 25899827200 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.014404E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.727 | TFLOPs: 31.94 | +7: iteration 49410/ 173500 | consumed samples: 12648960 | consumed tokens: 25905070080 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 
3.025961E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.494 | TFLOPs: 31.93 | +7: iteration 49420/ 173500 | consumed samples: 12651520 | consumed tokens: 25910312960 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.016698E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.349 | TFLOPs: 31.45 | +7: iteration 49430/ 173500 | consumed samples: 12654080 | consumed tokens: 25915555840 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.018727E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.737 | TFLOPs: 31.78 | +7: iteration 49440/ 173500 | consumed samples: 12656640 | consumed tokens: 25920798720 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.032429E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.975 | TFLOPs: 31.43 | +7: iteration 49450/ 173500 | consumed samples: 12659200 | consumed tokens: 25926041600 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.012752E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.700 | TFLOPs: 31.73 | +7: iteration 49460/ 173500 | consumed samples: 12661760 | consumed tokens: 25931284480 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.017760E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.887 | TFLOPs: 31.74 | +7: iteration 49470/ 173500 | consumed samples: 12664320 | consumed tokens: 
25936527360 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.006446E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.276 | TFLOPs: 31.86 | +7: iteration 49480/ 173500 | consumed samples: 12666880 | consumed tokens: 25941770240 | elapsed time per iteration (s): 0.43 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.018220E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.827 | TFLOPs: 31.37 | +7: iteration 49490/ 173500 | consumed samples: 12669440 | consumed tokens: 25947013120 | elapsed time per iteration (s): 0.43 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.019369E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.898 | TFLOPs: 31.32 | +7: iteration 49500/ 173500 | consumed samples: 12672000 | consumed tokens: 25952256000 | elapsed time per iteration (s): 0.43 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.007688E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.433 | TFLOPs: 31.50 | +7: iteration 49510/ 173500 | consumed samples: 12674560 | consumed tokens: 25957498880 | elapsed time per iteration (s): 0.44 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.036036E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.520 | TFLOPs: 30.25 | +7: iteration 49520/ 173500 | consumed samples: 12677120 | consumed tokens: 25962741760 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 3.022951E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.811 | TFLOPs: 32.00 | +7: iteration 49530/ 173500 | consumed samples: 12679680 | consumed tokens: 25967984640 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.015324E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.567 | TFLOPs: 31.77 | +7: iteration 49540/ 173500 | consumed samples: 12682240 | consumed tokens: 25973227520 | elapsed time per iteration (s): 0.43 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.032404E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.343 | TFLOPs: 31.55 | +7: iteration 49550/ 173500 | consumed samples: 12684800 | consumed tokens: 25978470400 | elapsed time per iteration (s): 0.45 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.009903E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.767 | TFLOPs: 30.10 | +7: iteration 49560/ 173500 | consumed samples: 12687360 | consumed tokens: 25983713280 | elapsed time per iteration (s): 0.43 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.022302E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.344 | TFLOPs: 31.39 | +7: iteration 49570/ 173500 | consumed samples: 12689920 | consumed tokens: 25988956160 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.015221E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.849 | TFLOPs: 31.79 | +7: iteration 49580/ 173500 | consumed samples: 12692480 | consumed tokens: 25994199040 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.027313E+00 | grad 
norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.411 | TFLOPs: 32.03 | +7: iteration 49590/ 173500 | consumed samples: 12695040 | consumed tokens: 25999441920 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.007259E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.501 | TFLOPs: 31.77 | +7: iteration 49600/ 173500 | consumed samples: 12697600 | consumed tokens: 26004684800 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.035541E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.796 | TFLOPs: 31.79 | +7: iteration 49610/ 173500 | consumed samples: 12700160 | consumed tokens: 26009927680 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.009451E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.419 | TFLOPs: 31.61 | +7: iteration 49620/ 173500 | consumed samples: 12702720 | consumed tokens: 26015170560 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.006237E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.531 | TFLOPs: 32.03 | +7: iteration 49630/ 173500 | consumed samples: 12705280 | consumed tokens: 26020413440 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.023223E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.127 | TFLOPs: 31.44 | +7: iteration 49640/ 173500 | consumed samples: 12707840 | consumed tokens: 26025656320 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.019371E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.860 | TFLOPs: 31.89 | +7: iteration 49650/ 173500 | consumed samples: 12710400 | consumed tokens: 26030899200 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.006460E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.736 | TFLOPs: 31.26 | +7: iteration 49660/ 173500 | consumed samples: 12712960 | consumed tokens: 26036142080 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.027118E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.997 | TFLOPs: 31.27 | +7: iteration 49670/ 173500 | consumed samples: 12715520 | consumed tokens: 26041384960 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.031249E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.067 | TFLOPs: 32.06 | +7: iteration 49680/ 173500 | consumed samples: 12718080 | consumed tokens: 26046627840 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 3.020180E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.405 | TFLOPs: 31.71 | +7: iteration 49690/ 173500 | consumed samples: 12720640 | consumed tokens: 26051870720 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.009962E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.038 | TFLOPs: 
31.64 | +7: iteration 49700/ 173500 | consumed samples: 12723200 | consumed tokens: 26057113600 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.013703E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.048 | TFLOPs: 31.90 | +7: iteration 49710/ 173500 | consumed samples: 12725760 | consumed tokens: 26062356480 | elapsed time per iteration (s): 0.43 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.014307E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.003 | TFLOPs: 31.53 | +7: iteration 49720/ 173500 | consumed samples: 12728320 | consumed tokens: 26067599360 | elapsed time per iteration (s): 0.43 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.009981E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.056 | TFLOPs: 31.59 | +7: iteration 49730/ 173500 | consumed samples: 12730880 | consumed tokens: 26072842240 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.026219E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.139 | TFLOPs: 32.01 | +7: iteration 49740/ 173500 | consumed samples: 12733440 | consumed tokens: 26078085120 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.023398E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.015 | TFLOPs: 32.01 | +7: iteration 49750/ 173500 | consumed samples: 12736000 | consumed tokens: 26083328000 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.016838E+00 | grad norm: 0.297 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.723 | TFLOPs: 31.99 | +7: iteration 49760/ 173500 | consumed samples: 12738560 | consumed tokens: 26088570880 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.010958E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.593 | TFLOPs: 31.77 | +7: iteration 49770/ 173500 | consumed samples: 12741120 | consumed tokens: 26093813760 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.017243E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.095 | TFLOPs: 32.01 | +7: iteration 49780/ 173500 | consumed samples: 12743680 | consumed tokens: 26099056640 | elapsed time per iteration (s): 0.43 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.021961E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.505 | TFLOPs: 31.09 | +7: iteration 49790/ 173500 | consumed samples: 12746240 | consumed tokens: 26104299520 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.027153E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.005 | TFLOPs: 31.80 | +7: iteration 49800/ 173500 | consumed samples: 12748800 | consumed tokens: 26109542400 | elapsed time per iteration (s): 0.43 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.027440E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.639 | TFLOPs: 31.20 | +7: iteration 49810/ 173500 | consumed samples: 12751360 | consumed tokens: 26114785280 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.014657E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.100 | TFLOPs: 30.86 | +7: iteration 49820/ 173500 | consumed samples: 12753920 | consumed tokens: 26120028160 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.016427E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.808 | TFLOPs: 32.05 | +7: iteration 49830/ 173500 | consumed samples: 12756480 | consumed tokens: 26125271040 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.030762E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.077 | TFLOPs: 31.85 | +7: iteration 49840/ 173500 | consumed samples: 12759040 | consumed tokens: 26130513920 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 3.010450E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.916 | TFLOPs: 31.74 | +7: iteration 49850/ 173500 | consumed samples: 12761600 | consumed tokens: 26135756800 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.007516E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.907 | TFLOPs: 31.74 | +7: iteration 49860/ 173500 | consumed samples: 12764160 | consumed tokens: 26140999680 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.014944E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.223 | TFLOPs: 31.86 | +7: iteration 49870/ 
173500 | consumed samples: 12766720 | consumed tokens: 26146242560 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.011976E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.686 | TFLOPs: 31.78 | +7: iteration 49880/ 173500 | consumed samples: 12769280 | consumed tokens: 26151485440 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.007746E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.066 | TFLOPs: 31.90 | +7: iteration 49890/ 173500 | consumed samples: 12771840 | consumed tokens: 26156728320 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.021147E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.998 | TFLOPs: 31.69 | +7: iteration 49900/ 173500 | consumed samples: 12774400 | consumed tokens: 26161971200 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.014493E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.082 | TFLOPs: 32.01 | +7: iteration 49910/ 173500 | consumed samples: 12776960 | consumed tokens: 26167214080 | elapsed time per iteration (s): 0.43 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.020947E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.450 | TFLOPs: 31.14 | +7: iteration 49920/ 173500 | consumed samples: 12779520 | consumed tokens: 26172456960 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.016293E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.856 | TFLOPs: 31.84 | +7: iteration 49930/ 173500 | consumed samples: 12782080 | consumed tokens: 26177699840 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.027110E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.109 | TFLOPs: 32.01 | +7: iteration 49940/ 173500 | consumed samples: 12784640 | consumed tokens: 26182942720 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.025149E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.634 | TFLOPs: 31.99 | +7: iteration 49950/ 173500 | consumed samples: 12787200 | consumed tokens: 26188185600 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.010590E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.404 | TFLOPs: 31.66 | +7: iteration 49960/ 173500 | consumed samples: 12789760 | consumed tokens: 26193428480 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.024012E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.642 | TFLOPs: 31.78 | +7: iteration 49970/ 173500 | consumed samples: 12792320 | consumed tokens: 26198671360 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.013475E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.963 | TFLOPs: 31.69 | +7: iteration 49980/ 173500 | consumed samples: 12794880 | consumed tokens: 26203914240 | elapsed time per iteration (s): 0.43 | learning rate: 
1.672E-04 | global batch size: 256 | lm loss: 3.021409E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.200 | TFLOPs: 31.39 | +7: iteration 49990/ 173500 | consumed samples: 12797440 | consumed tokens: 26209157120 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.017366E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.387 | TFLOPs: 31.82 | +0: [2023-03-17 05:07:10,758] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=0, lr=[0.00016715144913462704, 0.00016715144913462704, 0.00016715144913462704], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 50000/ 173500 | consumed samples: 12800000 | consumed tokens: 26214400000 | elapsed time per iteration (s): 0.44 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.028363E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.812 | TFLOPs: 30.26 | +0: steps: 50000 loss: 3.0385 iter time (s): 0.425 samples/sec: 601.979 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 50000 | lm loss value: 3.285561E+00 | lm loss PPL: 2.672396E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 50000 to checkpoints_221m91b400m +0: [2023-03-17 05:07:10,941] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step50000 is begin to save! +0: [2023-03-17 05:07:10,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 05:07:11,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_01-model_00-model_states.pt. +0: [2023-03-17 05:07:11,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_03-model_00-model_states.pt... +0: [2023-03-17 05:07:11,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_03-model_00-model_states.pt. +0: [2023-03-17 05:07:11,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_04-model_00-model_states.pt... +0: [2023-03-17 05:07:11,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_04-model_00-model_states.pt. +0: [2023-03-17 05:07:11,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_05-model_00-model_states.pt... +0: [2023-03-17 05:07:11,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_05-model_00-model_states.pt. +0: [2023-03-17 05:07:11,203] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_06-model_00-model_states.pt... +0: [2023-03-17 05:07:11,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_06-model_00-model_states.pt. +0: [2023-03-17 05:07:11,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_07-model_00-model_states.pt... +0: [2023-03-17 05:07:11,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_07-model_00-model_states.pt. +0: [2023-03-17 05:07:11,253] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 05:07:11,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_08-model_00-model_states.pt. +0: [2023-03-17 05:07:11,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_09-model_00-model_states.pt... +0: [2023-03-17 05:07:11,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_09-model_00-model_states.pt. +0: [2023-03-17 05:07:11,302] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_10-model_00-model_states.pt... +0: [2023-03-17 05:07:11,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_10-model_00-model_states.pt. +0: [2023-03-17 05:07:11,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_11-model_00-model_states.pt... +0: [2023-03-17 05:07:11,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_11-model_00-model_states.pt. +0: [2023-03-17 05:07:11,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_12-model_00-model_states.pt... +0: [2023-03-17 05:07:11,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_12-model_00-model_states.pt. +0: [2023-03-17 05:07:11,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_13-model_00-model_states.pt... +0: [2023-03-17 05:07:11,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_13-model_00-model_states.pt. +0: [2023-03-17 05:07:11,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 05:07:11,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_14-model_00-model_states.pt. +0: [2023-03-17 05:07:11,426] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_15-model_00-model_states.pt... +0: [2023-03-17 05:07:11,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_15-model_00-model_states.pt. +0: [2023-03-17 05:07:11,450] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_16-model_00-model_states.pt... +0: [2023-03-17 05:07:11,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_16-model_00-model_states.pt. +0: [2023-03-17 05:07:11,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_17-model_00-model_states.pt... +0: [2023-03-17 05:07:11,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_17-model_00-model_states.pt. +0: [2023-03-17 05:07:11,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_18-model_00-model_states.pt... +0: [2023-03-17 05:07:11,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_18-model_00-model_states.pt. +0: [2023-03-17 05:07:11,522] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_19-model_00-model_states.pt... +0: [2023-03-17 05:07:11,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_19-model_00-model_states.pt. +0: [2023-03-17 05:07:11,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_20-model_00-model_states.pt... 
+0: [2023-03-17 05:07:11,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_20-model_00-model_states.pt. +0: [2023-03-17 05:07:11,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/layer_22-model_00-model_states.pt... +0: [2023-03-17 05:07:11,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/layer_22-model_00-model_states.pt. +0: [2023-03-17 05:07:11,577] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step50000/mp_rank_00_model_states.pt +0: [2023-03-17 05:07:11,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/mp_rank_00_model_states.pt... +0: [2023-03-17 05:07:11,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/mp_rank_00_model_states.pt. +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +6: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +4: [2023-03-17 05:07:11,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +0: [2023-03-17 05:07:11,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 05:07:11,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 05:07:11,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 05:07:11,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +6: [2023-03-17 05:07:11,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 05:07:11,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 05:07:11,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 05:07:11,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 05:07:11,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 05:07:11,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 05:07:11,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +2: [2023-03-17 05:07:11,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 05:07:11,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 05:07:11,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +5: [2023-03-17 05:07:11,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 05:07:11,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +3: [2023-03-17 05:07:11,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 05:07:11,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 05:07:11,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +4: [2023-03-17 05:07:11,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 05:07:11,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+7: [2023-03-17 05:07:11,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +7: [2023-03-17 05:07:11,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: [2023-03-17 05:07:11,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 05:07:11,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
+1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +1: [2023-03-17 05:07:11,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! +0: successfully saved checkpoint at iteration 50000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 832.32 +7: iteration 50010/ 173500 | consumed samples: 12802560 | consumed tokens: 26219642880 | elapsed time per iteration (s): 0.60 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.020863E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 429.768 | TFLOPs: 22.55 | +7: iteration 50020/ 173500 | consumed samples: 12805120 | consumed tokens: 26224885760 | elapsed time per iteration (s): 0.56 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.006710E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.372 | TFLOPs: 23.79 | +7: iteration 50030/ 173500 | consumed samples: 12807680 | consumed tokens: 26230128640 | elapsed time per iteration (s): 0.53 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.025194E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 478.802 | TFLOPs: 25.12 | +7: iteration 50040/ 173500 | consumed samples: 12810240 | consumed tokens: 26235371520 | elapsed time per iteration (s): 0.50 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 
3.010669E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 516.150 | TFLOPs: 27.08 | +7: iteration 50050/ 173500 | consumed samples: 12812800 | consumed tokens: 26240614400 | elapsed time per iteration (s): 0.56 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.019617E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 454.820 | TFLOPs: 23.86 | +7: iteration 50060/ 173500 | consumed samples: 12815360 | consumed tokens: 26245857280 | elapsed time per iteration (s): 0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.023775E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.871 | TFLOPs: 32.31 | +7: iteration 50070/ 173500 | consumed samples: 12817920 | consumed tokens: 26251100160 | elapsed time per iteration (s): 0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.004341E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.429 | TFLOPs: 31.61 | +7: iteration 50080/ 173500 | consumed samples: 12820480 | consumed tokens: 26256343040 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.009025E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.128 | TFLOPs: 31.80 | +7: iteration 50090/ 173500 | consumed samples: 12823040 | consumed tokens: 26261585920 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.019866E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.793 | TFLOPs: 31.99 | +7: iteration 50100/ 173500 | consumed samples: 12825600 | consumed tokens: 
26266828800 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.013143E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.395 | TFLOPs: 31.97 | +7: iteration 50110/ 173500 | consumed samples: 12828160 | consumed tokens: 26272071680 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.023755E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.725 | TFLOPs: 31.78 | +7: iteration 50120/ 173500 | consumed samples: 12830720 | consumed tokens: 26277314560 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.016215E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.828 | TFLOPs: 31.68 | +7: iteration 50130/ 173500 | consumed samples: 12833280 | consumed tokens: 26282557440 | elapsed time per iteration (s): 0.43 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.013364E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.005 | TFLOPs: 31.53 | +7: iteration 50140/ 173500 | consumed samples: 12835840 | consumed tokens: 26287800320 | elapsed time per iteration (s): 0.43 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.015536E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.969 | TFLOPs: 31.58 | +7: iteration 50150/ 173500 | consumed samples: 12838400 | consumed tokens: 26293043200 | elapsed time per iteration (s): 0.43 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.036803E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 598.796 | TFLOPs: 31.42 | +7: iteration 50160/ 173500 | consumed samples: 12840960 | consumed tokens: 26298286080 | elapsed time per iteration (s): 0.42 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.020630E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.661 | TFLOPs: 31.78 | +7: iteration 50170/ 173500 | consumed samples: 12843520 | consumed tokens: 26303528960 | elapsed time per iteration (s): 0.42 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.027457E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.011 | TFLOPs: 31.90 | +7: iteration 50180/ 173500 | consumed samples: 12846080 | consumed tokens: 26308771840 | elapsed time per iteration (s): 0.42 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.019336E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.653 | TFLOPs: 31.94 | +7: iteration 50190/ 173500 | consumed samples: 12848640 | consumed tokens: 26314014720 | elapsed time per iteration (s): 0.42 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.012929E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.139 | TFLOPs: 31.75 | +7: iteration 50200/ 173500 | consumed samples: 12851200 | consumed tokens: 26319257600 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.018365E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.477 | TFLOPs: 30.98 | +7: iteration 50210/ 173500 | consumed samples: 12853760 | consumed tokens: 26324500480 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.033413E+00 | grad 
norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.394 | TFLOPs: 31.29 | +7: iteration 50220/ 173500 | consumed samples: 12856320 | consumed tokens: 26329743360 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.003063E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.397 | TFLOPs: 31.55 | +7: iteration 50230/ 173500 | consumed samples: 12858880 | consumed tokens: 26334986240 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.014353E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.006 | TFLOPs: 31.48 | +7: iteration 50240/ 173500 | consumed samples: 12861440 | consumed tokens: 26340229120 | elapsed time per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.007637E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.162 | TFLOPs: 31.80 | +7: iteration 50250/ 173500 | consumed samples: 12864000 | consumed tokens: 26345472000 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 2.999460E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.834 | TFLOPs: 31.31 | +7: iteration 50260/ 173500 | consumed samples: 12866560 | consumed tokens: 26350714880 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.029655E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.934 | TFLOPs: 31.58 | +7: iteration 50270/ 173500 | consumed samples: 12869120 | consumed tokens: 26355957760 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.031805E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.827 | TFLOPs: 31.84 | +7: iteration 50280/ 173500 | consumed samples: 12871680 | consumed tokens: 26361200640 | elapsed time per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.026601E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.666 | TFLOPs: 31.73 | +7: iteration 50290/ 173500 | consumed samples: 12874240 | consumed tokens: 26366443520 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.030160E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.423 | TFLOPs: 31.45 | +7: iteration 50300/ 173500 | consumed samples: 12876800 | consumed tokens: 26371686400 | elapsed time per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.011389E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.636 | TFLOPs: 31.88 | +7: iteration 50310/ 173500 | consumed samples: 12879360 | consumed tokens: 26376929280 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.028513E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.976 | TFLOPs: 31.27 | +7: iteration 50320/ 173500 | consumed samples: 12881920 | consumed tokens: 26382172160 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.019008E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.315 | TFLOPs: 
31.86 | +7: iteration 50330/ 173500 | consumed samples: 12884480 | consumed tokens: 26387415040 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.014941E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.991 | TFLOPs: 31.95 | +7: iteration 50340/ 173500 | consumed samples: 12887040 | consumed tokens: 26392657920 | elapsed time per iteration (s): 0.43 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.021033E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.358 | TFLOPs: 31.55 | +7: iteration 50350/ 173500 | consumed samples: 12889600 | consumed tokens: 26397900800 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.022527E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.614 | TFLOPs: 32.09 | +7: iteration 50360/ 173500 | consumed samples: 12892160 | consumed tokens: 26403143680 | elapsed time per iteration (s): 0.43 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.012994E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.619 | TFLOPs: 31.09 | +7: iteration 50370/ 173500 | consumed samples: 12894720 | consumed tokens: 26408386560 | elapsed time per iteration (s): 0.43 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.013324E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.653 | TFLOPs: 31.57 | +7: iteration 50380/ 173500 | consumed samples: 12897280 | consumed tokens: 26413629440 | elapsed time per iteration (s): 0.44 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.020677E+00 | grad norm: 0.312 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.429 | TFLOPs: 30.51 | +7: iteration 50390/ 173500 | consumed samples: 12899840 | consumed tokens: 26418872320 | elapsed time per iteration (s): 0.43 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.016145E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.643 | TFLOPs: 31.25 | +7: iteration 50400/ 173500 | consumed samples: 12902400 | consumed tokens: 26424115200 | elapsed time per iteration (s): 0.43 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.013886E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.987 | TFLOPs: 31.01 | +7: iteration 50410/ 173500 | consumed samples: 12904960 | consumed tokens: 26429358080 | elapsed time per iteration (s): 0.43 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.021017E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.241 | TFLOPs: 31.49 | +7: iteration 50420/ 173500 | consumed samples: 12907520 | consumed tokens: 26434600960 | elapsed time per iteration (s): 0.43 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.004621E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.458 | TFLOPs: 31.51 | +7: iteration 50430/ 173500 | consumed samples: 12910080 | consumed tokens: 26439843840 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.023511E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.031 | TFLOPs: 31.95 | +7: iteration 50440/ 173500 | consumed samples: 12912640 | consumed tokens: 26445086720 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.010556E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.974 | TFLOPs: 31.64 | +7: iteration 50450/ 173500 | consumed samples: 12915200 | consumed tokens: 26450329600 | elapsed time per iteration (s): 0.43 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.022267E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.409 | TFLOPs: 31.03 | +7: iteration 50460/ 173500 | consumed samples: 12917760 | consumed tokens: 26455572480 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.026904E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.841 | TFLOPs: 32.10 | +7: iteration 50470/ 173500 | consumed samples: 12920320 | consumed tokens: 26460815360 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.016152E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.911 | TFLOPs: 31.84 | +7: iteration 50480/ 173500 | consumed samples: 12922880 | consumed tokens: 26466058240 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.012099E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.798 | TFLOPs: 31.68 | +7: iteration 50490/ 173500 | consumed samples: 12925440 | consumed tokens: 26471301120 | elapsed time per iteration (s): 0.43 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.022998E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.667 | TFLOPs: 31.52 | +7: iteration 50500/ 
173500 | consumed samples: 12928000 | consumed tokens: 26476544000 | elapsed time per iteration (s): 0.43 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.014902E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.545 | TFLOPs: 31.35 | +7: iteration 50510/ 173500 | consumed samples: 12930560 | consumed tokens: 26481786880 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.012525E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.477 | TFLOPs: 31.93 | +7: iteration 50520/ 173500 | consumed samples: 12933120 | consumed tokens: 26487029760 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.021688E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.382 | TFLOPs: 31.61 | +7: iteration 50530/ 173500 | consumed samples: 12935680 | consumed tokens: 26492272640 | elapsed time per iteration (s): 0.43 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.019625E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.533 | TFLOPs: 31.30 | +7: iteration 50540/ 173500 | consumed samples: 12938240 | consumed tokens: 26497515520 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.002204E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.221 | TFLOPs: 31.70 | +7: iteration 50550/ 173500 | consumed samples: 12940800 | consumed tokens: 26502758400 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.017866E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 602.642 | TFLOPs: 31.62 | +7: iteration 50560/ 173500 | consumed samples: 12943360 | consumed tokens: 26508001280 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.005228E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.897 | TFLOPs: 31.84 | +7: iteration 50570/ 173500 | consumed samples: 12945920 | consumed tokens: 26513244160 | elapsed time per iteration (s): 0.43 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.018832E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.389 | TFLOPs: 31.55 | +7: iteration 50580/ 173500 | consumed samples: 12948480 | consumed tokens: 26518487040 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.016510E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.909 | TFLOPs: 31.84 | +7: iteration 50590/ 173500 | consumed samples: 12951040 | consumed tokens: 26523729920 | elapsed time per iteration (s): 0.43 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.026628E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.689 | TFLOPs: 31.57 | +7: iteration 50600/ 173500 | consumed samples: 12953600 | consumed tokens: 26528972800 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.000208E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.571 | TFLOPs: 31.93 | +7: iteration 50610/ 173500 | consumed samples: 12956160 | consumed tokens: 26534215680 | elapsed time per iteration (s): 0.42 | learning rate: 
1.664E-04 | global batch size: 256 | lm loss: 3.018877E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.011 | TFLOPs: 31.74 | +7: iteration 50620/ 173500 | consumed samples: 12958720 | consumed tokens: 26539458560 | elapsed time per iteration (s): 0.43 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.018403E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.402 | TFLOPs: 31.34 | +7: iteration 50630/ 173500 | consumed samples: 12961280 | consumed tokens: 26544701440 | elapsed time per iteration (s): 0.43 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.025994E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.756 | TFLOPs: 31.36 | +7: iteration 50640/ 173500 | consumed samples: 12963840 | consumed tokens: 26549944320 | elapsed time per iteration (s): 0.43 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.018712E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.068 | TFLOPs: 31.38 | +7: iteration 50650/ 173500 | consumed samples: 12966400 | consumed tokens: 26555187200 | elapsed time per iteration (s): 0.43 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.022216E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.993 | TFLOPs: 31.38 | +7: iteration 50660/ 173500 | consumed samples: 12968960 | consumed tokens: 26560430080 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.010183E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.435 | TFLOPs: 32.08 | +7: iteration 50670/ 173500 | 
consumed samples: 12971520 | consumed tokens: 26565672960 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.013806E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.994 | TFLOPs: 32.06 | +7: iteration 50680/ 173500 | consumed samples: 12974080 | consumed tokens: 26570915840 | elapsed time per iteration (s): 0.43 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.025398E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.393 | TFLOPs: 31.55 | +7: iteration 50690/ 173500 | consumed samples: 12976640 | consumed tokens: 26576158720 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.020882E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.346 | TFLOPs: 31.81 | +7: iteration 50700/ 173500 | consumed samples: 12979200 | consumed tokens: 26581401600 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.013611E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.555 | TFLOPs: 31.77 | +7: iteration 50710/ 173500 | consumed samples: 12981760 | consumed tokens: 26586644480 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.021020E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.037 | TFLOPs: 31.64 | +7: iteration 50720/ 173500 | consumed samples: 12984320 | consumed tokens: 26591887360 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.022462E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.188 | TFLOPs: 31.70 | +7: iteration 50730/ 173500 | consumed samples: 12986880 | consumed tokens: 26597130240 | elapsed time per iteration (s): 0.44 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.016967E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.223 | TFLOPs: 30.76 | +7: iteration 50740/ 173500 | consumed samples: 12989440 | consumed tokens: 26602373120 | elapsed time per iteration (s): 0.43 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.019233E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.280 | TFLOPs: 31.23 | +7: iteration 50750/ 173500 | consumed samples: 12992000 | consumed tokens: 26607616000 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.006690E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.415 | TFLOPs: 31.77 | +7: iteration 50760/ 173500 | consumed samples: 12994560 | consumed tokens: 26612858880 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.010888E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.583 | TFLOPs: 32.09 | +7: iteration 50770/ 173500 | consumed samples: 12997120 | consumed tokens: 26618101760 | elapsed time per iteration (s): 0.43 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.027308E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.568 | TFLOPs: 31.35 | +7: iteration 50780/ 173500 | consumed samples: 12999680 | consumed tokens: 26623344640 | elapsed time per iteration (s): 0.43 | learning rate: 1.662E-04 | global batch 
size: 256 | lm loss: 3.002652E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.328 | TFLOPs: 31.55 | +7: iteration 50790/ 173500 | consumed samples: 13002240 | consumed tokens: 26628587520 | elapsed time per iteration (s): 0.43 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.020461E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.751 | TFLOPs: 31.52 | +7: iteration 50800/ 173500 | consumed samples: 13004800 | consumed tokens: 26633830400 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.997349E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.470 | TFLOPs: 31.93 | +7: iteration 50810/ 173500 | consumed samples: 13007360 | consumed tokens: 26639073280 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.022583E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.532 | TFLOPs: 32.03 | +7: iteration 50820/ 173500 | consumed samples: 13009920 | consumed tokens: 26644316160 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.031662E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.585 | TFLOPs: 31.83 | +7: iteration 50830/ 173500 | consumed samples: 13012480 | consumed tokens: 26649559040 | elapsed time per iteration (s): 0.43 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.015866E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.349 | TFLOPs: 31.55 | +7: iteration 50840/ 173500 | consumed samples: 13015040 | 
consumed tokens: 26654801920 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.012544E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.984 | TFLOPs: 31.85 | +7: iteration 50850/ 173500 | consumed samples: 13017600 | consumed tokens: 26660044800 | elapsed time per iteration (s): 0.43 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.019540E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.652 | TFLOPs: 31.41 | +7: iteration 50860/ 173500 | consumed samples: 13020160 | consumed tokens: 26665287680 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.018160E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.401 | TFLOPs: 31.71 | +7: iteration 50870/ 173500 | consumed samples: 13022720 | consumed tokens: 26670530560 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.002505E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.931 | TFLOPs: 31.74 | +7: iteration 50880/ 173500 | consumed samples: 13025280 | consumed tokens: 26675773440 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.010629E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.948 | TFLOPs: 31.69 | +7: iteration 50890/ 173500 | consumed samples: 13027840 | consumed tokens: 26681016320 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.021184E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 605.215 | TFLOPs: 31.75 | +7: iteration 50900/ 173500 | consumed samples: 13030400 | consumed tokens: 26686259200 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.008140E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.139 | TFLOPs: 31.86 | +7: iteration 50910/ 173500 | consumed samples: 13032960 | consumed tokens: 26691502080 | elapsed time per iteration (s): 0.43 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.017279E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.140 | TFLOPs: 31.33 | +7: iteration 50920/ 173500 | consumed samples: 13035520 | consumed tokens: 26696744960 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.023013E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.917 | TFLOPs: 31.74 | +7: iteration 50930/ 173500 | consumed samples: 13038080 | consumed tokens: 26701987840 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.027641E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.843 | TFLOPs: 31.63 | +7: iteration 50940/ 173500 | consumed samples: 13040640 | consumed tokens: 26707230720 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.004629E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.954 | TFLOPs: 32.06 | +7: iteration 50950/ 173500 | consumed samples: 13043200 | consumed tokens: 26712473600 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 
3.014627E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.543 | TFLOPs: 31.82 | +7: iteration 50960/ 173500 | consumed samples: 13045760 | consumed tokens: 26717716480 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.021887E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.235 | TFLOPs: 32.02 | +7: iteration 50970/ 173500 | consumed samples: 13048320 | consumed tokens: 26722959360 | elapsed time per iteration (s): 0.43 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.025767E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.167 | TFLOPs: 31.07 | +7: iteration 50980/ 173500 | consumed samples: 13050880 | consumed tokens: 26728202240 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.000696E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.657 | TFLOPs: 31.88 | +7: iteration 50990/ 173500 | consumed samples: 13053440 | consumed tokens: 26733445120 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.013068E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.686 | TFLOPs: 31.62 | +7: iteration 51000/ 173500 | consumed samples: 13056000 | consumed tokens: 26738688000 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.015187E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.096 | TFLOPs: 31.70 | +7: iteration 51010/ 173500 | consumed samples: 13058560 | consumed tokens: 
26743930880 | elapsed time per iteration (s): 0.43 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.014627E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.450 | TFLOPs: 31.56 | +7: iteration 51020/ 173500 | consumed samples: 13061120 | consumed tokens: 26749173760 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.011672E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.874 | TFLOPs: 31.63 | +7: iteration 51030/ 173500 | consumed samples: 13063680 | consumed tokens: 26754416640 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.017034E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.168 | TFLOPs: 32.07 | +7: iteration 51040/ 173500 | consumed samples: 13066240 | consumed tokens: 26759659520 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.010223E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.984 | TFLOPs: 31.79 | +7: iteration 51050/ 173500 | consumed samples: 13068800 | consumed tokens: 26764902400 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.019020E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.645 | TFLOPs: 31.78 | +7: iteration 51060/ 173500 | consumed samples: 13071360 | consumed tokens: 26770145280 | elapsed time per iteration (s): 0.43 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.020601E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.628 | TFLOPs: 31.15 | +7: iteration 51070/ 173500 | consumed samples: 13073920 | consumed tokens: 26775388160 | elapsed time per iteration (s): 0.44 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.021685E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.123 | TFLOPs: 30.70 | +7: iteration 51080/ 173500 | consumed samples: 13076480 | consumed tokens: 26780631040 | elapsed time per iteration (s): 0.43 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.011565E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.207 | TFLOPs: 31.60 | +7: iteration 51090/ 173500 | consumed samples: 13079040 | consumed tokens: 26785873920 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.013182E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.768 | TFLOPs: 31.94 | +7: iteration 51100/ 173500 | consumed samples: 13081600 | consumed tokens: 26791116800 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.024473E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.417 | TFLOPs: 31.92 | +7: iteration 51110/ 173500 | consumed samples: 13084160 | consumed tokens: 26796359680 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.008625E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.112 | TFLOPs: 31.91 | +7: iteration 51120/ 173500 | consumed samples: 13086720 | consumed tokens: 26801602560 | elapsed time per iteration (s): 0.43 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.016464E+00 | grad 
norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.135 | TFLOPs: 31.28 | +7: iteration 51130/ 173500 | consumed samples: 13089280 | consumed tokens: 26806845440 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.016118E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.826 | TFLOPs: 31.89 | +7: iteration 51140/ 173500 | consumed samples: 13091840 | consumed tokens: 26812088320 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.010512E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.971 | TFLOPs: 31.74 | +7: iteration 51150/ 173500 | consumed samples: 13094400 | consumed tokens: 26817331200 | elapsed time per iteration (s): 0.43 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.004067E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.223 | TFLOPs: 31.13 | +7: iteration 51160/ 173500 | consumed samples: 13096960 | consumed tokens: 26822574080 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.022402E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.802 | TFLOPs: 32.10 | +7: iteration 51170/ 173500 | consumed samples: 13099520 | consumed tokens: 26827816960 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.015933E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.682 | TFLOPs: 31.62 | +7: iteration 51180/ 173500 | consumed samples: 13102080 | consumed tokens: 26833059840 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.022470E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.228 | TFLOPs: 31.86 | +7: iteration 51190/ 173500 | consumed samples: 13104640 | consumed tokens: 26838302720 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.018442E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.097 | TFLOPs: 31.54 | +7: iteration 51200/ 173500 | consumed samples: 13107200 | consumed tokens: 26843545600 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.010995E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.053 | TFLOPs: 31.54 | +7: iteration 51210/ 173500 | consumed samples: 13109760 | consumed tokens: 26848788480 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.015913E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.632 | TFLOPs: 31.30 | +7: iteration 51220/ 173500 | consumed samples: 13112320 | consumed tokens: 26854031360 | elapsed time per iteration (s): 0.44 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.001872E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.455 | TFLOPs: 30.67 | +7: iteration 51230/ 173500 | consumed samples: 13114880 | consumed tokens: 26859274240 | elapsed time per iteration (s): 0.42 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.023264E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.681 | TFLOPs: 
32.09 | +7: iteration 51240/ 173500 | consumed samples: 13117440 | consumed tokens: 26864517120 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.998035E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.516 | TFLOPs: 31.40 | +7: iteration 51250/ 173500 | consumed samples: 13120000 | consumed tokens: 26869760000 | elapsed time per iteration (s): 0.44 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.003746E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.620 | TFLOPs: 30.25 | +7: iteration 51260/ 173500 | consumed samples: 13122560 | consumed tokens: 26875002880 | elapsed time per iteration (s): 0.44 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.016823E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.976 | TFLOPs: 30.69 | +7: iteration 51270/ 173500 | consumed samples: 13125120 | consumed tokens: 26880245760 | elapsed time per iteration (s): 0.46 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.014332E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.187 | TFLOPs: 29.44 | +7: iteration 51280/ 173500 | consumed samples: 13127680 | consumed tokens: 26885488640 | elapsed time per iteration (s): 0.43 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.003537E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.028 | TFLOPs: 31.01 | +7: iteration 51290/ 173500 | consumed samples: 13130240 | consumed tokens: 26890731520 | elapsed time per iteration (s): 0.44 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.029187E+00 | grad norm: 0.285 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.379 | TFLOPs: 30.61 | +7: iteration 51300/ 173500 | consumed samples: 13132800 | consumed tokens: 26895974400 | elapsed time per iteration (s): 0.46 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.005386E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.840 | TFLOPs: 29.48 | +7: iteration 51310/ 173500 | consumed samples: 13135360 | consumed tokens: 26901217280 | elapsed time per iteration (s): 0.43 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.998615E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.045 | TFLOPs: 31.27 | +7: iteration 51320/ 173500 | consumed samples: 13137920 | consumed tokens: 26906460160 | elapsed time per iteration (s): 0.45 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.018785E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.392 | TFLOPs: 29.88 | +7: iteration 51330/ 173500 | consumed samples: 13140480 | consumed tokens: 26911703040 | elapsed time per iteration (s): 0.45 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.020255E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.283 | TFLOPs: 29.82 | +7: iteration 51340/ 173500 | consumed samples: 13143040 | consumed tokens: 26916945920 | elapsed time per iteration (s): 0.43 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.013798E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.850 | TFLOPs: 31.00 | +7: iteration 51350/ 173500 | consumed samples: 13145600 | consumed tokens: 26922188800 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.006468E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.345 | TFLOPs: 30.29 | +7: iteration 51360/ 173500 | consumed samples: 13148160 | consumed tokens: 26927431680 | elapsed time per iteration (s): 0.43 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.013046E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.741 | TFLOPs: 31.21 | +7: iteration 51370/ 173500 | consumed samples: 13150720 | consumed tokens: 26932674560 | elapsed time per iteration (s): 0.45 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.009227E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.457 | TFLOPs: 29.67 | +7: iteration 51380/ 173500 | consumed samples: 13153280 | consumed tokens: 26937917440 | elapsed time per iteration (s): 0.45 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.021229E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.751 | TFLOPs: 29.79 | +7: iteration 51390/ 173500 | consumed samples: 13155840 | consumed tokens: 26943160320 | elapsed time per iteration (s): 0.45 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.022642E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.207 | TFLOPs: 29.87 | +7: iteration 51400/ 173500 | consumed samples: 13158400 | consumed tokens: 26948403200 | elapsed time per iteration (s): 0.43 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.016711E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.424 | TFLOPs: 30.98 | +7: iteration 51410/ 
173500 | consumed samples: 13160960 | consumed tokens: 26953646080 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.018125E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.682 | TFLOPs: 31.99 | +7: iteration 51420/ 173500 | consumed samples: 13163520 | consumed tokens: 26958888960 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.024030E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.461 | TFLOPs: 31.92 | +7: iteration 51430/ 173500 | consumed samples: 13166080 | consumed tokens: 26964131840 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.019012E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.361 | TFLOPs: 31.92 | +7: iteration 51440/ 173500 | consumed samples: 13168640 | consumed tokens: 26969374720 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.006232E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.787 | TFLOPs: 31.89 | +7: iteration 51450/ 173500 | consumed samples: 13171200 | consumed tokens: 26974617600 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.006050E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.280 | TFLOPs: 31.81 | +7: iteration 51460/ 173500 | consumed samples: 13173760 | consumed tokens: 26979860480 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.010732E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 611.838 | TFLOPs: 32.10 | +7: iteration 51470/ 173500 | consumed samples: 13176320 | consumed tokens: 26985103360 | elapsed time per iteration (s): 0.43 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 3.020705E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.452 | TFLOPs: 31.40 | +7: iteration 51480/ 173500 | consumed samples: 13178880 | consumed tokens: 26990346240 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.017517E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.604 | TFLOPs: 31.83 | +7: iteration 51490/ 173500 | consumed samples: 13181440 | consumed tokens: 26995589120 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.015682E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.604 | TFLOPs: 31.62 | +7: iteration 51500/ 173500 | consumed samples: 13184000 | consumed tokens: 27000832000 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.025253E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.637 | TFLOPs: 31.78 | +7: iteration 51510/ 173500 | consumed samples: 13186560 | consumed tokens: 27006074880 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.012889E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.546 | TFLOPs: 31.77 | +7: iteration 51520/ 173500 | consumed samples: 13189120 | consumed tokens: 27011317760 | elapsed time per iteration (s): 0.42 | learning rate: 
1.652E-04 | global batch size: 256 | lm loss: 2.992876E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.130 | TFLOPs: 31.86 | +7: iteration 51530/ 173500 | consumed samples: 13191680 | consumed tokens: 27016560640 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.012255E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.054 | TFLOPs: 32.06 | +7: iteration 51540/ 173500 | consumed samples: 13194240 | consumed tokens: 27021803520 | elapsed time per iteration (s): 0.43 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.007230E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.717 | TFLOPs: 31.57 | +7: iteration 51550/ 173500 | consumed samples: 13196800 | consumed tokens: 27027046400 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.013443E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.583 | TFLOPs: 31.88 | +7: iteration 51560/ 173500 | consumed samples: 13199360 | consumed tokens: 27032289280 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.016413E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.559 | TFLOPs: 31.88 | +7: iteration 51570/ 173500 | consumed samples: 13201920 | consumed tokens: 27037532160 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.025508E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.690 | TFLOPs: 31.94 | +7: iteration 51580/ 173500 | 
consumed samples: 13204480 | consumed tokens: 27042775040 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.006472E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.133 | TFLOPs: 32.07 | +7: iteration 51590/ 173500 | consumed samples: 13207040 | consumed tokens: 27048017920 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.005880E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.860 | TFLOPs: 31.89 | +7: iteration 51600/ 173500 | consumed samples: 13209600 | consumed tokens: 27053260800 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.998118E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.731 | TFLOPs: 32.04 | +7: iteration 51610/ 173500 | consumed samples: 13212160 | consumed tokens: 27058503680 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.018490E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.686 | TFLOPs: 31.99 | +7: iteration 51620/ 173500 | consumed samples: 13214720 | consumed tokens: 27063746560 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.003275E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.656 | TFLOPs: 31.78 | +7: iteration 51630/ 173500 | consumed samples: 13217280 | consumed tokens: 27068989440 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.023920E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.228 | TFLOPs: 31.76 | +7: iteration 51640/ 173500 | consumed samples: 13219840 | consumed tokens: 27074232320 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.000346E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.979 | TFLOPs: 31.74 | +7: iteration 51650/ 173500 | consumed samples: 13222400 | consumed tokens: 27079475200 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.011318E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.637 | TFLOPs: 31.83 | +7: iteration 51660/ 173500 | consumed samples: 13224960 | consumed tokens: 27084718080 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.008510E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.255 | TFLOPs: 31.86 | +7: iteration 51670/ 173500 | consumed samples: 13227520 | consumed tokens: 27089960960 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.002995E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.225 | TFLOPs: 31.70 | +7: iteration 51680/ 173500 | consumed samples: 13230080 | consumed tokens: 27095203840 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.029704E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.946 | TFLOPs: 32.00 | +7: iteration 51690/ 173500 | consumed samples: 13232640 | consumed tokens: 27100446720 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch 
size: 256 | lm loss: 3.008478E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.610 | TFLOPs: 31.83 | +7: iteration 51700/ 173500 | consumed samples: 13235200 | consumed tokens: 27105689600 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.005995E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.334 | TFLOPs: 31.92 | +7: iteration 51710/ 173500 | consumed samples: 13237760 | consumed tokens: 27110932480 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.016905E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.953 | TFLOPs: 32.06 | +7: iteration 51720/ 173500 | consumed samples: 13240320 | consumed tokens: 27116175360 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.002369E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.395 | TFLOPs: 31.82 | +7: iteration 51730/ 173500 | consumed samples: 13242880 | consumed tokens: 27121418240 | elapsed time per iteration (s): 0.43 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.009380E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.336 | TFLOPs: 31.45 | +7: iteration 51740/ 173500 | consumed samples: 13245440 | consumed tokens: 27126661120 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.017319E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.760 | TFLOPs: 31.78 | +7: iteration 51750/ 173500 | consumed samples: 13248000 | 
consumed tokens: 27131904000 | elapsed time per iteration (s): 0.43 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.013735E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.039 | TFLOPs: 31.59 | +7: iteration 51760/ 173500 | consumed samples: 13250560 | consumed tokens: 27137146880 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.006424E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.541 | TFLOPs: 32.09 | +7: iteration 51770/ 173500 | consumed samples: 13253120 | consumed tokens: 27142389760 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.005005E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.550 | TFLOPs: 31.72 | +7: iteration 51780/ 173500 | consumed samples: 13255680 | consumed tokens: 27147632640 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.002196E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.833 | TFLOPs: 31.84 | +7: iteration 51790/ 173500 | consumed samples: 13258240 | consumed tokens: 27152875520 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.014550E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.025 | TFLOPs: 31.85 | +7: iteration 51800/ 173500 | consumed samples: 13260800 | consumed tokens: 27158118400 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.017881E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 606.677 | TFLOPs: 31.83 | +7: iteration 51810/ 173500 | consumed samples: 13263360 | consumed tokens: 27163361280 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.004047E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.218 | TFLOPs: 31.91 | +7: iteration 51820/ 173500 | consumed samples: 13265920 | consumed tokens: 27168604160 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.013165E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.179 | TFLOPs: 31.91 | +7: iteration 51830/ 173500 | consumed samples: 13268480 | consumed tokens: 27173847040 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.998583E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.879 | TFLOPs: 31.74 | +7: iteration 51840/ 173500 | consumed samples: 13271040 | consumed tokens: 27179089920 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.020744E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.581 | TFLOPs: 31.83 | +7: iteration 51850/ 173500 | consumed samples: 13273600 | consumed tokens: 27184332800 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.015242E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.314 | TFLOPs: 31.81 | +7: iteration 51860/ 173500 | consumed samples: 13276160 | consumed tokens: 27189575680 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 
3.024647E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.030 | TFLOPs: 31.85 | +7: iteration 51870/ 173500 | consumed samples: 13278720 | consumed tokens: 27194818560 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.001746E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.750 | TFLOPs: 31.73 | +7: iteration 51880/ 173500 | consumed samples: 13281280 | consumed tokens: 27200061440 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.019226E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.641 | TFLOPs: 32.04 | +7: iteration 51890/ 173500 | consumed samples: 13283840 | consumed tokens: 27205304320 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.997247E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.470 | TFLOPs: 31.56 | +7: iteration 51900/ 173500 | consumed samples: 13286400 | consumed tokens: 27210547200 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.012860E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.273 | TFLOPs: 31.60 | +7: iteration 51910/ 173500 | consumed samples: 13288960 | consumed tokens: 27215790080 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.013363E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.713 | TFLOPs: 31.62 | +7: iteration 51920/ 173500 | consumed samples: 13291520 | consumed tokens: 
27221032960 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.018404E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.008 | TFLOPs: 31.74 | +7: iteration 51930/ 173500 | consumed samples: 13294080 | consumed tokens: 27226275840 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.014289E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.352 | TFLOPs: 31.50 | +7: iteration 51940/ 173500 | consumed samples: 13296640 | consumed tokens: 27231518720 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.003474E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.767 | TFLOPs: 32.05 | +7: iteration 51950/ 173500 | consumed samples: 13299200 | consumed tokens: 27236761600 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.022081E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.887 | TFLOPs: 31.68 | +7: iteration 51960/ 173500 | consumed samples: 13301760 | consumed tokens: 27242004480 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.019925E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.526 | TFLOPs: 32.03 | +7: iteration 51970/ 173500 | consumed samples: 13304320 | consumed tokens: 27247247360 | elapsed time per iteration (s): 0.43 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.021197E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.180 | TFLOPs: 31.12 | +7: iteration 51980/ 173500 | consumed samples: 13306880 | consumed tokens: 27252490240 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.025894E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.328 | TFLOPs: 32.08 | +7: iteration 51990/ 173500 | consumed samples: 13309440 | consumed tokens: 27257733120 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.017883E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.434 | TFLOPs: 32.03 | +0: [2023-03-17 05:21:27,890] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=0, lr=[0.00016457056203724818, 0.00016457056203724818, 0.00016457056203724818], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 52000/ 173500 | consumed samples: 13312000 | consumed tokens: 27262976000 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.020646E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.934 | TFLOPs: 31.90 | +0: steps: 52000 loss: 3.0399 iter time (s): 0.426 samples/sec: 600.470 +7: iteration 52010/ 173500 | consumed samples: 13314560 | consumed tokens: 27268218880 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.025255E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.646 | TFLOPs: 31.99 | +7: iteration 52020/ 173500 | consumed samples: 13317120 | consumed tokens: 27273461760 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.002826E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 605.254 | TFLOPs: 31.76 | +7: iteration 52030/ 173500 | consumed samples: 13319680 | consumed tokens: 27278704640 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.018066E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.709 | TFLOPs: 32.04 | +7: iteration 52040/ 173500 | consumed samples: 13322240 | consumed tokens: 27283947520 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.014620E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.710 | TFLOPs: 31.78 | +7: iteration 52050/ 173500 | consumed samples: 13324800 | consumed tokens: 27289190400 | elapsed time per iteration (s): 0.43 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.012694E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.494 | TFLOPs: 31.51 | +7: iteration 52060/ 173500 | consumed samples: 13327360 | consumed tokens: 27294433280 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.007664E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.480 | TFLOPs: 31.72 | +7: iteration 52070/ 173500 | consumed samples: 13329920 | consumed tokens: 27299676160 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.018154E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.111 | TFLOPs: 31.85 | +7: iteration 52080/ 173500 | consumed samples: 13332480 | consumed tokens: 27304919040 | elapsed time per iteration (s): 0.43 | learning rate: 1.645E-04 | global 
batch size: 256 | lm loss: 3.012536E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.718 | TFLOPs: 31.52 | +7: iteration 52090/ 173500 | consumed samples: 13335040 | consumed tokens: 27310161920 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.011919E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.852 | TFLOPs: 31.63 | +7: iteration 52100/ 173500 | consumed samples: 13337600 | consumed tokens: 27315404800 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.005444E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.496 | TFLOPs: 31.82 | +7: iteration 52110/ 173500 | consumed samples: 13340160 | consumed tokens: 27320647680 | elapsed time per iteration (s): 0.43 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.025777E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.786 | TFLOPs: 31.52 | +7: iteration 52120/ 173500 | consumed samples: 13342720 | consumed tokens: 27325890560 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.013444E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.676 | TFLOPs: 31.67 | +7: iteration 52130/ 173500 | consumed samples: 13345280 | consumed tokens: 27331133440 | elapsed time per iteration (s): 0.44 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.011943E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.281 | TFLOPs: 30.66 | +7: iteration 52140/ 173500 | consumed samples: 13347840 
| consumed tokens: 27336376320 | elapsed time per iteration (s): 0.43 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.013015E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.885 | TFLOPs: 31.42 | +7: iteration 52150/ 173500 | consumed samples: 13350400 | consumed tokens: 27341619200 | elapsed time per iteration (s): 0.43 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.014457E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.760 | TFLOPs: 31.57 | +7: iteration 52160/ 173500 | consumed samples: 13352960 | consumed tokens: 27346862080 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.016965E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.102 | TFLOPs: 32.06 | +7: iteration 52170/ 173500 | consumed samples: 13355520 | consumed tokens: 27352104960 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.011033E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.773 | TFLOPs: 31.68 | +7: iteration 52180/ 173500 | consumed samples: 13358080 | consumed tokens: 27357347840 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.004255E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.362 | TFLOPs: 31.92 | +7: iteration 52190/ 173500 | consumed samples: 13360640 | consumed tokens: 27362590720 | elapsed time per iteration (s): 0.43 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.015097E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 600.867 | TFLOPs: 31.53 | +7: iteration 52200/ 173500 | consumed samples: 13363200 | consumed tokens: 27367833600 | elapsed time per iteration (s): 0.43 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.000715E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.348 | TFLOPs: 31.29 | +7: iteration 52210/ 173500 | consumed samples: 13365760 | consumed tokens: 27373076480 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.012891E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.690 | TFLOPs: 31.83 | +7: iteration 52220/ 173500 | consumed samples: 13368320 | consumed tokens: 27378319360 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.016094E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.075 | TFLOPs: 31.96 | +7: iteration 52230/ 173500 | consumed samples: 13370880 | consumed tokens: 27383562240 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.008043E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.286 | TFLOPs: 31.92 | +7: iteration 52240/ 173500 | consumed samples: 13373440 | consumed tokens: 27388805120 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.015496E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.709 | TFLOPs: 31.89 | +7: iteration 52250/ 173500 | consumed samples: 13376000 | consumed tokens: 27394048000 | elapsed time per iteration (s): 0.43 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 
3.020215E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.924 | TFLOPs: 31.58 | +7: iteration 52260/ 173500 | consumed samples: 13378560 | consumed tokens: 27399290880 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.011952E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.485 | TFLOPs: 31.98 | +7: iteration 52270/ 173500 | consumed samples: 13381120 | consumed tokens: 27404533760 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.012762E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.899 | TFLOPs: 32.00 | +7: iteration 52280/ 173500 | consumed samples: 13383680 | consumed tokens: 27409776640 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.013201E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.963 | TFLOPs: 31.69 | +7: iteration 52290/ 173500 | consumed samples: 13386240 | consumed tokens: 27415019520 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.002557E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.737 | TFLOPs: 31.94 | +7: iteration 52300/ 173500 | consumed samples: 13388800 | consumed tokens: 27420262400 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.014817E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.197 | TFLOPs: 31.91 | +7: iteration 52310/ 173500 | consumed samples: 13391360 | consumed tokens: 
27425505280 | elapsed time per iteration (s): 0.44 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.011773E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.152 | TFLOPs: 30.70 | +7: iteration 52320/ 173500 | consumed samples: 13393920 | consumed tokens: 27430748160 | elapsed time per iteration (s): 0.43 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.009835E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.081 | TFLOPs: 31.38 | +7: iteration 52330/ 173500 | consumed samples: 13396480 | consumed tokens: 27435991040 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.010185E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.844 | TFLOPs: 31.95 | +7: iteration 52340/ 173500 | consumed samples: 13399040 | consumed tokens: 27441233920 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.004881E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.543 | TFLOPs: 31.98 | +7: iteration 52350/ 173500 | consumed samples: 13401600 | consumed tokens: 27446476800 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.013803E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.707 | TFLOPs: 31.83 | +7: iteration 52360/ 173500 | consumed samples: 13404160 | consumed tokens: 27451719680 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.014635E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.111 | TFLOPs: 31.70 | +7: iteration 52370/ 173500 | consumed samples: 13406720 | consumed tokens: 27456962560 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.997165E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.463 | TFLOPs: 31.61 | +7: iteration 52380/ 173500 | consumed samples: 13409280 | consumed tokens: 27462205440 | elapsed time per iteration (s): 0.43 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.015406E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.923 | TFLOPs: 31.48 | +7: iteration 52390/ 173500 | consumed samples: 13411840 | consumed tokens: 27467448320 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.000017E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.179 | TFLOPs: 31.75 | +7: iteration 52400/ 173500 | consumed samples: 13414400 | consumed tokens: 27472691200 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.009892E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.005 | TFLOPs: 32.01 | +7: iteration 52410/ 173500 | consumed samples: 13416960 | consumed tokens: 27477934080 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.011291E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.831 | TFLOPs: 31.68 | +7: iteration 52420/ 173500 | consumed samples: 13419520 | consumed tokens: 27483176960 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.004075E+00 | grad 
norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.412 | TFLOPs: 31.66 | +7: iteration 52430/ 173500 | consumed samples: 13422080 | consumed tokens: 27488419840 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.998067E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.489 | TFLOPs: 31.72 | +7: iteration 52440/ 173500 | consumed samples: 13424640 | consumed tokens: 27493662720 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.010766E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.175 | TFLOPs: 31.91 | +7: iteration 52450/ 173500 | consumed samples: 13427200 | consumed tokens: 27498905600 | elapsed time per iteration (s): 0.43 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.015543E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.278 | TFLOPs: 31.50 | +7: iteration 52460/ 173500 | consumed samples: 13429760 | consumed tokens: 27504148480 | elapsed time per iteration (s): 0.44 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.016230E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.690 | TFLOPs: 30.21 | +7: iteration 52470/ 173500 | consumed samples: 13432320 | consumed tokens: 27509391360 | elapsed time per iteration (s): 0.43 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.003325E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.907 | TFLOPs: 31.58 | +7: iteration 52480/ 173500 | consumed samples: 13434880 | consumed tokens: 27514634240 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.012343E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.636 | TFLOPs: 31.78 | +7: iteration 52490/ 173500 | consumed samples: 13437440 | consumed tokens: 27519877120 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.000684E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.892 | TFLOPs: 32.05 | +7: iteration 52500/ 173500 | consumed samples: 13440000 | consumed tokens: 27525120000 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.014453E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.011 | TFLOPs: 31.85 | +7: iteration 52510/ 173500 | consumed samples: 13442560 | consumed tokens: 27530362880 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.017897E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.997 | TFLOPs: 32.06 | +7: iteration 52520/ 173500 | consumed samples: 13445120 | consumed tokens: 27535605760 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.016393E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.462 | TFLOPs: 32.03 | +7: iteration 52530/ 173500 | consumed samples: 13447680 | consumed tokens: 27540848640 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.004281E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.522 | TFLOPs: 
31.88 | +7: iteration 52540/ 173500 | consumed samples: 13450240 | consumed tokens: 27546091520 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.009436E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.183 | TFLOPs: 31.75 | +7: iteration 52550/ 173500 | consumed samples: 13452800 | consumed tokens: 27551334400 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.008764E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.122 | TFLOPs: 31.70 | +7: iteration 52560/ 173500 | consumed samples: 13455360 | consumed tokens: 27556577280 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.018563E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.610 | TFLOPs: 31.83 | +7: iteration 52570/ 173500 | consumed samples: 13457920 | consumed tokens: 27561820160 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.009357E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.514 | TFLOPs: 32.03 | +7: iteration 52580/ 173500 | consumed samples: 13460480 | consumed tokens: 27567063040 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.020532E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.224 | TFLOPs: 31.91 | +7: iteration 52590/ 173500 | consumed samples: 13463040 | consumed tokens: 27572305920 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.020304E+00 | grad norm: 0.291 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.771 | TFLOPs: 31.78 | +7: iteration 52600/ 173500 | consumed samples: 13465600 | consumed tokens: 27577548800 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.008777E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.167 | TFLOPs: 31.70 | +7: iteration 52610/ 173500 | consumed samples: 13468160 | consumed tokens: 27582791680 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.019340E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.227 | TFLOPs: 31.81 | +7: iteration 52620/ 173500 | consumed samples: 13470720 | consumed tokens: 27588034560 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.016521E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.761 | TFLOPs: 31.89 | +7: iteration 52630/ 173500 | consumed samples: 13473280 | consumed tokens: 27593277440 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.009950E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.360 | TFLOPs: 31.60 | +7: iteration 52640/ 173500 | consumed samples: 13475840 | consumed tokens: 27598520320 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.030167E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.789 | TFLOPs: 31.73 | +7: iteration 52650/ 173500 | consumed samples: 13478400 | consumed tokens: 27603763200 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.016310E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.936 | TFLOPs: 32.05 | +7: iteration 52660/ 173500 | consumed samples: 13480960 | consumed tokens: 27609006080 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.016209E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.792 | TFLOPs: 31.89 | +7: iteration 52670/ 173500 | consumed samples: 13483520 | consumed tokens: 27614248960 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.015640E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.324 | TFLOPs: 31.81 | +7: iteration 52680/ 173500 | consumed samples: 13486080 | consumed tokens: 27619491840 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.005094E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.509 | TFLOPs: 31.77 | +7: iteration 52690/ 173500 | consumed samples: 13488640 | consumed tokens: 27624734720 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.012243E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.538 | TFLOPs: 31.98 | +7: iteration 52700/ 173500 | consumed samples: 13491200 | consumed tokens: 27629977600 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.015830E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.996 | TFLOPs: 31.85 | +7: iteration 52710/ 
173500 | consumed samples: 13493760 | consumed tokens: 27635220480 | elapsed time per iteration (s): 0.43 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.029101E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.388 | TFLOPs: 31.45 | +7: iteration 52720/ 173500 | consumed samples: 13496320 | consumed tokens: 27640463360 | elapsed time per iteration (s): 0.43 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.018817E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.791 | TFLOPs: 31.21 | +7: iteration 52730/ 173500 | consumed samples: 13498880 | consumed tokens: 27645706240 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.020828E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.751 | TFLOPs: 31.94 | +7: iteration 52740/ 173500 | consumed samples: 13501440 | consumed tokens: 27650949120 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.015390E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.961 | TFLOPs: 31.69 | +7: iteration 52750/ 173500 | consumed samples: 13504000 | consumed tokens: 27656192000 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.008002E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.877 | TFLOPs: 32.05 | +7: iteration 52760/ 173500 | consumed samples: 13506560 | consumed tokens: 27661434880 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.014777E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 610.743 | TFLOPs: 32.04 | +7: iteration 52770/ 173500 | consumed samples: 13509120 | consumed tokens: 27666677760 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 3.017009E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.489 | TFLOPs: 32.03 | +7: iteration 52780/ 173500 | consumed samples: 13511680 | consumed tokens: 27671920640 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.011425E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.436 | TFLOPs: 31.87 | +7: iteration 52790/ 173500 | consumed samples: 13514240 | consumed tokens: 27677163520 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.012767E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.368 | TFLOPs: 32.02 | +7: iteration 52800/ 173500 | consumed samples: 13516800 | consumed tokens: 27682406400 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.994543E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.242 | TFLOPs: 31.97 | +7: iteration 52810/ 173500 | consumed samples: 13519360 | consumed tokens: 27687649280 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.004589E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.092 | TFLOPs: 31.85 | +7: iteration 52820/ 173500 | consumed samples: 13521920 | consumed tokens: 27692892160 | elapsed time per iteration (s): 0.43 | learning rate: 
1.635E-04 | global batch size: 256 | lm loss: 3.009284E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.189 | TFLOPs: 31.07 | +7: iteration 52830/ 173500 | consumed samples: 13524480 | consumed tokens: 27698135040 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.022769E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.863 | TFLOPs: 31.74 | +7: iteration 52840/ 173500 | consumed samples: 13527040 | consumed tokens: 27703377920 | elapsed time per iteration (s): 0.43 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.014592E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.835 | TFLOPs: 31.58 | +7: iteration 52850/ 173500 | consumed samples: 13529600 | consumed tokens: 27708620800 | elapsed time per iteration (s): 0.43 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.018359E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.328 | TFLOPs: 31.55 | +7: iteration 52860/ 173500 | consumed samples: 13532160 | consumed tokens: 27713863680 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.022979E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.854 | TFLOPs: 32.05 | +7: iteration 52870/ 173500 | consumed samples: 13534720 | consumed tokens: 27719106560 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.026068E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.518 | TFLOPs: 31.72 | +7: iteration 52880/ 173500 | 
consumed samples: 13537280 | consumed tokens: 27724349440 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.999605E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.575 | TFLOPs: 31.98 | +7: iteration 52890/ 173500 | consumed samples: 13539840 | consumed tokens: 27729592320 | elapsed time per iteration (s): 0.43 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.013182E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.037 | TFLOPs: 31.38 | +7: iteration 52900/ 173500 | consumed samples: 13542400 | consumed tokens: 27734835200 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.014346E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.450 | TFLOPs: 31.77 | +7: iteration 52910/ 173500 | consumed samples: 13544960 | consumed tokens: 27740078080 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.007166E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.342 | TFLOPs: 31.81 | +7: iteration 52920/ 173500 | consumed samples: 13547520 | consumed tokens: 27745320960 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.011972E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.167 | TFLOPs: 31.70 | +7: iteration 52930/ 173500 | consumed samples: 13550080 | consumed tokens: 27750563840 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.010762E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 610.763 | TFLOPs: 32.05 | +7: iteration 52940/ 173500 | consumed samples: 13552640 | consumed tokens: 27755806720 | elapsed time per iteration (s): 0.43 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.007672E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.149 | TFLOPs: 31.59 | +7: iteration 52950/ 173500 | consumed samples: 13555200 | consumed tokens: 27761049600 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.008576E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.021 | TFLOPs: 31.85 | +7: iteration 52960/ 173500 | consumed samples: 13557760 | consumed tokens: 27766292480 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.002434E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.784 | TFLOPs: 31.63 | +7: iteration 52970/ 173500 | consumed samples: 13560320 | consumed tokens: 27771535360 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.001003E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.688 | TFLOPs: 31.88 | +7: iteration 52980/ 173500 | consumed samples: 13562880 | consumed tokens: 27776778240 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.017848E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.697 | TFLOPs: 32.04 | +7: iteration 52990/ 173500 | consumed samples: 13565440 | consumed tokens: 27782021120 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch 
size: 256 | lm loss: 3.012764E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.546 | TFLOPs: 32.03 | +7: iteration 53000/ 173500 | consumed samples: 13568000 | consumed tokens: 27787264000 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.019267E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.604 | TFLOPs: 31.62 | +7: iteration 53010/ 173500 | consumed samples: 13570560 | consumed tokens: 27792506880 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.017710E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.331 | TFLOPs: 31.76 | +7: iteration 53020/ 173500 | consumed samples: 13573120 | consumed tokens: 27797749760 | elapsed time per iteration (s): 0.43 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.002863E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.251 | TFLOPs: 31.28 | +7: iteration 53030/ 173500 | consumed samples: 13575680 | consumed tokens: 27802992640 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.016351E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.346 | TFLOPs: 31.87 | +7: iteration 53040/ 173500 | consumed samples: 13578240 | consumed tokens: 27808235520 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.005915E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.847 | TFLOPs: 31.74 | +7: iteration 53050/ 173500 | consumed samples: 13580800 | 
consumed tokens: 27813478400 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.003401E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.070 | TFLOPs: 32.06 | +7: iteration 53060/ 173500 | consumed samples: 13583360 | consumed tokens: 27818721280 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.026155E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.553 | TFLOPs: 32.03 | +7: iteration 53070/ 173500 | consumed samples: 13585920 | consumed tokens: 27823964160 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.002113E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.597 | TFLOPs: 31.77 | +7: iteration 53080/ 173500 | consumed samples: 13588480 | consumed tokens: 27829207040 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.005243E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.652 | TFLOPs: 31.73 | +7: iteration 53090/ 173500 | consumed samples: 13591040 | consumed tokens: 27834449920 | elapsed time per iteration (s): 0.43 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.005172E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.329 | TFLOPs: 31.39 | +7: iteration 53100/ 173500 | consumed samples: 13593600 | consumed tokens: 27839692800 | elapsed time per iteration (s): 0.43 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.006589E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 601.952 | TFLOPs: 31.58 | +7: iteration 53110/ 173500 | consumed samples: 13596160 | consumed tokens: 27844935680 | elapsed time per iteration (s): 0.43 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.015439E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.270 | TFLOPs: 31.60 | +7: iteration 53120/ 173500 | consumed samples: 13598720 | consumed tokens: 27850178560 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.998894E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.739 | TFLOPs: 31.62 | +7: iteration 53130/ 173500 | consumed samples: 13601280 | consumed tokens: 27855421440 | elapsed time per iteration (s): 0.43 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.013132E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.573 | TFLOPs: 31.51 | +7: iteration 53140/ 173500 | consumed samples: 13603840 | consumed tokens: 27860664320 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.998226E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.066 | TFLOPs: 31.80 | +7: iteration 53150/ 173500 | consumed samples: 13606400 | consumed tokens: 27865907200 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.021236E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.973 | TFLOPs: 31.90 | +7: iteration 53160/ 173500 | consumed samples: 13608960 | consumed tokens: 27871150080 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 
3.007617E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.585 | TFLOPs: 31.72 | +7: iteration 53170/ 173500 | consumed samples: 13611520 | consumed tokens: 27876392960 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.005517E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.341 | TFLOPs: 32.02 | +7: iteration 53180/ 173500 | consumed samples: 13614080 | consumed tokens: 27881635840 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.032199E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.891 | TFLOPs: 31.79 | +7: iteration 53190/ 173500 | consumed samples: 13616640 | consumed tokens: 27886878720 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.012135E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.131 | TFLOPs: 31.70 | +7: iteration 53200/ 173500 | consumed samples: 13619200 | consumed tokens: 27892121600 | elapsed time per iteration (s): 0.43 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.002635E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.987 | TFLOPs: 31.59 | +7: iteration 53210/ 173500 | consumed samples: 13621760 | consumed tokens: 27897364480 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.998732E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.596 | TFLOPs: 31.93 | +7: iteration 53220/ 173500 | consumed samples: 13624320 | consumed tokens: 
27902607360 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.007913E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.519 | TFLOPs: 31.93 | +7: iteration 53230/ 173500 | consumed samples: 13626880 | consumed tokens: 27907850240 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.010245E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.968 | TFLOPs: 32.06 | +7: iteration 53240/ 173500 | consumed samples: 13629440 | consumed tokens: 27913093120 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.009442E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.334 | TFLOPs: 32.02 | +7: iteration 53250/ 173500 | consumed samples: 13632000 | consumed tokens: 27918336000 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.012369E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.812 | TFLOPs: 32.05 | +7: iteration 53260/ 173500 | consumed samples: 13634560 | consumed tokens: 27923578880 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.021854E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.169 | TFLOPs: 31.65 | +7: iteration 53270/ 173500 | consumed samples: 13637120 | consumed tokens: 27928821760 | elapsed time per iteration (s): 0.43 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.017348E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.024 | TFLOPs: 31.59 | +7: iteration 53280/ 173500 | consumed samples: 13639680 | consumed tokens: 27934064640 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.996526E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.057 | TFLOPs: 31.64 | +7: iteration 53290/ 173500 | consumed samples: 13642240 | consumed tokens: 27939307520 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.010033E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.763 | TFLOPs: 32.05 | +7: iteration 53300/ 173500 | consumed samples: 13644800 | consumed tokens: 27944550400 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.012334E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.393 | TFLOPs: 31.82 | +7: iteration 53310/ 173500 | consumed samples: 13647360 | consumed tokens: 27949793280 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.008062E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.130 | TFLOPs: 31.91 | +7: iteration 53320/ 173500 | consumed samples: 13649920 | consumed tokens: 27955036160 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.006203E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.728 | TFLOPs: 32.04 | +7: iteration 53330/ 173500 | consumed samples: 13652480 | consumed tokens: 27960279040 | elapsed time per iteration (s): 0.43 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.994040E+00 | grad 
norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.726 | TFLOPs: 31.57 | +7: iteration 53340/ 173500 | consumed samples: 13655040 | consumed tokens: 27965521920 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.011563E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.465 | TFLOPs: 31.82 | +7: iteration 53350/ 173500 | consumed samples: 13657600 | consumed tokens: 27970764800 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.020225E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.703 | TFLOPs: 32.04 | +7: iteration 53360/ 173500 | consumed samples: 13660160 | consumed tokens: 27976007680 | elapsed time per iteration (s): 0.43 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.006941E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.161 | TFLOPs: 31.33 | +7: iteration 53370/ 173500 | consumed samples: 13662720 | consumed tokens: 27981250560 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.016144E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.539 | TFLOPs: 31.77 | +7: iteration 53380/ 173500 | consumed samples: 13665280 | consumed tokens: 27986493440 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.009245E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.404 | TFLOPs: 31.82 | +7: iteration 53390/ 173500 | consumed samples: 13667840 | consumed tokens: 27991736320 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.017163E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.198 | TFLOPs: 31.49 | +7: iteration 53400/ 173500 | consumed samples: 13670400 | consumed tokens: 27996979200 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.003536E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.417 | TFLOPs: 31.87 | +7: iteration 53410/ 173500 | consumed samples: 13672960 | consumed tokens: 28002222080 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.009148E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.076 | TFLOPs: 31.75 | +7: iteration 53420/ 173500 | consumed samples: 13675520 | consumed tokens: 28007464960 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.009044E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.028 | TFLOPs: 31.90 | +7: iteration 53430/ 173500 | consumed samples: 13678080 | consumed tokens: 28012707840 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.013795E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.340 | TFLOPs: 31.76 | +7: iteration 53440/ 173500 | consumed samples: 13680640 | consumed tokens: 28017950720 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.019497E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.225 | TFLOPs: 
31.91 | +7: iteration 53450/ 173500 | consumed samples: 13683200 | consumed tokens: 28023193600 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.000001E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.792 | TFLOPs: 31.84 | +7: iteration 53460/ 173500 | consumed samples: 13685760 | consumed tokens: 28028436480 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.998191E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.968 | TFLOPs: 32.06 | +7: iteration 53470/ 173500 | consumed samples: 13688320 | consumed tokens: 28033679360 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.006298E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.709 | TFLOPs: 31.62 | +7: iteration 53480/ 173500 | consumed samples: 13690880 | consumed tokens: 28038922240 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.011020E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.986 | TFLOPs: 31.85 | +7: iteration 53490/ 173500 | consumed samples: 13693440 | consumed tokens: 28044165120 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.018237E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.192 | TFLOPs: 32.02 | +7: iteration 53500/ 173500 | consumed samples: 13696000 | consumed tokens: 28049408000 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.024836E+00 | grad norm: 0.291 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.414 | TFLOPs: 31.77 | +7: iteration 53510/ 173500 | consumed samples: 13698560 | consumed tokens: 28054650880 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.017731E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.529 | TFLOPs: 31.88 | +7: iteration 53520/ 173500 | consumed samples: 13701120 | consumed tokens: 28059893760 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.003945E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.798 | TFLOPs: 31.73 | +7: iteration 53530/ 173500 | consumed samples: 13703680 | consumed tokens: 28065136640 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.998016E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.896 | TFLOPs: 31.74 | +7: iteration 53540/ 173500 | consumed samples: 13706240 | consumed tokens: 28070379520 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.009184E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.448 | TFLOPs: 31.77 | +7: iteration 53550/ 173500 | consumed samples: 13708800 | consumed tokens: 28075622400 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.021014E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.798 | TFLOPs: 31.68 | +7: iteration 53560/ 173500 | consumed samples: 13711360 | consumed tokens: 28080865280 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.006311E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.047 | TFLOPs: 32.06 | +7: iteration 53570/ 173500 | consumed samples: 13713920 | consumed tokens: 28086108160 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.013501E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.555 | TFLOPs: 32.03 | +7: iteration 53580/ 173500 | consumed samples: 13716480 | consumed tokens: 28091351040 | elapsed time per iteration (s): 0.43 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.000920E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.208 | TFLOPs: 31.39 | +7: iteration 53590/ 173500 | consumed samples: 13719040 | consumed tokens: 28096593920 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.005968E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.885 | TFLOPs: 31.79 | +7: iteration 53600/ 173500 | consumed samples: 13721600 | consumed tokens: 28101836800 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.008401E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.649 | TFLOPs: 31.83 | +7: iteration 53610/ 173500 | consumed samples: 13724160 | consumed tokens: 28107079680 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.014716E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.872 | TFLOPs: 31.79 | +7: iteration 53620/ 
173500 | consumed samples: 13726720 | consumed tokens: 28112322560 | elapsed time per iteration (s): 0.43 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.013865E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.860 | TFLOPs: 31.58 | +7: iteration 53630/ 173500 | consumed samples: 13729280 | consumed tokens: 28117565440 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.017824E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.975 | TFLOPs: 31.79 | +7: iteration 53640/ 173500 | consumed samples: 13731840 | consumed tokens: 28122808320 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.009359E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.498 | TFLOPs: 31.61 | +7: iteration 53650/ 173500 | consumed samples: 13734400 | consumed tokens: 28128051200 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.023770E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.451 | TFLOPs: 31.92 | +7: iteration 53660/ 173500 | consumed samples: 13736960 | consumed tokens: 28133294080 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.000721E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.483 | TFLOPs: 31.82 | +7: iteration 53670/ 173500 | consumed samples: 13739520 | consumed tokens: 28138536960 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.004806E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.044 | TFLOPs: 31.64 | +7: iteration 53680/ 173500 | consumed samples: 13742080 | consumed tokens: 28143779840 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.009302E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.088 | TFLOPs: 32.06 | +7: iteration 53690/ 173500 | consumed samples: 13744640 | consumed tokens: 28149022720 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.020036E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.682 | TFLOPs: 32.04 | +7: iteration 53700/ 173500 | consumed samples: 13747200 | consumed tokens: 28154265600 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.007096E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.326 | TFLOPs: 31.81 | +7: iteration 53710/ 173500 | consumed samples: 13749760 | consumed tokens: 28159508480 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.995499E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.751 | TFLOPs: 31.68 | +7: iteration 53720/ 173500 | consumed samples: 13752320 | consumed tokens: 28164751360 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.994419E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.863 | TFLOPs: 31.84 | +7: iteration 53730/ 173500 | consumed samples: 13754880 | consumed tokens: 28169994240 | elapsed time per iteration (s): 0.42 | learning rate: 
1.623E-04 | global batch size: 256 | lm loss: 3.000901E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.493 | TFLOPs: 31.72 | +7: iteration 53740/ 173500 | consumed samples: 13757440 | consumed tokens: 28175237120 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.006749E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.646 | TFLOPs: 31.62 | +7: iteration 53750/ 173500 | consumed samples: 13760000 | consumed tokens: 28180480000 | elapsed time per iteration (s): 0.42 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.014923E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.905 | TFLOPs: 32.05 | +7: iteration 53760/ 173500 | consumed samples: 13762560 | consumed tokens: 28185722880 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.001971E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.309 | TFLOPs: 31.55 | +7: iteration 53770/ 173500 | consumed samples: 13765120 | consumed tokens: 28190965760 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.013329E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.781 | TFLOPs: 31.84 | +7: iteration 53780/ 173500 | consumed samples: 13767680 | consumed tokens: 28196208640 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.010083E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.821 | TFLOPs: 31.84 | +7: iteration 53790/ 173500 | 
consumed samples: 13770240 | consumed tokens: 28201451520 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.998276E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.952 | TFLOPs: 31.53 | +7: iteration 53800/ 173500 | consumed samples: 13772800 | consumed tokens: 28206694400 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.997560E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.333 | TFLOPs: 32.08 | +7: iteration 53810/ 173500 | consumed samples: 13775360 | consumed tokens: 28211937280 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.012790E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.035 | TFLOPs: 32.01 | +7: iteration 53820/ 173500 | consumed samples: 13777920 | consumed tokens: 28217180160 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.998309E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.513 | TFLOPs: 31.46 | +7: iteration 53830/ 173500 | consumed samples: 13780480 | consumed tokens: 28222423040 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.005557E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.911 | TFLOPs: 32.05 | +7: iteration 53840/ 173500 | consumed samples: 13783040 | consumed tokens: 28227665920 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.007772E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.533 | TFLOPs: 31.67 | +7: iteration 53850/ 173500 | consumed samples: 13785600 | consumed tokens: 28232908800 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.996168E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.307 | TFLOPs: 31.76 | +7: iteration 53860/ 173500 | consumed samples: 13788160 | consumed tokens: 28238151680 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.996167E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.212 | TFLOPs: 32.02 | +7: iteration 53870/ 173500 | consumed samples: 13790720 | consumed tokens: 28243394560 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.025793E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.763 | TFLOPs: 32.05 | +7: iteration 53880/ 173500 | consumed samples: 13793280 | consumed tokens: 28248637440 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.004771E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.754 | TFLOPs: 31.94 | +7: iteration 53890/ 173500 | consumed samples: 13795840 | consumed tokens: 28253880320 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.012185E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.721 | TFLOPs: 31.78 | +7: iteration 53900/ 173500 | consumed samples: 13798400 | consumed tokens: 28259123200 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch 
size: 256 | lm loss: 3.008396E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.410 | TFLOPs: 31.87 | +7: iteration 53910/ 173500 | consumed samples: 13800960 | consumed tokens: 28264366080 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.010400E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.797 | TFLOPs: 31.58 | +7: iteration 53920/ 173500 | consumed samples: 13803520 | consumed tokens: 28269608960 | elapsed time per iteration (s): 0.42 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.015037E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.996 | TFLOPs: 32.06 | +7: iteration 53930/ 173500 | consumed samples: 13806080 | consumed tokens: 28274851840 | elapsed time per iteration (s): 0.42 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.005719E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.409 | TFLOPs: 31.82 | +7: iteration 53940/ 173500 | consumed samples: 13808640 | consumed tokens: 28280094720 | elapsed time per iteration (s): 0.42 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.007032E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.018 | TFLOPs: 31.69 | +7: iteration 53950/ 173500 | consumed samples: 13811200 | consumed tokens: 28285337600 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.007433E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.238 | TFLOPs: 31.55 | +7: iteration 53960/ 173500 | consumed samples: 13813760 | 
consumed tokens: 28290580480 | elapsed time per iteration (s): 0.42 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.022343E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.959 | TFLOPs: 32.06 | +7: iteration 53970/ 173500 | consumed samples: 13816320 | consumed tokens: 28295823360 | elapsed time per iteration (s): 0.42 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.997859E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.942 | TFLOPs: 31.85 | +7: iteration 53980/ 173500 | consumed samples: 13818880 | consumed tokens: 28301066240 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.003067E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.177 | TFLOPs: 31.86 | +7: iteration 53990/ 173500 | consumed samples: 13821440 | consumed tokens: 28306309120 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.997210E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.440 | TFLOPs: 31.87 | +0: [2023-03-17 05:35:33,506] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=0, lr=[0.00016191666237869197, 0.00016191666237869197, 0.00016191666237869197], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 54000/ 173500 | consumed samples: 13824000 | consumed tokens: 28311552000 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.013510E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.089 | TFLOPs: 31.96 | +0: steps: 54000 loss: 3.0226 iter time (s): 0.421 samples/sec: 608.041 +7: iteration 
54010/ 173500 | consumed samples: 13826560 | consumed tokens: 28316794880 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.006114E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.038 | TFLOPs: 31.69 | +7: iteration 54020/ 173500 | consumed samples: 13829120 | consumed tokens: 28322037760 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.018258E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.578 | TFLOPs: 31.83 | +7: iteration 54030/ 173500 | consumed samples: 13831680 | consumed tokens: 28327280640 | elapsed time per iteration (s): 0.43 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.004426E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.855 | TFLOPs: 31.58 | +7: iteration 54040/ 173500 | consumed samples: 13834240 | consumed tokens: 28332523520 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.999382E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.131 | TFLOPs: 31.65 | +7: iteration 54050/ 173500 | consumed samples: 13836800 | consumed tokens: 28337766400 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.006287E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.335 | TFLOPs: 31.97 | +7: iteration 54060/ 173500 | consumed samples: 13839360 | consumed tokens: 28343009280 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.016482E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.957 | TFLOPs: 31.69 | +7: iteration 54070/ 173500 | consumed samples: 13841920 | consumed tokens: 28348252160 | elapsed time per iteration (s): 0.43 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.012665E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.490 | TFLOPs: 31.35 | +7: iteration 54080/ 173500 | consumed samples: 13844480 | consumed tokens: 28353495040 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.000944E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.162 | TFLOPs: 32.01 | +7: iteration 54090/ 173500 | consumed samples: 13847040 | consumed tokens: 28358737920 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.022473E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.579 | TFLOPs: 31.62 | +7: iteration 54100/ 173500 | consumed samples: 13849600 | consumed tokens: 28363980800 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.005271E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.517 | TFLOPs: 31.98 | +7: iteration 54110/ 173500 | consumed samples: 13852160 | consumed tokens: 28369223680 | elapsed time per iteration (s): 0.43 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.000016E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.649 | TFLOPs: 31.57 | +7: iteration 54120/ 173500 | consumed samples: 13854720 | consumed tokens: 28374466560 | elapsed time per iteration (s): 0.44 | learning rate: 
1.618E-04 | global batch size: 256 | lm loss: 2.998259E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.403 | TFLOPs: 30.61 | +7: iteration 54130/ 173500 | consumed samples: 13857280 | consumed tokens: 28379709440 | elapsed time per iteration (s): 0.42 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.010997E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.919 | TFLOPs: 32.11 | +7: iteration 54140/ 173500 | consumed samples: 13859840 | consumed tokens: 28384952320 | elapsed time per iteration (s): 0.42 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.000078E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.928 | TFLOPs: 32.05 | +7: iteration 54150/ 173500 | consumed samples: 13862400 | consumed tokens: 28390195200 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.990791E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.402 | TFLOPs: 30.98 | +7: iteration 54160/ 173500 | consumed samples: 13864960 | consumed tokens: 28395438080 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.010025E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.247 | TFLOPs: 31.55 | +7: iteration 54170/ 173500 | consumed samples: 13867520 | consumed tokens: 28400680960 | elapsed time per iteration (s): 0.44 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.007024E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.762 | TFLOPs: 30.37 | +7: iteration 54180/ 173500 | 
consumed samples: 13870080 | consumed tokens: 28405923840 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.017294E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.611 | TFLOPs: 31.20 | +7: iteration 54190/ 173500 | consumed samples: 13872640 | consumed tokens: 28411166720 | elapsed time per iteration (s): 0.45 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.993096E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.890 | TFLOPs: 29.80 | +7: iteration 54200/ 173500 | consumed samples: 13875200 | consumed tokens: 28416409600 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.013056E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.977 | TFLOPs: 30.64 | +7: iteration 54210/ 173500 | consumed samples: 13877760 | consumed tokens: 28421652480 | elapsed time per iteration (s): 0.43 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.014757E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.101 | TFLOPs: 31.01 | +7: iteration 54220/ 173500 | consumed samples: 13880320 | consumed tokens: 28426895360 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.007255E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.188 | TFLOPs: 30.70 | +7: iteration 54230/ 173500 | consumed samples: 13882880 | consumed tokens: 28432138240 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.994898E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 575.409 | TFLOPs: 30.19 | +7: iteration 54240/ 173500 | consumed samples: 13885440 | consumed tokens: 28437381120 | elapsed time per iteration (s): 0.45 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.020160E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.481 | TFLOPs: 30.14 | +7: iteration 54250/ 173500 | consumed samples: 13888000 | consumed tokens: 28442624000 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.008410E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.104 | TFLOPs: 30.28 | +7: iteration 54260/ 173500 | consumed samples: 13890560 | consumed tokens: 28447866880 | elapsed time per iteration (s): 0.43 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.003112E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.609 | TFLOPs: 31.46 | +7: iteration 54270/ 173500 | consumed samples: 13893120 | consumed tokens: 28453109760 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.013418E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.871 | TFLOPs: 30.48 | +7: iteration 54280/ 173500 | consumed samples: 13895680 | consumed tokens: 28458352640 | elapsed time per iteration (s): 0.45 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 3.006636E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.346 | TFLOPs: 30.08 | +7: iteration 54290/ 173500 | consumed samples: 13898240 | consumed tokens: 28463595520 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch 
size: 256 | lm loss: 2.990526E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.396 | TFLOPs: 31.13 | +7: iteration 54300/ 173500 | consumed samples: 13900800 | consumed tokens: 28468838400 | elapsed time per iteration (s): 0.45 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 3.014349E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.474 | TFLOPs: 29.56 | +7: iteration 54310/ 173500 | consumed samples: 13903360 | consumed tokens: 28474081280 | elapsed time per iteration (s): 0.42 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.992133E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.087 | TFLOPs: 31.64 | +7: iteration 54320/ 173500 | consumed samples: 13905920 | consumed tokens: 28479324160 | elapsed time per iteration (s): 0.47 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 3.011393E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.092 | TFLOPs: 28.81 | +7: iteration 54330/ 173500 | consumed samples: 13908480 | consumed tokens: 28484567040 | elapsed time per iteration (s): 0.44 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 3.000658E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.479 | TFLOPs: 30.72 | +7: iteration 54340/ 173500 | consumed samples: 13911040 | consumed tokens: 28489809920 | elapsed time per iteration (s): 0.44 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 3.022560E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.683 | TFLOPs: 30.62 | +7: iteration 54350/ 173500 | consumed samples: 13913600 | 
consumed tokens: 28495052800 | elapsed time per iteration (s): 0.43 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.998611E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.215 | TFLOPs: 31.49 | +7: iteration 54360/ 173500 | consumed samples: 13916160 | consumed tokens: 28500295680 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.003199E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.556 | TFLOPs: 32.03 | +7: iteration 54370/ 173500 | consumed samples: 13918720 | consumed tokens: 28505538560 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.011181E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.162 | TFLOPs: 32.01 | +7: iteration 54380/ 173500 | consumed samples: 13921280 | consumed tokens: 28510781440 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.016590E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.822 | TFLOPs: 31.84 | +7: iteration 54390/ 173500 | consumed samples: 13923840 | consumed tokens: 28516024320 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.012020E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.697 | TFLOPs: 32.15 | +7: iteration 54400/ 173500 | consumed samples: 13926400 | consumed tokens: 28521267200 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.008384E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 608.022 | TFLOPs: 31.90 | +7: iteration 54410/ 173500 | consumed samples: 13928960 | consumed tokens: 28526510080 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.004703E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.216 | TFLOPs: 31.96 | +7: iteration 54420/ 173500 | consumed samples: 13931520 | consumed tokens: 28531752960 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.001367E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.407 | TFLOPs: 31.61 | +7: iteration 54430/ 173500 | consumed samples: 13934080 | consumed tokens: 28536995840 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 3.010375E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.211 | TFLOPs: 31.65 | +7: iteration 54440/ 173500 | consumed samples: 13936640 | consumed tokens: 28542238720 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.995718E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.456 | TFLOPs: 32.13 | +7: iteration 54450/ 173500 | consumed samples: 13939200 | consumed tokens: 28547481600 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 3.011957E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.293 | TFLOPs: 31.76 | +7: iteration 54460/ 173500 | consumed samples: 13941760 | consumed tokens: 28552724480 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 
3.018334E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.283 | TFLOPs: 31.97 | +7: iteration 54470/ 173500 | consumed samples: 13944320 | consumed tokens: 28557967360 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 3.010744E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.419 | TFLOPs: 31.87 | +7: iteration 54480/ 173500 | consumed samples: 13946880 | consumed tokens: 28563210240 | elapsed time per iteration (s): 0.43 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 3.008129E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.359 | TFLOPs: 31.45 | +7: iteration 54490/ 173500 | consumed samples: 13949440 | consumed tokens: 28568453120 | elapsed time per iteration (s): 0.42 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 3.011714E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.661 | TFLOPs: 32.09 | +7: iteration 54500/ 173500 | consumed samples: 13952000 | consumed tokens: 28573696000 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.002435E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.651 | TFLOPs: 31.83 | +7: iteration 54510/ 173500 | consumed samples: 13954560 | consumed tokens: 28578938880 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.011486E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.524 | TFLOPs: 32.09 | +7: iteration 54520/ 173500 | consumed samples: 13957120 | consumed tokens: 
28584181760 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.007365E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.568 | TFLOPs: 31.77 | +7: iteration 54530/ 173500 | consumed samples: 13959680 | consumed tokens: 28589424640 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.016142E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.506 | TFLOPs: 32.08 | +7: iteration 54540/ 173500 | consumed samples: 13962240 | consumed tokens: 28594667520 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.008726E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.038 | TFLOPs: 31.85 | +7: iteration 54550/ 173500 | consumed samples: 13964800 | consumed tokens: 28599910400 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.000200E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.691 | TFLOPs: 32.09 | +7: iteration 54560/ 173500 | consumed samples: 13967360 | consumed tokens: 28605153280 | elapsed time per iteration (s): 0.42 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.998636E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.072 | TFLOPs: 31.90 | +7: iteration 54570/ 173500 | consumed samples: 13969920 | consumed tokens: 28610396160 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 3.006454E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 611.281 | TFLOPs: 32.07 | +7: iteration 54580/ 173500 | consumed samples: 13972480 | consumed tokens: 28615639040 | elapsed time per iteration (s): 0.43 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 3.014173E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.439 | TFLOPs: 31.56 | +7: iteration 54590/ 173500 | consumed samples: 13975040 | consumed tokens: 28620881920 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.998309E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.612 | TFLOPs: 32.09 | +7: iteration 54600/ 173500 | consumed samples: 13977600 | consumed tokens: 28626124800 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 3.004323E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.547 | TFLOPs: 31.61 | +7: iteration 54610/ 173500 | consumed samples: 13980160 | consumed tokens: 28631367680 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.999246E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.411 | TFLOPs: 32.08 | +7: iteration 54620/ 173500 | consumed samples: 13982720 | consumed tokens: 28636610560 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 3.003718E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.503 | TFLOPs: 32.08 | +7: iteration 54630/ 173500 | consumed samples: 13985280 | consumed tokens: 28641853440 | elapsed time per iteration (s): 0.42 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 3.016640E+00 | grad 
norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.108 | TFLOPs: 32.01 | +7: iteration 54640/ 173500 | consumed samples: 13987840 | consumed tokens: 28647096320 | elapsed time per iteration (s): 0.43 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.994574E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.863 | TFLOPs: 31.47 | +7: iteration 54650/ 173500 | consumed samples: 13990400 | consumed tokens: 28652339200 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.004355E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.196 | TFLOPs: 32.07 | +7: iteration 54660/ 173500 | consumed samples: 13992960 | consumed tokens: 28657582080 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.991557E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.781 | TFLOPs: 31.78 | +7: iteration 54670/ 173500 | consumed samples: 13995520 | consumed tokens: 28662824960 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.014215E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.026 | TFLOPs: 31.85 | +7: iteration 54680/ 173500 | consumed samples: 13998080 | consumed tokens: 28668067840 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.002664E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.108 | TFLOPs: 32.06 | +7: iteration 54690/ 173500 | consumed samples: 14000640 | consumed tokens: 28673310720 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.008011E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.126 | TFLOPs: 31.80 | +7: iteration 54700/ 173500 | consumed samples: 14003200 | consumed tokens: 28678553600 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.005551E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.485 | TFLOPs: 31.93 | +7: iteration 54710/ 173500 | consumed samples: 14005760 | consumed tokens: 28683796480 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.015560E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.753 | TFLOPs: 31.89 | +7: iteration 54720/ 173500 | consumed samples: 14008320 | consumed tokens: 28689039360 | elapsed time per iteration (s): 0.42 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.008590E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.615 | TFLOPs: 31.78 | +7: iteration 54730/ 173500 | consumed samples: 14010880 | consumed tokens: 28694282240 | elapsed time per iteration (s): 0.44 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.010996E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.304 | TFLOPs: 30.40 | +7: iteration 54740/ 173500 | consumed samples: 14013440 | consumed tokens: 28699525120 | elapsed time per iteration (s): 0.44 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.015385E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.644 | TFLOPs: 
30.78 | +7: iteration 54750/ 173500 | consumed samples: 14016000 | consumed tokens: 28704768000 | elapsed time per iteration (s): 0.43 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.029437E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.243 | TFLOPs: 31.60 | +7: iteration 54760/ 173500 | consumed samples: 14018560 | consumed tokens: 28710010880 | elapsed time per iteration (s): 0.44 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.008919E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.686 | TFLOPs: 30.57 | +7: iteration 54770/ 173500 | consumed samples: 14021120 | consumed tokens: 28715253760 | elapsed time per iteration (s): 0.43 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 3.014319E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.366 | TFLOPs: 31.24 | +7: iteration 54780/ 173500 | consumed samples: 14023680 | consumed tokens: 28720496640 | elapsed time per iteration (s): 0.43 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.998756E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.435 | TFLOPs: 31.35 | +7: iteration 54790/ 173500 | consumed samples: 14026240 | consumed tokens: 28725739520 | elapsed time per iteration (s): 0.42 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.995549E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.136 | TFLOPs: 31.91 | +7: iteration 54800/ 173500 | consumed samples: 14028800 | consumed tokens: 28730982400 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.015566E+00 | grad norm: 0.310 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.214 | TFLOPs: 31.44 | +7: iteration 54810/ 173500 | consumed samples: 14031360 | consumed tokens: 28736225280 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.005199E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.782 | TFLOPs: 31.21 | +7: iteration 54820/ 173500 | consumed samples: 14033920 | consumed tokens: 28741468160 | elapsed time per iteration (s): 0.42 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.001842E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.626 | TFLOPs: 31.83 | +7: iteration 54830/ 173500 | consumed samples: 14036480 | consumed tokens: 28746711040 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.011986E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.985 | TFLOPs: 31.06 | +7: iteration 54840/ 173500 | consumed samples: 14039040 | consumed tokens: 28751953920 | elapsed time per iteration (s): 0.44 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.006037E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.627 | TFLOPs: 30.78 | +7: iteration 54850/ 173500 | consumed samples: 14041600 | consumed tokens: 28757196800 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.015301E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.918 | TFLOPs: 31.21 | +7: iteration 54860/ 173500 | consumed samples: 14044160 | consumed tokens: 28762439680 | elapsed time per iteration (s): 0.45 | 
learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.000429E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.861 | TFLOPs: 29.90 | +7: iteration 54870/ 173500 | consumed samples: 14046720 | consumed tokens: 28767682560 | elapsed time per iteration (s): 0.43 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.003625E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.329 | TFLOPs: 31.08 | +7: iteration 54880/ 173500 | consumed samples: 14049280 | consumed tokens: 28772925440 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.015833E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.721 | TFLOPs: 30.47 | +7: iteration 54890/ 173500 | consumed samples: 14051840 | consumed tokens: 28778168320 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.002081E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.628 | TFLOPs: 30.62 | +7: iteration 54900/ 173500 | consumed samples: 14054400 | consumed tokens: 28783411200 | elapsed time per iteration (s): 0.43 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.022309E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.768 | TFLOPs: 30.89 | +7: iteration 54910/ 173500 | consumed samples: 14056960 | consumed tokens: 28788654080 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.989748E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.619 | TFLOPs: 30.83 | +7: iteration 54920/ 
173500 | consumed samples: 14059520 | consumed tokens: 28793896960 | elapsed time per iteration (s): 0.45 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.006156E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.154 | TFLOPs: 29.97 | +7: iteration 54930/ 173500 | consumed samples: 14062080 | consumed tokens: 28799139840 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 3.005998E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.821 | TFLOPs: 30.32 | +7: iteration 54940/ 173500 | consumed samples: 14064640 | consumed tokens: 28804382720 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.029241E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.652 | TFLOPs: 31.57 | +7: iteration 54950/ 173500 | consumed samples: 14067200 | consumed tokens: 28809625600 | elapsed time per iteration (s): 0.42 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.011630E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.746 | TFLOPs: 31.78 | +7: iteration 54960/ 173500 | consumed samples: 14069760 | consumed tokens: 28814868480 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.002198E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.659 | TFLOPs: 31.10 | +7: iteration 54970/ 173500 | consumed samples: 14072320 | consumed tokens: 28820111360 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.006953E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 594.644 | TFLOPs: 31.20 | +7: iteration 54980/ 173500 | consumed samples: 14074880 | consumed tokens: 28825354240 | elapsed time per iteration (s): 0.44 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.992138E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.751 | TFLOPs: 30.84 | +7: iteration 54990/ 173500 | consumed samples: 14077440 | consumed tokens: 28830597120 | elapsed time per iteration (s): 0.44 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.007757E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.881 | TFLOPs: 30.48 | +7: iteration 55000/ 173500 | consumed samples: 14080000 | consumed tokens: 28835840000 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.004361E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.456 | TFLOPs: 31.50 | +7: iteration 55010/ 173500 | consumed samples: 14082560 | consumed tokens: 28841082880 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.014380E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.250 | TFLOPs: 31.28 | +7: iteration 55020/ 173500 | consumed samples: 14085120 | consumed tokens: 28846325760 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.993582E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.542 | TFLOPs: 31.19 | +7: iteration 55030/ 173500 | consumed samples: 14087680 | consumed tokens: 28851568640 | elapsed time per iteration (s): 0.44 | learning rate: 
1.605E-04 | global batch size: 256 | lm loss: 3.027852E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.155 | TFLOPs: 30.23 | +7: iteration 55040/ 173500 | consumed samples: 14090240 | consumed tokens: 28856811520 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.000410E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.350 | TFLOPs: 31.03 | +7: iteration 55050/ 173500 | consumed samples: 14092800 | consumed tokens: 28862054400 | elapsed time per iteration (s): 0.42 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.006602E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.261 | TFLOPs: 31.76 | +7: iteration 55060/ 173500 | consumed samples: 14095360 | consumed tokens: 28867297280 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.021025E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.633 | TFLOPs: 31.30 | +7: iteration 55070/ 173500 | consumed samples: 14097920 | consumed tokens: 28872540160 | elapsed time per iteration (s): 0.42 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.007883E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.619 | TFLOPs: 31.72 | +7: iteration 55080/ 173500 | consumed samples: 14100480 | consumed tokens: 28877783040 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 3.002077E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.965 | TFLOPs: 31.11 | +7: iteration 55090/ 173500 | 
consumed samples: 14103040 | consumed tokens: 28883025920 | elapsed time per iteration (s): 0.42 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.997200E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.634 | TFLOPs: 31.93 | +7: iteration 55100/ 173500 | consumed samples: 14105600 | consumed tokens: 28888268800 | elapsed time per iteration (s): 0.42 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.009592E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.387 | TFLOPs: 31.76 | +7: iteration 55110/ 173500 | consumed samples: 14108160 | consumed tokens: 28893511680 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.019543E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.722 | TFLOPs: 31.36 | +7: iteration 55120/ 173500 | consumed samples: 14110720 | consumed tokens: 28898754560 | elapsed time per iteration (s): 0.44 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.011999E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.562 | TFLOPs: 30.62 | +7: iteration 55130/ 173500 | consumed samples: 14113280 | consumed tokens: 28903997440 | elapsed time per iteration (s): 0.44 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.007487E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.651 | TFLOPs: 30.73 | +7: iteration 55140/ 173500 | consumed samples: 14115840 | consumed tokens: 28909240320 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.011427E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 592.267 | TFLOPs: 31.08 | +7: iteration 55150/ 173500 | consumed samples: 14118400 | consumed tokens: 28914483200 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.000882E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.170 | TFLOPs: 31.44 | +7: iteration 55160/ 173500 | consumed samples: 14120960 | consumed tokens: 28919726080 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.000190E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.732 | TFLOPs: 31.10 | +7: iteration 55170/ 173500 | consumed samples: 14123520 | consumed tokens: 28924968960 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.006423E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.552 | TFLOPs: 31.56 | +7: iteration 55180/ 173500 | consumed samples: 14126080 | consumed tokens: 28930211840 | elapsed time per iteration (s): 0.45 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.007666E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.214 | TFLOPs: 30.08 | +7: iteration 55190/ 173500 | consumed samples: 14128640 | consumed tokens: 28935454720 | elapsed time per iteration (s): 0.44 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.009115E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.408 | TFLOPs: 30.51 | +7: iteration 55200/ 173500 | consumed samples: 14131200 | consumed tokens: 28940697600 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch 
size: 256 | lm loss: 3.008017E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.775 | TFLOPs: 30.89 | +7: iteration 55210/ 173500 | consumed samples: 14133760 | consumed tokens: 28945940480 | elapsed time per iteration (s): 0.42 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.010374E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.223 | TFLOPs: 31.65 | +7: iteration 55220/ 173500 | consumed samples: 14136320 | consumed tokens: 28951183360 | elapsed time per iteration (s): 0.44 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 3.006780E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.459 | TFLOPs: 30.88 | +7: iteration 55230/ 173500 | consumed samples: 14138880 | consumed tokens: 28956426240 | elapsed time per iteration (s): 0.44 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.990223E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.957 | TFLOPs: 30.48 | +7: iteration 55240/ 173500 | consumed samples: 14141440 | consumed tokens: 28961669120 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.019629E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.020 | TFLOPs: 31.01 | +7: iteration 55250/ 173500 | consumed samples: 14144000 | consumed tokens: 28966912000 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.004053E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.999 | TFLOPs: 31.38 | +7: iteration 55260/ 173500 | consumed samples: 14146560 | 
consumed tokens: 28972154880 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.998379E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.345 | TFLOPs: 31.08 | +7: iteration 55270/ 173500 | consumed samples: 14149120 | consumed tokens: 28977397760 | elapsed time per iteration (s): 0.45 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.002392E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.423 | TFLOPs: 30.03 | +7: iteration 55280/ 173500 | consumed samples: 14151680 | consumed tokens: 28982640640 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.004103E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.561 | TFLOPs: 31.09 | +7: iteration 55290/ 173500 | consumed samples: 14154240 | consumed tokens: 28987883520 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.007622E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.571 | TFLOPs: 30.99 | +7: iteration 55300/ 173500 | consumed samples: 14156800 | consumed tokens: 28993126400 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.998464E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.102 | TFLOPs: 30.91 | +7: iteration 55310/ 173500 | consumed samples: 14159360 | consumed tokens: 28998369280 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 3.022839E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 598.889 | TFLOPs: 31.42 | +7: iteration 55320/ 173500 | consumed samples: 14161920 | consumed tokens: 29003612160 | elapsed time per iteration (s): 0.44 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 3.017642E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.333 | TFLOPs: 30.82 | +7: iteration 55330/ 173500 | consumed samples: 14164480 | consumed tokens: 29008855040 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.993568E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.763 | TFLOPs: 31.10 | +7: iteration 55340/ 173500 | consumed samples: 14167040 | consumed tokens: 29014097920 | elapsed time per iteration (s): 0.44 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 3.004838E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.872 | TFLOPs: 30.53 | +7: iteration 55350/ 173500 | consumed samples: 14169600 | consumed tokens: 29019340800 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 3.007129E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.073 | TFLOPs: 31.28 | +7: iteration 55360/ 173500 | consumed samples: 14172160 | consumed tokens: 29024583680 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 3.013139E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.676 | TFLOPs: 31.20 | +7: iteration 55370/ 173500 | consumed samples: 14174720 | consumed tokens: 29029826560 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 
3.006689E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.438 | TFLOPs: 31.08 | +7: iteration 55380/ 173500 | consumed samples: 14177280 | consumed tokens: 29035069440 | elapsed time per iteration (s): 0.44 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.995960E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.933 | TFLOPs: 30.85 | +7: iteration 55390/ 173500 | consumed samples: 14179840 | consumed tokens: 29040312320 | elapsed time per iteration (s): 0.43 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.018365E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.784 | TFLOPs: 31.10 | +7: iteration 55400/ 173500 | consumed samples: 14182400 | consumed tokens: 29045555200 | elapsed time per iteration (s): 0.44 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.001037E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.673 | TFLOPs: 30.68 | +7: iteration 55410/ 173500 | consumed samples: 14184960 | consumed tokens: 29050798080 | elapsed time per iteration (s): 0.46 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.005612E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.827 | TFLOPs: 29.48 | +7: iteration 55420/ 173500 | consumed samples: 14187520 | consumed tokens: 29056040960 | elapsed time per iteration (s): 0.43 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.013663E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.070 | TFLOPs: 31.06 | +7: iteration 55430/ 173500 | consumed samples: 14190080 | consumed tokens: 
29061283840 | elapsed time per iteration (s): 0.42 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.013985E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.743 | TFLOPs: 31.62 | +7: iteration 55440/ 173500 | consumed samples: 14192640 | consumed tokens: 29066526720 | elapsed time per iteration (s): 0.42 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.992857E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.920 | TFLOPs: 32.05 | +7: iteration 55450/ 173500 | consumed samples: 14195200 | consumed tokens: 29071769600 | elapsed time per iteration (s): 0.45 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.987960E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.134 | TFLOPs: 30.07 | +7: iteration 55460/ 173500 | consumed samples: 14197760 | consumed tokens: 29077012480 | elapsed time per iteration (s): 0.44 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.003892E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.796 | TFLOPs: 30.32 | +7: iteration 55470/ 173500 | consumed samples: 14200320 | consumed tokens: 29082255360 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.003172E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.698 | TFLOPs: 31.41 | +7: iteration 55480/ 173500 | consumed samples: 14202880 | consumed tokens: 29087498240 | elapsed time per iteration (s): 0.44 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.002824E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 578.678 | TFLOPs: 30.36 | +7: iteration 55490/ 173500 | consumed samples: 14205440 | consumed tokens: 29092741120 | elapsed time per iteration (s): 0.44 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.003895E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.262 | TFLOPs: 30.34 | +7: iteration 55500/ 173500 | consumed samples: 14208000 | consumed tokens: 29097984000 | elapsed time per iteration (s): 0.44 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.001544E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.118 | TFLOPs: 30.86 | +7: iteration 55510/ 173500 | consumed samples: 14210560 | consumed tokens: 29103226880 | elapsed time per iteration (s): 0.42 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 3.008316E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.880 | TFLOPs: 31.63 | +7: iteration 55520/ 173500 | consumed samples: 14213120 | consumed tokens: 29108469760 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.997930E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.053 | TFLOPs: 31.38 | +7: iteration 55530/ 173500 | consumed samples: 14215680 | consumed tokens: 29113712640 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.007717E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.601 | TFLOPs: 30.99 | +7: iteration 55540/ 173500 | consumed samples: 14218240 | consumed tokens: 29118955520 | elapsed time per iteration (s): 0.44 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.991293E+00 | grad 
norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.099 | TFLOPs: 30.59 | +7: iteration 55550/ 173500 | consumed samples: 14220800 | consumed tokens: 29124198400 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.011511E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.565 | TFLOPs: 31.41 | +7: iteration 55560/ 173500 | consumed samples: 14223360 | consumed tokens: 29129441280 | elapsed time per iteration (s): 0.44 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.013821E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.647 | TFLOPs: 30.26 | +7: iteration 55570/ 173500 | consumed samples: 14225920 | consumed tokens: 29134684160 | elapsed time per iteration (s): 0.44 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.012243E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.947 | TFLOPs: 30.27 | +7: iteration 55580/ 173500 | consumed samples: 14228480 | consumed tokens: 29139927040 | elapsed time per iteration (s): 0.45 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.003891E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.604 | TFLOPs: 30.15 | +7: iteration 55590/ 173500 | consumed samples: 14231040 | consumed tokens: 29145169920 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.005234E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.027 | TFLOPs: 31.06 | +7: iteration 55600/ 173500 | consumed samples: 14233600 | consumed tokens: 29150412800 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.982128E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.601 | TFLOPs: 31.51 | +7: iteration 55610/ 173500 | consumed samples: 14236160 | consumed tokens: 29155655680 | elapsed time per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.013443E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.088 | TFLOPs: 31.22 | +7: iteration 55620/ 173500 | consumed samples: 14238720 | consumed tokens: 29160898560 | elapsed time per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.004217E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.900 | TFLOPs: 31.16 | +7: iteration 55630/ 173500 | consumed samples: 14241280 | consumed tokens: 29166141440 | elapsed time per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.000629E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.162 | TFLOPs: 31.17 | +7: iteration 55640/ 173500 | consumed samples: 14243840 | consumed tokens: 29171384320 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.010786E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.066 | TFLOPs: 30.49 | +7: iteration 55650/ 173500 | consumed samples: 14246400 | consumed tokens: 29176627200 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.019454E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.268 | TFLOPs: 
30.60 | +7: iteration 55660/ 173500 | consumed samples: 14248960 | consumed tokens: 29181870080 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 3.021199E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.298 | TFLOPs: 30.55 | +7: iteration 55670/ 173500 | consumed samples: 14251520 | consumed tokens: 29187112960 | elapsed time per iteration (s): 0.44 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.003391E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.985 | TFLOPs: 30.33 | +7: iteration 55680/ 173500 | consumed samples: 14254080 | consumed tokens: 29192355840 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.000051E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.436 | TFLOPs: 31.50 | +7: iteration 55690/ 173500 | consumed samples: 14256640 | consumed tokens: 29197598720 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.005592E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.188 | TFLOPs: 31.07 | +7: iteration 55700/ 173500 | consumed samples: 14259200 | consumed tokens: 29202841600 | elapsed time per iteration (s): 0.44 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.000157E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.940 | TFLOPs: 30.48 | +7: iteration 55710/ 173500 | consumed samples: 14261760 | consumed tokens: 29208084480 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.003765E+00 | grad norm: 0.292 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.930 | TFLOPs: 30.90 | +7: iteration 55720/ 173500 | consumed samples: 14264320 | consumed tokens: 29213327360 | elapsed time per iteration (s): 0.44 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.999994E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.678 | TFLOPs: 30.73 | +7: iteration 55730/ 173500 | consumed samples: 14266880 | consumed tokens: 29218570240 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.998790E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.772 | TFLOPs: 30.94 | +7: iteration 55740/ 173500 | consumed samples: 14269440 | consumed tokens: 29223813120 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.019210E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.987 | TFLOPs: 31.06 | +7: iteration 55750/ 173500 | consumed samples: 14272000 | consumed tokens: 29229056000 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.992770E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.746 | TFLOPs: 31.00 | +7: iteration 55760/ 173500 | consumed samples: 14274560 | consumed tokens: 29234298880 | elapsed time per iteration (s): 0.44 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 3.010102E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.601 | TFLOPs: 30.57 | +7: iteration 55770/ 173500 | consumed samples: 14277120 | consumed tokens: 29239541760 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.595E-04 | global batch size: 256 | lm loss: 3.003894E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.125 | TFLOPs: 31.12 | +7: iteration 55780/ 173500 | consumed samples: 14279680 | consumed tokens: 29244784640 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 3.013718E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.767 | TFLOPs: 31.10 | +7: iteration 55790/ 173500 | consumed samples: 14282240 | consumed tokens: 29250027520 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 3.006376E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.661 | TFLOPs: 30.94 | +7: iteration 55800/ 173500 | consumed samples: 14284800 | consumed tokens: 29255270400 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 3.010106E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.851 | TFLOPs: 31.42 | +7: iteration 55810/ 173500 | consumed samples: 14287360 | consumed tokens: 29260513280 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.998877E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.821 | TFLOPs: 31.26 | +7: iteration 55820/ 173500 | consumed samples: 14289920 | consumed tokens: 29265756160 | elapsed time per iteration (s): 0.44 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.022095E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.064 | TFLOPs: 30.80 | +7: iteration 55830/ 
173500 | consumed samples: 14292480 | consumed tokens: 29270999040 | elapsed time per iteration (s): 0.42 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.993044E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.468 | TFLOPs: 31.72 | +7: iteration 55840/ 173500 | consumed samples: 14295040 | consumed tokens: 29276241920 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.002333E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.257 | TFLOPs: 31.13 | +7: iteration 55850/ 173500 | consumed samples: 14297600 | consumed tokens: 29281484800 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.015191E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.551 | TFLOPs: 31.51 | +7: iteration 55860/ 173500 | consumed samples: 14300160 | consumed tokens: 29286727680 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.008566E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.523 | TFLOPs: 31.46 | +7: iteration 55870/ 173500 | consumed samples: 14302720 | consumed tokens: 29291970560 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.009486E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.304 | TFLOPs: 31.60 | +7: iteration 55880/ 173500 | consumed samples: 14305280 | consumed tokens: 29297213440 | elapsed time per iteration (s): 0.42 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.016277E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.113 | TFLOPs: 31.70 | +7: iteration 55890/ 173500 | consumed samples: 14307840 | consumed tokens: 29302456320 | elapsed time per iteration (s): 0.42 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 3.006932E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.973 | TFLOPs: 31.69 | +7: iteration 55900/ 173500 | consumed samples: 14310400 | consumed tokens: 29307699200 | elapsed time per iteration (s): 0.44 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 3.006621E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.131 | TFLOPs: 30.86 | +7: iteration 55910/ 173500 | consumed samples: 14312960 | consumed tokens: 29312942080 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 3.007283E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.936 | TFLOPs: 31.16 | +7: iteration 55920/ 173500 | consumed samples: 14315520 | consumed tokens: 29318184960 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.997717E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.354 | TFLOPs: 31.50 | +7: iteration 55930/ 173500 | consumed samples: 14318080 | consumed tokens: 29323427840 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 3.008347E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.211 | TFLOPs: 31.07 | +7: iteration 55940/ 173500 | consumed samples: 14320640 | consumed tokens: 29328670720 | elapsed time per iteration (s): 0.43 | learning rate: 
1.593E-04 | global batch size: 256 | lm loss: 3.003477E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.811 | TFLOPs: 31.47 | +7: iteration 55950/ 173500 | consumed samples: 14323200 | consumed tokens: 29333913600 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.995062E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.110 | TFLOPs: 31.49 | +7: iteration 55960/ 173500 | consumed samples: 14325760 | consumed tokens: 29339156480 | elapsed time per iteration (s): 0.42 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.001779E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.308 | TFLOPs: 31.92 | +7: iteration 55970/ 173500 | consumed samples: 14328320 | consumed tokens: 29344399360 | elapsed time per iteration (s): 0.42 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.003587E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.650 | TFLOPs: 31.62 | +7: iteration 55980/ 173500 | consumed samples: 14330880 | consumed tokens: 29349642240 | elapsed time per iteration (s): 0.44 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.011541E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.482 | TFLOPs: 30.77 | +7: iteration 55990/ 173500 | consumed samples: 14333440 | consumed tokens: 29354885120 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.008560E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.758 | TFLOPs: 31.52 | +0: [2023-03-17 05:49:54,917] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=56000, skipped=0, lr=[0.0001591933009380588, 0.0001591933009380588, 0.0001591933009380588], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 56000/ 173500 | consumed samples: 14336000 | consumed tokens: 29360128000 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.000587E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.941 | TFLOPs: 31.16 | +0: steps: 56000 loss: 2.9701 iter time (s): 0.429 samples/sec: 596.808 +7: iteration 56010/ 173500 | consumed samples: 14338560 | consumed tokens: 29365370880 | elapsed time per iteration (s): 0.45 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.000579E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.998 | TFLOPs: 30.01 | +7: iteration 56020/ 173500 | consumed samples: 14341120 | consumed tokens: 29370613760 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.004749E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.616 | TFLOPs: 31.15 | +7: iteration 56030/ 173500 | consumed samples: 14343680 | consumed tokens: 29375856640 | elapsed time per iteration (s): 0.44 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 3.009068E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.193 | TFLOPs: 30.86 | +7: iteration 56040/ 173500 | consumed samples: 14346240 | consumed tokens: 29381099520 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 3.005484E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.003 | TFLOPs: 
31.22 | +7: iteration 56050/ 173500 | consumed samples: 14348800 | consumed tokens: 29386342400 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.995822E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.754 | TFLOPs: 31.31 | +7: iteration 56060/ 173500 | consumed samples: 14351360 | consumed tokens: 29391585280 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.996248E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.397 | TFLOPs: 31.24 | +7: iteration 56070/ 173500 | consumed samples: 14353920 | consumed tokens: 29396828160 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.997362E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.869 | TFLOPs: 31.53 | +7: iteration 56080/ 173500 | consumed samples: 14356480 | consumed tokens: 29402071040 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 3.003809E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.100 | TFLOPs: 31.17 | +7: iteration 56090/ 173500 | consumed samples: 14359040 | consumed tokens: 29407313920 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 3.003958E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.092 | TFLOPs: 31.43 | +7: iteration 56100/ 173500 | consumed samples: 14361600 | consumed tokens: 29412556800 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 3.001389E+00 | grad norm: 0.305 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.978 | TFLOPs: 31.48 | +7: iteration 56110/ 173500 | consumed samples: 14364160 | consumed tokens: 29417799680 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.997989E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.180 | TFLOPs: 31.44 | +7: iteration 56120/ 173500 | consumed samples: 14366720 | consumed tokens: 29423042560 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.016816E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.946 | TFLOPs: 31.32 | +7: iteration 56130/ 173500 | consumed samples: 14369280 | consumed tokens: 29428285440 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.011774E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.949 | TFLOPs: 31.37 | +7: iteration 56140/ 173500 | consumed samples: 14371840 | consumed tokens: 29433528320 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.012477E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.459 | TFLOPs: 31.24 | +7: iteration 56150/ 173500 | consumed samples: 14374400 | consumed tokens: 29438771200 | elapsed time per iteration (s): 0.42 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.996674E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.081 | TFLOPs: 31.75 | +7: iteration 56160/ 173500 | consumed samples: 14376960 | consumed tokens: 29444014080 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.998308E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.572 | TFLOPs: 31.77 | +7: iteration 56170/ 173500 | consumed samples: 14379520 | consumed tokens: 29449256960 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.015897E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.826 | TFLOPs: 31.31 | +7: iteration 56180/ 173500 | consumed samples: 14382080 | consumed tokens: 29454499840 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.998220E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.274 | TFLOPs: 31.23 | +7: iteration 56190/ 173500 | consumed samples: 14384640 | consumed tokens: 29459742720 | elapsed time per iteration (s): 0.44 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 3.018217E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.507 | TFLOPs: 30.30 | +7: iteration 56200/ 173500 | consumed samples: 14387200 | consumed tokens: 29464985600 | elapsed time per iteration (s): 0.42 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 3.006188E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.709 | TFLOPs: 31.73 | +7: iteration 56210/ 173500 | consumed samples: 14389760 | consumed tokens: 29470228480 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 3.014598E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.500 | TFLOPs: 31.19 | +7: iteration 56220/ 
173500 | consumed samples: 14392320 | consumed tokens: 29475471360 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.997628E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.253 | TFLOPs: 31.34 | +7: iteration 56230/ 173500 | consumed samples: 14394880 | consumed tokens: 29480714240 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.997952E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.918 | TFLOPs: 31.06 | +7: iteration 56240/ 173500 | consumed samples: 14397440 | consumed tokens: 29485957120 | elapsed time per iteration (s): 0.42 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 3.003602E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.091 | TFLOPs: 31.85 | +7: iteration 56250/ 173500 | consumed samples: 14400000 | consumed tokens: 29491200000 | elapsed time per iteration (s): 0.42 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 3.002297E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.885 | TFLOPs: 31.79 | +7: iteration 56260/ 173500 | consumed samples: 14402560 | consumed tokens: 29496442880 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.993171E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.184 | TFLOPs: 31.44 | +7: iteration 56270/ 173500 | consumed samples: 14405120 | consumed tokens: 29501685760 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 3.005032E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 593.882 | TFLOPs: 31.16 | +7: iteration 56280/ 173500 | consumed samples: 14407680 | consumed tokens: 29506928640 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.997584E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.269 | TFLOPs: 31.39 | +7: iteration 56290/ 173500 | consumed samples: 14410240 | consumed tokens: 29512171520 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.999377E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.783 | TFLOPs: 31.52 | +7: iteration 56300/ 173500 | consumed samples: 14412800 | consumed tokens: 29517414400 | elapsed time per iteration (s): 0.42 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.994318E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.503 | TFLOPs: 31.87 | +7: iteration 56310/ 173500 | consumed samples: 14415360 | consumed tokens: 29522657280 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.989943E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.088 | TFLOPs: 31.43 | +7: iteration 56320/ 173500 | consumed samples: 14417920 | consumed tokens: 29527900160 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 3.001505E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.172 | TFLOPs: 31.28 | +7: iteration 56330/ 173500 | consumed samples: 14420480 | consumed tokens: 29533143040 | elapsed time per iteration (s): 0.43 | learning rate: 
1.587E-04 | global batch size: 256 | lm loss: 3.023572E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.114 | TFLOPs: 31.43 | +7: iteration 56340/ 173500 | consumed samples: 14423040 | consumed tokens: 29538385920 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.993758E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.630 | TFLOPs: 31.09 | +7: iteration 56350/ 173500 | consumed samples: 14425600 | consumed tokens: 29543628800 | elapsed time per iteration (s): 0.44 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 3.003716E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.315 | TFLOPs: 30.61 | +7: iteration 56360/ 173500 | consumed samples: 14428160 | consumed tokens: 29548871680 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 3.000982E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.319 | TFLOPs: 31.50 | +7: iteration 56370/ 173500 | consumed samples: 14430720 | consumed tokens: 29554114560 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.999447E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.610 | TFLOPs: 31.41 | +7: iteration 56380/ 173500 | consumed samples: 14433280 | consumed tokens: 29559357440 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.994007E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.481 | TFLOPs: 31.35 | +7: iteration 56390/ 173500 | 
consumed samples: 14435840 | consumed tokens: 29564600320 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.991796E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.706 | TFLOPs: 31.57 | +7: iteration 56400/ 173500 | consumed samples: 14438400 | consumed tokens: 29569843200 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 3.005468E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.046 | TFLOPs: 31.22 | +7: iteration 56410/ 173500 | consumed samples: 14440960 | consumed tokens: 29575086080 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.995310E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.063 | TFLOPs: 31.22 | +7: iteration 56420/ 173500 | consumed samples: 14443520 | consumed tokens: 29580328960 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 3.008492E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.408 | TFLOPs: 31.50 | +7: iteration 56430/ 173500 | consumed samples: 14446080 | consumed tokens: 29585571840 | elapsed time per iteration (s): 0.44 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.995085E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.589 | TFLOPs: 30.83 | +7: iteration 56440/ 173500 | consumed samples: 14448640 | consumed tokens: 29590814720 | elapsed time per iteration (s): 0.44 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 3.006139E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 578.346 | TFLOPs: 30.34 | +7: iteration 56450/ 173500 | consumed samples: 14451200 | consumed tokens: 29596057600 | elapsed time per iteration (s): 0.45 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 3.002624E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.434 | TFLOPs: 29.98 | +7: iteration 56460/ 173500 | consumed samples: 14453760 | consumed tokens: 29601300480 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 3.004420E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.301 | TFLOPs: 31.60 | +7: iteration 56470/ 173500 | consumed samples: 14456320 | consumed tokens: 29606543360 | elapsed time per iteration (s): 0.45 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.005056E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.129 | TFLOPs: 29.65 | +7: iteration 56480/ 173500 | consumed samples: 14458880 | consumed tokens: 29611786240 | elapsed time per iteration (s): 0.44 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.011736E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.834 | TFLOPs: 30.69 | +7: iteration 56490/ 173500 | consumed samples: 14461440 | consumed tokens: 29617029120 | elapsed time per iteration (s): 0.42 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.998353E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.955 | TFLOPs: 31.69 | +7: iteration 56500/ 173500 | consumed samples: 14464000 | consumed tokens: 29622272000 | elapsed time per iteration (s): 0.43 | learning rate: 1.585E-04 | global batch 
size: 256 | lm loss: 2.997807E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.586 | TFLOPs: 31.04 | +7: iteration 56510/ 173500 | consumed samples: 14466560 | consumed tokens: 29627514880 | elapsed time per iteration (s): 0.43 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.004905E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.799 | TFLOPs: 31.37 | +7: iteration 56520/ 173500 | consumed samples: 14469120 | consumed tokens: 29632757760 | elapsed time per iteration (s): 0.42 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.996787E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.132 | TFLOPs: 31.65 | +7: iteration 56530/ 173500 | consumed samples: 14471680 | consumed tokens: 29638000640 | elapsed time per iteration (s): 0.44 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.023802E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.537 | TFLOPs: 30.77 | +7: iteration 56540/ 173500 | consumed samples: 14474240 | consumed tokens: 29643243520 | elapsed time per iteration (s): 0.43 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 3.006602E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.134 | TFLOPs: 31.28 | +7: iteration 56550/ 173500 | consumed samples: 14476800 | consumed tokens: 29648486400 | elapsed time per iteration (s): 0.43 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 3.003502E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.308 | TFLOPs: 31.55 | +7: iteration 56560/ 173500 | consumed samples: 14479360 | 
consumed tokens: 29653729280 | elapsed time per iteration (s): 0.42 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 3.003236E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.972 | TFLOPs: 31.74 | +7: iteration 56570/ 173500 | consumed samples: 14481920 | consumed tokens: 29658972160 | elapsed time per iteration (s): 0.42 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.994421E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.755 | TFLOPs: 31.89 | +7: iteration 56580/ 173500 | consumed samples: 14484480 | consumed tokens: 29664215040 | elapsed time per iteration (s): 0.42 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.986926E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.002 | TFLOPs: 31.69 | +7: iteration 56590/ 173500 | consumed samples: 14487040 | consumed tokens: 29669457920 | elapsed time per iteration (s): 0.43 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 3.000183E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.815 | TFLOPs: 31.16 | +7: iteration 56600/ 173500 | consumed samples: 14489600 | consumed tokens: 29674700800 | elapsed time per iteration (s): 0.43 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 3.010721E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.964 | TFLOPs: 31.43 | +7: iteration 56610/ 173500 | consumed samples: 14492160 | consumed tokens: 29679943680 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.993497E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 608.178 | TFLOPs: 31.91 | +7: iteration 56620/ 173500 | consumed samples: 14494720 | consumed tokens: 29685186560 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.006776E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.821 | TFLOPs: 31.89 | +7: iteration 56630/ 173500 | consumed samples: 14497280 | consumed tokens: 29690429440 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.001409E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.230 | TFLOPs: 32.02 | +7: iteration 56640/ 173500 | consumed samples: 14499840 | consumed tokens: 29695672320 | elapsed time per iteration (s): 0.43 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.001682E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.235 | TFLOPs: 31.28 | +7: iteration 56650/ 173500 | consumed samples: 14502400 | consumed tokens: 29700915200 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.004741E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.908 | TFLOPs: 31.74 | +7: iteration 56660/ 173500 | consumed samples: 14504960 | consumed tokens: 29706158080 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.995995E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.355 | TFLOPs: 31.76 | +7: iteration 56670/ 173500 | consumed samples: 14507520 | consumed tokens: 29711400960 | elapsed time per iteration (s): 0.44 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 
3.009604E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.649 | TFLOPs: 30.47 | +7: iteration 56680/ 173500 | consumed samples: 14510080 | consumed tokens: 29716643840 | elapsed time per iteration (s): 0.44 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.011244E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.066 | TFLOPs: 30.70 | +7: iteration 56690/ 173500 | consumed samples: 14512640 | consumed tokens: 29721886720 | elapsed time per iteration (s): 0.44 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.007748E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.692 | TFLOPs: 30.26 | +7: iteration 56700/ 173500 | consumed samples: 14515200 | consumed tokens: 29727129600 | elapsed time per iteration (s): 0.42 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.013799E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.232 | TFLOPs: 31.70 | +7: iteration 56710/ 173500 | consumed samples: 14517760 | consumed tokens: 29732372480 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.002862E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.101 | TFLOPs: 31.28 | +7: iteration 56720/ 173500 | consumed samples: 14520320 | consumed tokens: 29737615360 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.019870E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.217 | TFLOPs: 31.28 | +7: iteration 56730/ 173500 | consumed samples: 14522880 | consumed tokens: 
29742858240 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.019532E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.084 | TFLOPs: 31.49 | +7: iteration 56740/ 173500 | consumed samples: 14525440 | consumed tokens: 29748101120 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.021121E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.932 | TFLOPs: 31.32 | +7: iteration 56750/ 173500 | consumed samples: 14528000 | consumed tokens: 29753344000 | elapsed time per iteration (s): 0.46 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 3.027484E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.490 | TFLOPs: 29.25 | +7: iteration 56760/ 173500 | consumed samples: 14530560 | consumed tokens: 29758586880 | elapsed time per iteration (s): 0.44 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.009169E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.819 | TFLOPs: 30.32 | +7: iteration 56770/ 173500 | consumed samples: 14533120 | consumed tokens: 29763829760 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.022454E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.059 | TFLOPs: 31.01 | +7: iteration 56780/ 173500 | consumed samples: 14535680 | consumed tokens: 29769072640 | elapsed time per iteration (s): 0.44 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.007201E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 582.557 | TFLOPs: 30.57 | +7: iteration 56790/ 173500 | consumed samples: 14538240 | consumed tokens: 29774315520 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.015218E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.120 | TFLOPs: 31.43 | +7: iteration 56800/ 173500 | consumed samples: 14540800 | consumed tokens: 29779558400 | elapsed time per iteration (s): 0.42 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.011713E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.825 | TFLOPs: 31.94 | +7: iteration 56810/ 173500 | consumed samples: 14543360 | consumed tokens: 29784801280 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.001236E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.879 | TFLOPs: 31.05 | +7: iteration 56820/ 173500 | consumed samples: 14545920 | consumed tokens: 29790044160 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.008797E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.038 | TFLOPs: 31.59 | +7: iteration 56830/ 173500 | consumed samples: 14548480 | consumed tokens: 29795287040 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 3.005207E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.477 | TFLOPs: 31.82 | +7: iteration 56840/ 173500 | consumed samples: 14551040 | consumed tokens: 29800529920 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.991561E+00 | grad 
norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.472 | TFLOPs: 31.82 | +7: iteration 56850/ 173500 | consumed samples: 14553600 | consumed tokens: 29805772800 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.986723E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.592 | TFLOPs: 31.20 | +7: iteration 56860/ 173500 | consumed samples: 14556160 | consumed tokens: 29811015680 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 3.005927E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.275 | TFLOPs: 31.76 | +7: iteration 56870/ 173500 | consumed samples: 14558720 | consumed tokens: 29816258560 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.997286E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.091 | TFLOPs: 31.54 | +7: iteration 56880/ 173500 | consumed samples: 14561280 | consumed tokens: 29821501440 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.999419E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.505 | TFLOPs: 31.77 | +7: iteration 56890/ 173500 | consumed samples: 14563840 | consumed tokens: 29826744320 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 3.000583E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.031 | TFLOPs: 31.59 | +7: iteration 56900/ 173500 | consumed samples: 14566400 | consumed tokens: 29831987200 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.006296E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.942 | TFLOPs: 31.69 | +7: iteration 56910/ 173500 | consumed samples: 14568960 | consumed tokens: 29837230080 | elapsed time per iteration (s): 0.42 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.996545E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.020 | TFLOPs: 31.64 | +7: iteration 56920/ 173500 | consumed samples: 14571520 | consumed tokens: 29842472960 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.014540E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.662 | TFLOPs: 31.46 | +7: iteration 56930/ 173500 | consumed samples: 14574080 | consumed tokens: 29847715840 | elapsed time per iteration (s): 0.42 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.002090E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.829 | TFLOPs: 31.68 | +7: iteration 56940/ 173500 | consumed samples: 14576640 | consumed tokens: 29852958720 | elapsed time per iteration (s): 0.44 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.002021E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.245 | TFLOPs: 30.71 | +7: iteration 56950/ 173500 | consumed samples: 14579200 | consumed tokens: 29858201600 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.003653E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.088 | TFLOPs: 
31.01 | +7: iteration 56960/ 173500 | consumed samples: 14581760 | consumed tokens: 29863444480 | elapsed time per iteration (s): 0.42 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.013595E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.628 | TFLOPs: 31.83 | +7: iteration 56970/ 173500 | consumed samples: 14584320 | consumed tokens: 29868687360 | elapsed time per iteration (s): 0.45 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 3.014079E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.559 | TFLOPs: 29.94 | +7: iteration 56980/ 173500 | consumed samples: 14586880 | consumed tokens: 29873930240 | elapsed time per iteration (s): 0.42 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.989643E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.820 | TFLOPs: 31.63 | +7: iteration 56990/ 173500 | consumed samples: 14589440 | consumed tokens: 29879173120 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 3.023482E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.375 | TFLOPs: 31.34 | +7: iteration 57000/ 173500 | consumed samples: 14592000 | consumed tokens: 29884416000 | elapsed time per iteration (s): 0.42 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 3.006904E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.930 | TFLOPs: 31.74 | +7: iteration 57010/ 173500 | consumed samples: 14594560 | consumed tokens: 29889658880 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.988341E+00 | grad norm: 0.309 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.499 | TFLOPs: 31.19 | +7: iteration 57020/ 173500 | consumed samples: 14597120 | consumed tokens: 29894901760 | elapsed time per iteration (s): 0.44 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 3.003795E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.623 | TFLOPs: 30.25 | +7: iteration 57030/ 173500 | consumed samples: 14599680 | consumed tokens: 29900144640 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 3.008591E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.308 | TFLOPs: 31.13 | +7: iteration 57040/ 173500 | consumed samples: 14602240 | consumed tokens: 29905387520 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.994135E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.985 | TFLOPs: 31.27 | +7: iteration 57050/ 173500 | consumed samples: 14604800 | consumed tokens: 29910630400 | elapsed time per iteration (s): 0.43 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.006553E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.852 | TFLOPs: 31.53 | +7: iteration 57060/ 173500 | consumed samples: 14607360 | consumed tokens: 29915873280 | elapsed time per iteration (s): 0.44 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.003812E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.475 | TFLOPs: 30.88 | +7: iteration 57070/ 173500 | consumed samples: 14609920 | consumed tokens: 29921116160 | elapsed time per iteration (s): 0.45 | 
learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.005490E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.872 | TFLOPs: 29.95 | +7: iteration 57080/ 173500 | consumed samples: 14612480 | consumed tokens: 29926359040 | elapsed time per iteration (s): 0.43 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.986154E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.022 | TFLOPs: 31.53 | +7: iteration 57090/ 173500 | consumed samples: 14615040 | consumed tokens: 29931601920 | elapsed time per iteration (s): 0.44 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.001382E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.991 | TFLOPs: 30.64 | +7: iteration 57100/ 173500 | consumed samples: 14617600 | consumed tokens: 29936844800 | elapsed time per iteration (s): 0.46 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.012306E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.911 | TFLOPs: 29.48 | +7: iteration 57110/ 173500 | consumed samples: 14620160 | consumed tokens: 29942087680 | elapsed time per iteration (s): 0.44 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.997805E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.716 | TFLOPs: 30.68 | +7: iteration 57120/ 173500 | consumed samples: 14622720 | consumed tokens: 29947330560 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.995240E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.294 | TFLOPs: 30.24 | +7: iteration 57130/ 
173500 | consumed samples: 14625280 | consumed tokens: 29952573440 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 3.013669E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.259 | TFLOPs: 30.66 | +7: iteration 57140/ 173500 | consumed samples: 14627840 | consumed tokens: 29957816320 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 3.001437E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.800 | TFLOPs: 30.21 | +7: iteration 57150/ 173500 | consumed samples: 14630400 | consumed tokens: 29963059200 | elapsed time per iteration (s): 0.45 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.996212E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.506 | TFLOPs: 29.57 | +7: iteration 57160/ 173500 | consumed samples: 14632960 | consumed tokens: 29968302080 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 3.005791E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.933 | TFLOPs: 30.69 | +7: iteration 57170/ 173500 | consumed samples: 14635520 | consumed tokens: 29973544960 | elapsed time per iteration (s): 0.43 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.994053E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.632 | TFLOPs: 31.20 | +7: iteration 57180/ 173500 | consumed samples: 14638080 | consumed tokens: 29978787840 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.991226E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 580.515 | TFLOPs: 30.46 | +7: iteration 57190/ 173500 | consumed samples: 14640640 | consumed tokens: 29984030720 | elapsed time per iteration (s): 0.44 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 3.000728E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.144 | TFLOPs: 30.23 | +7: iteration 57200/ 173500 | consumed samples: 14643200 | consumed tokens: 29989273600 | elapsed time per iteration (s): 0.44 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 3.007603E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.826 | TFLOPs: 30.68 | +7: iteration 57210/ 173500 | consumed samples: 14645760 | consumed tokens: 29994516480 | elapsed time per iteration (s): 0.44 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.994772E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.073 | TFLOPs: 30.54 | +7: iteration 57220/ 173500 | consumed samples: 14648320 | consumed tokens: 29999759360 | elapsed time per iteration (s): 0.48 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 3.008307E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.753 | TFLOPs: 28.22 | +7: iteration 57230/ 173500 | consumed samples: 14650880 | consumed tokens: 30005002240 | elapsed time per iteration (s): 0.45 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.994551E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.162 | TFLOPs: 30.13 | +7: iteration 57240/ 173500 | consumed samples: 14653440 | consumed tokens: 30010245120 | elapsed time per iteration (s): 0.44 | learning rate: 
1.575E-04 | global batch size: 256 | lm loss: 2.996895E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.891 | TFLOPs: 30.53 | +7: iteration 57250/ 173500 | consumed samples: 14656000 | consumed tokens: 30015488000 | elapsed time per iteration (s): 0.46 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.992330E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.971 | TFLOPs: 28.96 | +7: iteration 57260/ 173500 | consumed samples: 14658560 | consumed tokens: 30020730880 | elapsed time per iteration (s): 0.46 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 3.005968E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.285 | TFLOPs: 28.98 | +7: iteration 57270/ 173500 | consumed samples: 14661120 | consumed tokens: 30025973760 | elapsed time per iteration (s): 0.45 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.988651E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.527 | TFLOPs: 30.04 | +7: iteration 57280/ 173500 | consumed samples: 14663680 | consumed tokens: 30031216640 | elapsed time per iteration (s): 0.45 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 3.012677E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.386 | TFLOPs: 29.98 | +7: iteration 57290/ 173500 | consumed samples: 14666240 | consumed tokens: 30036459520 | elapsed time per iteration (s): 0.44 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 3.009557E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.369 | TFLOPs: 30.50 | +7: iteration 57300/ 173500 | 
consumed samples: 14668800 | consumed tokens: 30041702400 | elapsed time per iteration (s): 0.43 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.992818E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.932 | TFLOPs: 31.27 | +7: iteration 57310/ 173500 | consumed samples: 14671360 | consumed tokens: 30046945280 | elapsed time per iteration (s): 0.42 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.997705E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.079 | TFLOPs: 31.64 | +7: iteration 57320/ 173500 | consumed samples: 14673920 | consumed tokens: 30052188160 | elapsed time per iteration (s): 0.43 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.993056E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.256 | TFLOPs: 31.07 | +7: iteration 57330/ 173500 | consumed samples: 14676480 | consumed tokens: 30057431040 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.003037E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.136 | TFLOPs: 31.17 | +7: iteration 57340/ 173500 | consumed samples: 14679040 | consumed tokens: 30062673920 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.001171E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.188 | TFLOPs: 31.28 | +7: iteration 57350/ 173500 | consumed samples: 14681600 | consumed tokens: 30067916800 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.006758E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.271 | TFLOPs: 31.39 | +7: iteration 57360/ 173500 | consumed samples: 14684160 | consumed tokens: 30073159680 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.003766E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.216 | TFLOPs: 31.60 | +7: iteration 57370/ 173500 | consumed samples: 14686720 | consumed tokens: 30078402560 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.001137E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.460 | TFLOPs: 31.30 | +7: iteration 57380/ 173500 | consumed samples: 14689280 | consumed tokens: 30083645440 | elapsed time per iteration (s): 0.44 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.009888E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.426 | TFLOPs: 30.82 | +7: iteration 57390/ 173500 | consumed samples: 14691840 | consumed tokens: 30088888320 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.984330E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.451 | TFLOPs: 30.93 | +7: iteration 57400/ 173500 | consumed samples: 14694400 | consumed tokens: 30094131200 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.998794E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.957 | TFLOPs: 31.11 | +7: iteration 57410/ 173500 | consumed samples: 14696960 | consumed tokens: 30099374080 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch 
size: 256 | lm loss: 2.994917E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.630 | TFLOPs: 31.46 | +7: iteration 57420/ 173500 | consumed samples: 14699520 | consumed tokens: 30104616960 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 3.004072E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.261 | TFLOPs: 31.28 | +7: iteration 57430/ 173500 | consumed samples: 14702080 | consumed tokens: 30109859840 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 3.010481E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.895 | TFLOPs: 31.48 | +7: iteration 57440/ 173500 | consumed samples: 14704640 | consumed tokens: 30115102720 | elapsed time per iteration (s): 0.42 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 3.007761E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.661 | TFLOPs: 31.94 | +7: iteration 57450/ 173500 | consumed samples: 14707200 | consumed tokens: 30120345600 | elapsed time per iteration (s): 0.42 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.998248E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.928 | TFLOPs: 31.90 | +7: iteration 57460/ 173500 | consumed samples: 14709760 | consumed tokens: 30125588480 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 3.004806E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.096 | TFLOPs: 31.59 | +7: iteration 57470/ 173500 | consumed samples: 14712320 | 
consumed tokens: 30130831360 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.995903E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.765 | TFLOPs: 31.84 | +7: iteration 57480/ 173500 | consumed samples: 14714880 | consumed tokens: 30136074240 | elapsed time per iteration (s): 0.43 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.003807E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.529 | TFLOPs: 31.19 | +7: iteration 57490/ 173500 | consumed samples: 14717440 | consumed tokens: 30141317120 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.005001E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.267 | TFLOPs: 31.81 | +7: iteration 57500/ 173500 | consumed samples: 14720000 | consumed tokens: 30146560000 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.004299E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.461 | TFLOPs: 31.72 | +7: iteration 57510/ 173500 | consumed samples: 14722560 | consumed tokens: 30151802880 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.007040E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.196 | TFLOPs: 32.02 | +7: iteration 57520/ 173500 | consumed samples: 14725120 | consumed tokens: 30157045760 | elapsed time per iteration (s): 0.43 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.007336E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 599.802 | TFLOPs: 31.47 | +7: iteration 57530/ 173500 | consumed samples: 14727680 | consumed tokens: 30162288640 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.011945E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.601 | TFLOPs: 31.67 | +7: iteration 57540/ 173500 | consumed samples: 14730240 | consumed tokens: 30167531520 | elapsed time per iteration (s): 0.43 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.988313E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.783 | TFLOPs: 31.21 | +7: iteration 57550/ 173500 | consumed samples: 14732800 | consumed tokens: 30172774400 | elapsed time per iteration (s): 0.42 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.997612E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.818 | TFLOPs: 31.79 | +7: iteration 57560/ 173500 | consumed samples: 14735360 | consumed tokens: 30178017280 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.009216E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.252 | TFLOPs: 31.28 | +7: iteration 57570/ 173500 | consumed samples: 14737920 | consumed tokens: 30183260160 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.006815E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.112 | TFLOPs: 31.01 | +7: iteration 57580/ 173500 | consumed samples: 14740480 | consumed tokens: 30188503040 | elapsed time per iteration (s): 0.42 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 
2.992587E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.784 | TFLOPs: 31.63 | +7: iteration 57590/ 173500 | consumed samples: 14743040 | consumed tokens: 30193745920 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.007001E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.974 | TFLOPs: 31.48 | +7: iteration 57600/ 173500 | consumed samples: 14745600 | consumed tokens: 30198988800 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.001125E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.329 | TFLOPs: 31.08 | +7: iteration 57610/ 173500 | consumed samples: 14748160 | consumed tokens: 30204231680 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.000966E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.448 | TFLOPs: 31.19 | +7: iteration 57620/ 173500 | consumed samples: 14750720 | consumed tokens: 30209474560 | elapsed time per iteration (s): 0.44 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.007377E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.518 | TFLOPs: 30.51 | +7: iteration 57630/ 173500 | consumed samples: 14753280 | consumed tokens: 30214717440 | elapsed time per iteration (s): 0.43 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.982741E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.023 | TFLOPs: 31.48 | +7: iteration 57640/ 173500 | consumed samples: 14755840 | consumed tokens: 
30219960320 | elapsed time per iteration (s): 0.44 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.001514E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.044 | TFLOPs: 30.59 | +7: iteration 57650/ 173500 | consumed samples: 14758400 | consumed tokens: 30225203200 | elapsed time per iteration (s): 0.42 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.990678E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.097 | TFLOPs: 31.80 | +7: iteration 57660/ 173500 | consumed samples: 14760960 | consumed tokens: 30230446080 | elapsed time per iteration (s): 0.42 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.993606E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.900 | TFLOPs: 31.84 | +7: iteration 57670/ 173500 | consumed samples: 14763520 | consumed tokens: 30235688960 | elapsed time per iteration (s): 0.44 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.009634E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.254 | TFLOPs: 30.34 | +7: iteration 57680/ 173500 | consumed samples: 14766080 | consumed tokens: 30240931840 | elapsed time per iteration (s): 0.43 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.004545E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.083 | TFLOPs: 31.33 | +7: iteration 57690/ 173500 | consumed samples: 14768640 | consumed tokens: 30246174720 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.993809E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 592.528 | TFLOPs: 31.09 | +7: iteration 57700/ 173500 | consumed samples: 14771200 | consumed tokens: 30251417600 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.997312E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.375 | TFLOPs: 31.24 | +7: iteration 57710/ 173500 | consumed samples: 14773760 | consumed tokens: 30256660480 | elapsed time per iteration (s): 0.42 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.991650E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.265 | TFLOPs: 31.86 | +7: iteration 57720/ 173500 | consumed samples: 14776320 | consumed tokens: 30261903360 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 3.009333E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.241 | TFLOPs: 31.34 | +7: iteration 57730/ 173500 | consumed samples: 14778880 | consumed tokens: 30267146240 | elapsed time per iteration (s): 0.44 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 3.003473E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.501 | TFLOPs: 30.62 | +7: iteration 57740/ 173500 | consumed samples: 14781440 | consumed tokens: 30272389120 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.991777E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.697 | TFLOPs: 31.52 | +7: iteration 57750/ 173500 | consumed samples: 14784000 | consumed tokens: 30277632000 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 3.001429E+00 | grad 
norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.477 | TFLOPs: 31.45 | +7: iteration 57760/ 173500 | consumed samples: 14786560 | consumed tokens: 30282874880 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 3.014089E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.055 | TFLOPs: 31.38 | +7: iteration 57770/ 173500 | consumed samples: 14789120 | consumed tokens: 30288117760 | elapsed time per iteration (s): 0.42 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 3.002666E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.035 | TFLOPs: 31.75 | +7: iteration 57780/ 173500 | consumed samples: 14791680 | consumed tokens: 30293360640 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.998672E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.921 | TFLOPs: 31.37 | +7: iteration 57790/ 173500 | consumed samples: 14794240 | consumed tokens: 30298603520 | elapsed time per iteration (s): 0.42 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 3.005991E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.907 | TFLOPs: 31.69 | +7: iteration 57800/ 173500 | consumed samples: 14796800 | consumed tokens: 30303846400 | elapsed time per iteration (s): 0.42 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.993130E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.419 | TFLOPs: 31.77 | +7: iteration 57810/ 173500 | consumed samples: 14799360 | consumed tokens: 30309089280 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.994135E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.973 | TFLOPs: 31.58 | +7: iteration 57820/ 173500 | consumed samples: 14801920 | consumed tokens: 30314332160 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 3.003905E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.538 | TFLOPs: 31.14 | +7: iteration 57830/ 173500 | consumed samples: 14804480 | consumed tokens: 30319575040 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.007986E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.325 | TFLOPs: 31.92 | +7: iteration 57840/ 173500 | consumed samples: 14807040 | consumed tokens: 30324817920 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.994630E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.461 | TFLOPs: 32.03 | +7: iteration 57850/ 173500 | consumed samples: 14809600 | consumed tokens: 30330060800 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.992290E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.970 | TFLOPs: 32.00 | +7: iteration 57860/ 173500 | consumed samples: 14812160 | consumed tokens: 30335303680 | elapsed time per iteration (s): 0.43 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.002704E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.640 | TFLOPs: 
31.25 | +7: iteration 57870/ 173500 | consumed samples: 14814720 | consumed tokens: 30340546560 | elapsed time per iteration (s): 0.43 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.988018E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.008 | TFLOPs: 31.27 | +7: iteration 57880/ 173500 | consumed samples: 14817280 | consumed tokens: 30345789440 | elapsed time per iteration (s): 0.43 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.009174E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.283 | TFLOPs: 31.55 | +7: iteration 57890/ 173500 | consumed samples: 14819840 | consumed tokens: 30351032320 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.006325E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.953 | TFLOPs: 31.64 | +7: iteration 57900/ 173500 | consumed samples: 14822400 | consumed tokens: 30356275200 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 3.006457E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.442 | TFLOPs: 31.56 | +7: iteration 57910/ 173500 | consumed samples: 14824960 | consumed tokens: 30361518080 | elapsed time per iteration (s): 0.42 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 3.004277E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.124 | TFLOPs: 31.64 | +7: iteration 57920/ 173500 | consumed samples: 14827520 | consumed tokens: 30366760960 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.999092E+00 | grad norm: 0.300 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.684 | TFLOPs: 31.52 | +7: iteration 57930/ 173500 | consumed samples: 14830080 | consumed tokens: 30372003840 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 3.006026E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.792 | TFLOPs: 31.47 | +7: iteration 57940/ 173500 | consumed samples: 14832640 | consumed tokens: 30377246720 | elapsed time per iteration (s): 0.42 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.999893E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.925 | TFLOPs: 31.63 | +7: iteration 57950/ 173500 | consumed samples: 14835200 | consumed tokens: 30382489600 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 3.009684E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.415 | TFLOPs: 31.50 | +7: iteration 57960/ 173500 | consumed samples: 14837760 | consumed tokens: 30387732480 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 3.006739E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.950 | TFLOPs: 31.48 | +7: iteration 57970/ 173500 | consumed samples: 14840320 | consumed tokens: 30392975360 | elapsed time per iteration (s): 0.42 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.989802E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.795 | TFLOPs: 31.73 | +7: iteration 57980/ 173500 | consumed samples: 14842880 | consumed tokens: 30398218240 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.993165E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.420 | TFLOPs: 31.61 | +7: iteration 57990/ 173500 | consumed samples: 14845440 | consumed tokens: 30403461120 | elapsed time per iteration (s): 0.44 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.990277E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.616 | TFLOPs: 30.46 | +0: [2023-03-17 06:04:16,393] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=0, lr=[0.00015640412143068475, 0.00015640412143068475, 0.00015640412143068475], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 58000/ 173500 | consumed samples: 14848000 | consumed tokens: 30408704000 | elapsed time per iteration (s): 0.42 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 3.000361E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.112 | TFLOPs: 31.75 | +0: steps: 58000 loss: 2.9825 iter time (s): 0.429 samples/sec: 596.864 +7: iteration 58010/ 173500 | consumed samples: 14850560 | consumed tokens: 30413946880 | elapsed time per iteration (s): 0.42 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.984805E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.376 | TFLOPs: 31.71 | +7: iteration 58020/ 173500 | consumed samples: 14853120 | consumed tokens: 30419189760 | elapsed time per iteration (s): 0.42 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 3.014100E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.597 | TFLOPs: 31.88 | +7: iteration 58030/ 173500 | consumed samples: 14855680 | consumed tokens: 
30424432640 | elapsed time per iteration (s): 0.42 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.986948E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.135 | TFLOPs: 31.75 | +7: iteration 58040/ 173500 | consumed samples: 14858240 | consumed tokens: 30429675520 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.990222E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.906 | TFLOPs: 31.84 | +7: iteration 58050/ 173500 | consumed samples: 14860800 | consumed tokens: 30434918400 | elapsed time per iteration (s): 0.43 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 3.001106E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.408 | TFLOPs: 31.40 | +7: iteration 58060/ 173500 | consumed samples: 14863360 | consumed tokens: 30440161280 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 3.006212E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.990 | TFLOPs: 31.64 | +7: iteration 58070/ 173500 | consumed samples: 14865920 | consumed tokens: 30445404160 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.992826E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.339 | TFLOPs: 31.81 | +7: iteration 58080/ 173500 | consumed samples: 14868480 | consumed tokens: 30450647040 | elapsed time per iteration (s): 0.43 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.991996E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.185 | TFLOPs: 31.60 | +7: iteration 58090/ 173500 | consumed samples: 14871040 | consumed tokens: 30455889920 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.997978E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.168 | TFLOPs: 31.70 | +7: iteration 58100/ 173500 | consumed samples: 14873600 | consumed tokens: 30461132800 | elapsed time per iteration (s): 0.43 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 3.007574E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.246 | TFLOPs: 31.23 | +7: iteration 58110/ 173500 | consumed samples: 14876160 | consumed tokens: 30466375680 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.005082E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.360 | TFLOPs: 32.02 | +7: iteration 58120/ 173500 | consumed samples: 14878720 | consumed tokens: 30471618560 | elapsed time per iteration (s): 0.43 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.001442E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.014 | TFLOPs: 31.27 | +7: iteration 58130/ 173500 | consumed samples: 14881280 | consumed tokens: 30476861440 | elapsed time per iteration (s): 0.44 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.011368E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.344 | TFLOPs: 30.61 | +7: iteration 58140/ 173500 | consumed samples: 14883840 | consumed tokens: 30482104320 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.002059E+00 | grad 
norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.400 | TFLOPs: 31.61 | +7: iteration 58150/ 173500 | consumed samples: 14886400 | consumed tokens: 30487347200 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.990414E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.848 | TFLOPs: 31.74 | +7: iteration 58160/ 173500 | consumed samples: 14888960 | consumed tokens: 30492590080 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.995949E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.540 | TFLOPs: 31.98 | +7: iteration 58170/ 173500 | consumed samples: 14891520 | consumed tokens: 30497832960 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.991907E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.168 | TFLOPs: 31.80 | +7: iteration 58180/ 173500 | consumed samples: 14894080 | consumed tokens: 30503075840 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 3.013051E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.086 | TFLOPs: 31.33 | +7: iteration 58190/ 173500 | consumed samples: 14896640 | consumed tokens: 30508318720 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.996849E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.302 | TFLOPs: 31.55 | +7: iteration 58200/ 173500 | consumed samples: 14899200 | consumed tokens: 30513561600 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.998312E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.468 | TFLOPs: 31.61 | +7: iteration 58210/ 173500 | consumed samples: 14901760 | consumed tokens: 30518804480 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 3.005127E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.448 | TFLOPs: 31.24 | +7: iteration 58220/ 173500 | consumed samples: 14904320 | consumed tokens: 30524047360 | elapsed time per iteration (s): 0.42 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.992013E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.846 | TFLOPs: 32.00 | +7: iteration 58230/ 173500 | consumed samples: 14906880 | consumed tokens: 30529290240 | elapsed time per iteration (s): 0.42 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.998457E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.995 | TFLOPs: 31.85 | +7: iteration 58240/ 173500 | consumed samples: 14909440 | consumed tokens: 30534533120 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.987702E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.915 | TFLOPs: 30.95 | +7: iteration 58250/ 173500 | consumed samples: 14912000 | consumed tokens: 30539776000 | elapsed time per iteration (s): 0.42 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 3.006674E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.415 | TFLOPs: 
31.77 | +7: iteration 58260/ 173500 | consumed samples: 14914560 | consumed tokens: 30545018880 | elapsed time per iteration (s): 0.44 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.012778E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.323 | TFLOPs: 30.71 | +7: iteration 58270/ 173500 | consumed samples: 14917120 | consumed tokens: 30550261760 | elapsed time per iteration (s): 0.43 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.003822E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.038 | TFLOPs: 31.54 | +7: iteration 58280/ 173500 | consumed samples: 14919680 | consumed tokens: 30555504640 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.002551E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.875 | TFLOPs: 31.79 | +7: iteration 58290/ 173500 | consumed samples: 14922240 | consumed tokens: 30560747520 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.002513E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.654 | TFLOPs: 31.62 | +7: iteration 58300/ 173500 | consumed samples: 14924800 | consumed tokens: 30565990400 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.005736E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.384 | TFLOPs: 31.71 | +7: iteration 58310/ 173500 | consumed samples: 14927360 | consumed tokens: 30571233280 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.000577E+00 | grad norm: 0.292 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.616 | TFLOPs: 31.62 | +7: iteration 58320/ 173500 | consumed samples: 14929920 | consumed tokens: 30576476160 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.005280E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.661 | TFLOPs: 31.78 | +7: iteration 58330/ 173500 | consumed samples: 14932480 | consumed tokens: 30581719040 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 3.004729E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.768 | TFLOPs: 31.84 | +7: iteration 58340/ 173500 | consumed samples: 14935040 | consumed tokens: 30586961920 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.982204E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.393 | TFLOPs: 31.97 | +7: iteration 58350/ 173500 | consumed samples: 14937600 | consumed tokens: 30592204800 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 3.007063E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.244 | TFLOPs: 31.81 | +7: iteration 58360/ 173500 | consumed samples: 14940160 | consumed tokens: 30597447680 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.988513E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.026 | TFLOPs: 31.80 | +7: iteration 58370/ 173500 | consumed samples: 14942720 | consumed tokens: 30602690560 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.559E-04 | global batch size: 256 | lm loss: 3.007747E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.112 | TFLOPs: 31.33 | +7: iteration 58380/ 173500 | consumed samples: 14945280 | consumed tokens: 30607933440 | elapsed time per iteration (s): 0.43 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.989729E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.546 | TFLOPs: 31.51 | +7: iteration 58390/ 173500 | consumed samples: 14947840 | consumed tokens: 30613176320 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.996383E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.452 | TFLOPs: 31.71 | +7: iteration 58400/ 173500 | consumed samples: 14950400 | consumed tokens: 30618419200 | elapsed time per iteration (s): 0.43 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.987081E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.108 | TFLOPs: 30.96 | +7: iteration 58410/ 173500 | consumed samples: 14952960 | consumed tokens: 30623662080 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 3.001768E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.685 | TFLOPs: 31.78 | +7: iteration 58420/ 173500 | consumed samples: 14955520 | consumed tokens: 30628904960 | elapsed time per iteration (s): 0.43 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.989842E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.202 | TFLOPs: 31.39 | +7: iteration 58430/ 
173500 | consumed samples: 14958080 | consumed tokens: 30634147840 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.993751E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.481 | TFLOPs: 31.77 | +7: iteration 58440/ 173500 | consumed samples: 14960640 | consumed tokens: 30639390720 | elapsed time per iteration (s): 0.43 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.999377E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.842 | TFLOPs: 31.53 | +7: iteration 58450/ 173500 | consumed samples: 14963200 | consumed tokens: 30644633600 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.996305E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.827 | TFLOPs: 31.79 | +7: iteration 58460/ 173500 | consumed samples: 14965760 | consumed tokens: 30649876480 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 3.005876E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.273 | TFLOPs: 31.97 | +7: iteration 58470/ 173500 | consumed samples: 14968320 | consumed tokens: 30655119360 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.990178E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.688 | TFLOPs: 31.73 | +7: iteration 58480/ 173500 | consumed samples: 14970880 | consumed tokens: 30660362240 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 3.004841E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.980 | TFLOPs: 31.79 | +7: iteration 58490/ 173500 | consumed samples: 14973440 | consumed tokens: 30665605120 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.995710E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.988 | TFLOPs: 31.95 | +7: iteration 58500/ 173500 | consumed samples: 14976000 | consumed tokens: 30670848000 | elapsed time per iteration (s): 0.43 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.982399E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.892 | TFLOPs: 31.53 | +7: iteration 58510/ 173500 | consumed samples: 14978560 | consumed tokens: 30676090880 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.997325E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.335 | TFLOPs: 31.97 | +7: iteration 58520/ 173500 | consumed samples: 14981120 | consumed tokens: 30681333760 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.992479E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.477 | TFLOPs: 31.66 | +7: iteration 58530/ 173500 | consumed samples: 14983680 | consumed tokens: 30686576640 | elapsed time per iteration (s): 0.42 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.994761E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.320 | TFLOPs: 31.66 | +7: iteration 58540/ 173500 | consumed samples: 14986240 | consumed tokens: 30691819520 | elapsed time per iteration (s): 0.42 | learning rate: 
1.556E-04 | global batch size: 256 | lm loss: 2.998378E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.381 | TFLOPs: 31.97 | +7: iteration 58550/ 173500 | consumed samples: 14988800 | consumed tokens: 30697062400 | elapsed time per iteration (s): 0.42 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 3.001706E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.977 | TFLOPs: 31.95 | +7: iteration 58560/ 173500 | consumed samples: 14991360 | consumed tokens: 30702305280 | elapsed time per iteration (s): 0.42 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 3.000397E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.128 | TFLOPs: 31.96 | +7: iteration 58570/ 173500 | consumed samples: 14993920 | consumed tokens: 30707548160 | elapsed time per iteration (s): 0.42 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.997710E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.182 | TFLOPs: 31.70 | +7: iteration 58580/ 173500 | consumed samples: 14996480 | consumed tokens: 30712791040 | elapsed time per iteration (s): 0.42 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.999693E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.862 | TFLOPs: 31.74 | +7: iteration 58590/ 173500 | consumed samples: 14999040 | consumed tokens: 30718033920 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 3.003325E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.218 | TFLOPs: 31.60 | +7: iteration 58600/ 173500 | 
consumed samples: 15001600 | consumed tokens: 30723276800 | elapsed time per iteration (s): 0.42 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.980437E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.580 | TFLOPs: 31.62 | +7: iteration 58610/ 173500 | consumed samples: 15004160 | consumed tokens: 30728519680 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 3.007127E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.140 | TFLOPs: 31.65 | +7: iteration 58620/ 173500 | consumed samples: 15006720 | consumed tokens: 30733762560 | elapsed time per iteration (s): 0.43 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.993938E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.898 | TFLOPs: 31.53 | +7: iteration 58630/ 173500 | consumed samples: 15009280 | consumed tokens: 30739005440 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.990415E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.051 | TFLOPs: 31.90 | +7: iteration 58640/ 173500 | consumed samples: 15011840 | consumed tokens: 30744248320 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.999951E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.309 | TFLOPs: 31.76 | +7: iteration 58650/ 173500 | consumed samples: 15014400 | consumed tokens: 30749491200 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.983939E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.152 | TFLOPs: 31.96 | +7: iteration 58660/ 173500 | consumed samples: 15016960 | consumed tokens: 30754734080 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 3.004408E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.103 | TFLOPs: 31.80 | +7: iteration 58670/ 173500 | consumed samples: 15019520 | consumed tokens: 30759976960 | elapsed time per iteration (s): 0.43 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 3.008573E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.709 | TFLOPs: 31.47 | +7: iteration 58680/ 173500 | consumed samples: 15022080 | consumed tokens: 30765219840 | elapsed time per iteration (s): 0.42 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.991574E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.846 | TFLOPs: 31.84 | +7: iteration 58690/ 173500 | consumed samples: 15024640 | consumed tokens: 30770462720 | elapsed time per iteration (s): 0.42 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.992395E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.122 | TFLOPs: 31.75 | +7: iteration 58700/ 173500 | consumed samples: 15027200 | consumed tokens: 30775705600 | elapsed time per iteration (s): 0.42 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.996533E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.571 | TFLOPs: 31.98 | +7: iteration 58710/ 173500 | consumed samples: 15029760 | consumed tokens: 30780948480 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch 
size: 256 | lm loss: 2.992672E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.889 | TFLOPs: 31.48 | +7: iteration 58720/ 173500 | consumed samples: 15032320 | consumed tokens: 30786191360 | elapsed time per iteration (s): 0.42 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 3.003908E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.610 | TFLOPs: 31.67 | +7: iteration 58730/ 173500 | consumed samples: 15034880 | consumed tokens: 30791434240 | elapsed time per iteration (s): 0.42 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.994863E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.900 | TFLOPs: 31.63 | +7: iteration 58740/ 173500 | consumed samples: 15037440 | consumed tokens: 30796677120 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.992668E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.397 | TFLOPs: 31.55 | +7: iteration 58750/ 173500 | consumed samples: 15040000 | consumed tokens: 30801920000 | elapsed time per iteration (s): 0.43 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 3.002822E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.747 | TFLOPs: 31.52 | +7: iteration 58760/ 173500 | consumed samples: 15042560 | consumed tokens: 30807162880 | elapsed time per iteration (s): 0.42 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.998668E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.437 | TFLOPs: 31.92 | +7: iteration 58770/ 173500 | consumed samples: 15045120 | 
consumed tokens: 30812405760 | elapsed time per iteration (s): 0.42 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.997821E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.213 | TFLOPs: 31.96 | +7: iteration 58780/ 173500 | consumed samples: 15047680 | consumed tokens: 30817648640 | elapsed time per iteration (s): 0.42 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 3.007314E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.573 | TFLOPs: 31.72 | +7: iteration 58790/ 173500 | consumed samples: 15050240 | consumed tokens: 30822891520 | elapsed time per iteration (s): 0.42 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.994145E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.002 | TFLOPs: 31.74 | +7: iteration 58800/ 173500 | consumed samples: 15052800 | consumed tokens: 30828134400 | elapsed time per iteration (s): 0.43 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.997856E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.960 | TFLOPs: 31.48 | +7: iteration 58810/ 173500 | consumed samples: 15055360 | consumed tokens: 30833377280 | elapsed time per iteration (s): 0.42 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.983604E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.213 | TFLOPs: 31.65 | +7: iteration 58820/ 173500 | consumed samples: 15057920 | consumed tokens: 30838620160 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 3.000313E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 602.506 | TFLOPs: 31.61 | +7: iteration 58830/ 173500 | consumed samples: 15060480 | consumed tokens: 30843863040 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 3.001591E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.458 | TFLOPs: 31.98 | +7: iteration 58840/ 173500 | consumed samples: 15063040 | consumed tokens: 30849105920 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 3.005508E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.149 | TFLOPs: 31.96 | +7: iteration 58850/ 173500 | consumed samples: 15065600 | consumed tokens: 30854348800 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.996356E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.876 | TFLOPs: 31.74 | +7: iteration 58860/ 173500 | consumed samples: 15068160 | consumed tokens: 30859591680 | elapsed time per iteration (s): 0.43 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 3.012591E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.687 | TFLOPs: 31.57 | +7: iteration 58870/ 173500 | consumed samples: 15070720 | consumed tokens: 30864834560 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.999872E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.276 | TFLOPs: 31.76 | +7: iteration 58880/ 173500 | consumed samples: 15073280 | consumed tokens: 30870077440 | elapsed time per iteration (s): 0.42 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 
2.982151E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.491 | TFLOPs: 31.77 | +7: iteration 58890/ 173500 | consumed samples: 15075840 | consumed tokens: 30875320320 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.993704E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.459 | TFLOPs: 31.98 | +7: iteration 58900/ 173500 | consumed samples: 15078400 | consumed tokens: 30880563200 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 3.005366E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.216 | TFLOPs: 31.96 | +7: iteration 58910/ 173500 | consumed samples: 15080960 | consumed tokens: 30885806080 | elapsed time per iteration (s): 0.43 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.993316E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.886 | TFLOPs: 31.27 | +7: iteration 58920/ 173500 | consumed samples: 15083520 | consumed tokens: 30891048960 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.995005E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.594 | TFLOPs: 31.72 | +7: iteration 58930/ 173500 | consumed samples: 15086080 | consumed tokens: 30896291840 | elapsed time per iteration (s): 0.43 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 3.005453E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.951 | TFLOPs: 31.58 | +7: iteration 58940/ 173500 | consumed samples: 15088640 | consumed tokens: 
30901534720 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.994753E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.595 | TFLOPs: 31.72 | +7: iteration 58950/ 173500 | consumed samples: 15091200 | consumed tokens: 30906777600 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.995133E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.001 | TFLOPs: 31.95 | +7: iteration 58960/ 173500 | consumed samples: 15093760 | consumed tokens: 30912020480 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 3.003347E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.435 | TFLOPs: 31.87 | +7: iteration 58970/ 173500 | consumed samples: 15096320 | consumed tokens: 30917263360 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.991340E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.505 | TFLOPs: 31.82 | +7: iteration 58980/ 173500 | consumed samples: 15098880 | consumed tokens: 30922506240 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 3.009594E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.826 | TFLOPs: 31.73 | +7: iteration 58990/ 173500 | consumed samples: 15101440 | consumed tokens: 30927749120 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 3.013365E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.353 | TFLOPs: 31.92 | +7: iteration 59000/ 173500 | consumed samples: 15104000 | consumed tokens: 30932992000 | elapsed time per iteration (s): 0.43 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.997956E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.706 | TFLOPs: 31.52 | +7: iteration 59010/ 173500 | consumed samples: 15106560 | consumed tokens: 30938234880 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 3.020751E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.770 | TFLOPs: 31.94 | +7: iteration 59020/ 173500 | consumed samples: 15109120 | consumed tokens: 30943477760 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.995036E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.807 | TFLOPs: 31.94 | +7: iteration 59030/ 173500 | consumed samples: 15111680 | consumed tokens: 30948720640 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.989486E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.797 | TFLOPs: 31.94 | +7: iteration 59040/ 173500 | consumed samples: 15114240 | consumed tokens: 30953963520 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.989857E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.635 | TFLOPs: 31.93 | +7: iteration 59050/ 173500 | consumed samples: 15116800 | consumed tokens: 30959206400 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.977460E+00 | grad 
norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.791 | TFLOPs: 31.94 | +7: iteration 59060/ 173500 | consumed samples: 15119360 | consumed tokens: 30964449280 | elapsed time per iteration (s): 0.43 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 3.006634E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.461 | TFLOPs: 31.56 | +7: iteration 59070/ 173500 | consumed samples: 15121920 | consumed tokens: 30969692160 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 3.010614E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.014 | TFLOPs: 31.95 | +7: iteration 59080/ 173500 | consumed samples: 15124480 | consumed tokens: 30974935040 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.995166E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.661 | TFLOPs: 31.73 | +7: iteration 59090/ 173500 | consumed samples: 15127040 | consumed tokens: 30980177920 | elapsed time per iteration (s): 0.42 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 3.003334E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.475 | TFLOPs: 31.72 | +7: iteration 59100/ 173500 | consumed samples: 15129600 | consumed tokens: 30985420800 | elapsed time per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.973853E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.654 | TFLOPs: 31.94 | +7: iteration 59110/ 173500 | consumed samples: 15132160 | consumed tokens: 30990663680 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.988094E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.988 | TFLOPs: 31.95 | +7: iteration 59120/ 173500 | consumed samples: 15134720 | consumed tokens: 30995906560 | elapsed time per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 3.002064E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.936 | TFLOPs: 31.95 | +7: iteration 59130/ 173500 | consumed samples: 15137280 | consumed tokens: 31001149440 | elapsed time per iteration (s): 0.44 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 3.012305E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.248 | TFLOPs: 30.39 | +7: iteration 59140/ 173500 | consumed samples: 15139840 | consumed tokens: 31006392320 | elapsed time per iteration (s): 0.44 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.994644E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.284 | TFLOPs: 30.55 | +7: iteration 59150/ 173500 | consumed samples: 15142400 | consumed tokens: 31011635200 | elapsed time per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.991114E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.319 | TFLOPs: 31.81 | +7: iteration 59160/ 173500 | consumed samples: 15144960 | consumed tokens: 31016878080 | elapsed time per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 3.002694E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.611 | TFLOPs: 
31.62 | +7: iteration 59170/ 173500 | consumed samples: 15147520 | consumed tokens: 31022120960 | elapsed time per iteration (s): 0.42 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.990689E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.841 | TFLOPs: 31.63 | +7: iteration 59180/ 173500 | consumed samples: 15150080 | consumed tokens: 31027363840 | elapsed time per iteration (s): 0.42 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.005484E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.227 | TFLOPs: 31.70 | +7: iteration 59190/ 173500 | consumed samples: 15152640 | consumed tokens: 31032606720 | elapsed time per iteration (s): 0.42 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.004685E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.898 | TFLOPs: 31.95 | +7: iteration 59200/ 173500 | consumed samples: 15155200 | consumed tokens: 31037849600 | elapsed time per iteration (s): 0.43 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.001422E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.865 | TFLOPs: 31.47 | +7: iteration 59210/ 173500 | consumed samples: 15157760 | consumed tokens: 31043092480 | elapsed time per iteration (s): 0.43 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.988546E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.184 | TFLOPs: 31.28 | +7: iteration 59220/ 173500 | consumed samples: 15160320 | consumed tokens: 31048335360 | elapsed time per iteration (s): 0.42 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.990756E+00 | grad norm: 0.293 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.001 | TFLOPs: 31.95 | +7: iteration 59230/ 173500 | consumed samples: 15162880 | consumed tokens: 31053578240 | elapsed time per iteration (s): 0.42 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.005788E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.127 | TFLOPs: 31.96 | +7: iteration 59240/ 173500 | consumed samples: 15165440 | consumed tokens: 31058821120 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.998937E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.307 | TFLOPs: 31.60 | +7: iteration 59250/ 173500 | consumed samples: 15168000 | consumed tokens: 31064064000 | elapsed time per iteration (s): 0.42 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.997000E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.461 | TFLOPs: 31.82 | +7: iteration 59260/ 173500 | consumed samples: 15170560 | consumed tokens: 31069306880 | elapsed time per iteration (s): 0.42 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.989785E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.976 | TFLOPs: 31.79 | +7: iteration 59270/ 173500 | consumed samples: 15173120 | consumed tokens: 31074549760 | elapsed time per iteration (s): 0.42 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 3.012835E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.182 | TFLOPs: 31.91 | +7: iteration 59280/ 173500 | consumed samples: 15175680 | consumed tokens: 31079792640 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.546E-04 | global batch size: 256 | lm loss: 3.012461E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.301 | TFLOPs: 31.65 | +7: iteration 59290/ 173500 | consumed samples: 15178240 | consumed tokens: 31085035520 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.990117E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.237 | TFLOPs: 31.60 | +7: iteration 59300/ 173500 | consumed samples: 15180800 | consumed tokens: 31090278400 | elapsed time per iteration (s): 0.44 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 3.004407E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.522 | TFLOPs: 30.56 | +7: iteration 59310/ 173500 | consumed samples: 15183360 | consumed tokens: 31095521280 | elapsed time per iteration (s): 0.45 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.991763E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.247 | TFLOPs: 30.13 | +7: iteration 59320/ 173500 | consumed samples: 15185920 | consumed tokens: 31100764160 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.986044E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.986 | TFLOPs: 32.01 | +7: iteration 59330/ 173500 | consumed samples: 15188480 | consumed tokens: 31106007040 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 3.010521E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.067 | TFLOPs: 31.96 | +7: iteration 59340/ 
173500 | consumed samples: 15191040 | consumed tokens: 31111249920 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.989300E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.656 | TFLOPs: 31.94 | +7: iteration 59350/ 173500 | consumed samples: 15193600 | consumed tokens: 31116492800 | elapsed time per iteration (s): 0.43 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.999477E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.007 | TFLOPs: 31.38 | +7: iteration 59360/ 173500 | consumed samples: 15196160 | consumed tokens: 31121735680 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.993742E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.046 | TFLOPs: 31.96 | +7: iteration 59370/ 173500 | consumed samples: 15198720 | consumed tokens: 31126978560 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 3.002283E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.170 | TFLOPs: 31.96 | +7: iteration 59380/ 173500 | consumed samples: 15201280 | consumed tokens: 31132221440 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.991020E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.743 | TFLOPs: 31.62 | +7: iteration 59390/ 173500 | consumed samples: 15203840 | consumed tokens: 31137464320 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.979809E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.455 | TFLOPs: 31.92 | +7: iteration 59400/ 173500 | consumed samples: 15206400 | consumed tokens: 31142707200 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.995261E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.699 | TFLOPs: 31.94 | +7: iteration 59410/ 173500 | consumed samples: 15208960 | consumed tokens: 31147950080 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 3.002352E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.529 | TFLOPs: 31.93 | +7: iteration 59420/ 173500 | consumed samples: 15211520 | consumed tokens: 31153192960 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.999658E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.878 | TFLOPs: 31.74 | +7: iteration 59430/ 173500 | consumed samples: 15214080 | consumed tokens: 31158435840 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 3.007507E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.255 | TFLOPs: 31.97 | +7: iteration 59440/ 173500 | consumed samples: 15216640 | consumed tokens: 31163678720 | elapsed time per iteration (s): 0.42 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 3.005619E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.716 | TFLOPs: 31.94 | +7: iteration 59450/ 173500 | consumed samples: 15219200 | consumed tokens: 31168921600 | elapsed time per iteration (s): 0.42 | learning rate: 
1.543E-04 | global batch size: 256 | lm loss: 2.988431E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.694 | TFLOPs: 31.94 | +7: iteration 59460/ 173500 | consumed samples: 15221760 | consumed tokens: 31174164480 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.993212E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.569 | TFLOPs: 31.93 | +7: iteration 59470/ 173500 | consumed samples: 15224320 | consumed tokens: 31179407360 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 3.010206E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.396 | TFLOPs: 31.92 | +7: iteration 59480/ 173500 | consumed samples: 15226880 | consumed tokens: 31184650240 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 3.000314E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.738 | TFLOPs: 31.94 | +7: iteration 59490/ 173500 | consumed samples: 15229440 | consumed tokens: 31189893120 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.989519E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.722 | TFLOPs: 31.78 | +7: iteration 59500/ 173500 | consumed samples: 15232000 | consumed tokens: 31195136000 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 3.004898E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.860 | TFLOPs: 31.95 | +7: iteration 59510/ 173500 | 
consumed samples: 15234560 | consumed tokens: 31200378880 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.995948E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.783 | TFLOPs: 31.68 | +7: iteration 59520/ 173500 | consumed samples: 15237120 | consumed tokens: 31205621760 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.993435E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.827 | TFLOPs: 31.84 | +7: iteration 59530/ 173500 | consumed samples: 15239680 | consumed tokens: 31210864640 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.990416E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.988 | TFLOPs: 31.80 | +7: iteration 59540/ 173500 | consumed samples: 15242240 | consumed tokens: 31216107520 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.999732E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.527 | TFLOPs: 31.93 | +7: iteration 59550/ 173500 | consumed samples: 15244800 | consumed tokens: 31221350400 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 3.005789E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.196 | TFLOPs: 31.91 | +7: iteration 59560/ 173500 | consumed samples: 15247360 | consumed tokens: 31226593280 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.989219E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.873 | TFLOPs: 31.74 | +7: iteration 59570/ 173500 | consumed samples: 15249920 | consumed tokens: 31231836160 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 3.008400E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.014 | TFLOPs: 31.64 | +7: iteration 59580/ 173500 | consumed samples: 15252480 | consumed tokens: 31237079040 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 3.008911E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.355 | TFLOPs: 31.92 | +7: iteration 59590/ 173500 | consumed samples: 15255040 | consumed tokens: 31242321920 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 3.002380E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.207 | TFLOPs: 31.91 | +7: iteration 59600/ 173500 | consumed samples: 15257600 | consumed tokens: 31247564800 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.984344E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.731 | TFLOPs: 31.89 | +7: iteration 59610/ 173500 | consumed samples: 15260160 | consumed tokens: 31252807680 | elapsed time per iteration (s): 0.43 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.989381E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.848 | TFLOPs: 31.58 | +7: iteration 59620/ 173500 | consumed samples: 15262720 | consumed tokens: 31258050560 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch 
size: 256 | lm loss: 2.994998E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.245 | TFLOPs: 31.91 | +7: iteration 59630/ 173500 | consumed samples: 15265280 | consumed tokens: 31263293440 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.993318E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.332 | TFLOPs: 31.76 | +7: iteration 59640/ 173500 | consumed samples: 15267840 | consumed tokens: 31268536320 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 3.003729E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 | +7: iteration 59650/ 173500 | consumed samples: 15270400 | consumed tokens: 31273779200 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.995305E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.183 | TFLOPs: 31.70 | +7: iteration 59660/ 173500 | consumed samples: 15272960 | consumed tokens: 31279022080 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.992594E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.369 | TFLOPs: 31.92 | +7: iteration 59670/ 173500 | consumed samples: 15275520 | consumed tokens: 31284264960 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.994808E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.310 | TFLOPs: 31.92 | +7: iteration 59680/ 173500 | consumed samples: 15278080 | 
consumed tokens: 31289507840 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.987482E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.642 | TFLOPs: 31.62 | +7: iteration 59690/ 173500 | consumed samples: 15280640 | consumed tokens: 31294750720 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 3.001904E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.344 | TFLOPs: 31.76 | +7: iteration 59700/ 173500 | consumed samples: 15283200 | consumed tokens: 31299993600 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.999209E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.641 | TFLOPs: 31.78 | +7: iteration 59710/ 173500 | consumed samples: 15285760 | consumed tokens: 31305236480 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.980240E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 59720/ 173500 | consumed samples: 15288320 | consumed tokens: 31310479360 | elapsed time per iteration (s): 0.42 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 3.014077E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.523 | TFLOPs: 31.72 | +7: iteration 59730/ 173500 | consumed samples: 15290880 | consumed tokens: 31315722240 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.001007E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.695 | TFLOPs: 31.73 | +7: iteration 59740/ 173500 | consumed samples: 15293440 | consumed tokens: 31320965120 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.998023E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.214 | TFLOPs: 31.91 | +7: iteration 59750/ 173500 | consumed samples: 15296000 | consumed tokens: 31326208000 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.982370E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.022 | TFLOPs: 31.90 | +7: iteration 59760/ 173500 | consumed samples: 15298560 | consumed tokens: 31331450880 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.009775E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.887 | TFLOPs: 31.89 | +7: iteration 59770/ 173500 | consumed samples: 15301120 | consumed tokens: 31336693760 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.988888E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.897 | TFLOPs: 31.90 | +7: iteration 59780/ 173500 | consumed samples: 15303680 | consumed tokens: 31341936640 | elapsed time per iteration (s): 0.43 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.004369E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.193 | TFLOPs: 31.60 | +7: iteration 59790/ 173500 | consumed samples: 15306240 | consumed tokens: 31347179520 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 
3.005031E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.810 | TFLOPs: 31.89 | +7: iteration 59800/ 173500 | consumed samples: 15308800 | consumed tokens: 31352422400 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.996113E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.251 | TFLOPs: 31.70 | +7: iteration 59810/ 173500 | consumed samples: 15311360 | consumed tokens: 31357665280 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.990257E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.712 | TFLOPs: 31.78 | +7: iteration 59820/ 173500 | consumed samples: 15313920 | consumed tokens: 31362908160 | elapsed time per iteration (s): 0.43 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.983807E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.809 | TFLOPs: 31.42 | +7: iteration 59830/ 173500 | consumed samples: 15316480 | consumed tokens: 31368151040 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.990643E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.149 | TFLOPs: 31.75 | +7: iteration 59840/ 173500 | consumed samples: 15319040 | consumed tokens: 31373393920 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.991208E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.888 | TFLOPs: 31.69 | +7: iteration 59850/ 173500 | consumed samples: 15321600 | consumed tokens: 
31378636800 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.991491E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.841 | TFLOPs: 31.74 | +7: iteration 59860/ 173500 | consumed samples: 15324160 | consumed tokens: 31383879680 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.994868E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.102 | TFLOPs: 31.91 | +7: iteration 59870/ 173500 | consumed samples: 15326720 | consumed tokens: 31389122560 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.991869E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.194 | TFLOPs: 31.70 | +7: iteration 59880/ 173500 | consumed samples: 15329280 | consumed tokens: 31394365440 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.997724E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.288 | TFLOPs: 31.65 | +7: iteration 59890/ 173500 | consumed samples: 15331840 | consumed tokens: 31399608320 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 3.003075E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.724 | TFLOPs: 31.94 | +7: iteration 59900/ 173500 | consumed samples: 15334400 | consumed tokens: 31404851200 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.997164E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.178 | TFLOPs: 31.91 | +7: iteration 59910/ 173500 | consumed samples: 15336960 | consumed tokens: 31410094080 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.996667E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.482 | TFLOPs: 31.93 | +7: iteration 59920/ 173500 | consumed samples: 15339520 | consumed tokens: 31415336960 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.996597E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.740 | TFLOPs: 31.89 | +7: iteration 59930/ 173500 | consumed samples: 15342080 | consumed tokens: 31420579840 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.992559E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.028 | TFLOPs: 31.90 | +7: iteration 59940/ 173500 | consumed samples: 15344640 | consumed tokens: 31425822720 | elapsed time per iteration (s): 0.43 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.983533E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.583 | TFLOPs: 31.09 | +7: iteration 59950/ 173500 | consumed samples: 15347200 | consumed tokens: 31431065600 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.987964E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.557 | TFLOPs: 31.93 | +7: iteration 59960/ 173500 | consumed samples: 15349760 | consumed tokens: 31436308480 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.985895E+00 | grad 
norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.230 | TFLOPs: 31.91 | +7: iteration 59970/ 173500 | consumed samples: 15352320 | consumed tokens: 31441551360 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.006964E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.273 | TFLOPs: 31.92 | +7: iteration 59980/ 173500 | consumed samples: 15354880 | consumed tokens: 31446794240 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.000892E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.055 | TFLOPs: 31.90 | +7: iteration 59990/ 173500 | consumed samples: 15357440 | consumed tokens: 31452037120 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.013863E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.262 | TFLOPs: 31.86 | +0: [2023-03-17 06:18:23,437] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=0, lr=[0.00015355285563304073, 0.00015355285563304073, 0.00015355285563304073], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 60000/ 173500 | consumed samples: 15360000 | consumed tokens: 31457280000 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.005452E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.102 | TFLOPs: 31.85 | +0: steps: 60000 loss: 3.0404 iter time (s): 0.421 samples/sec: 607.403 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 60000 | lm loss value: 
3.284496E+00 | lm loss PPL: 2.669553E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 60000 to checkpoints_221m91b400m +0: [2023-03-17 06:18:23,604] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step60000 is begin to save! +0: [2023-03-17 06:18:23,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_01-model_00-model_states.pt... +0: [2023-03-17 06:18:23,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_01-model_00-model_states.pt. +0: [2023-03-17 06:18:23,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_03-model_00-model_states.pt... +0: [2023-03-17 06:18:23,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_03-model_00-model_states.pt. +0: [2023-03-17 06:18:23,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_04-model_00-model_states.pt... +0: [2023-03-17 06:18:23,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_04-model_00-model_states.pt. +0: [2023-03-17 06:18:23,794] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_05-model_00-model_states.pt... +0: [2023-03-17 06:18:23,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_05-model_00-model_states.pt. +0: [2023-03-17 06:18:23,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_06-model_00-model_states.pt... +0: [2023-03-17 06:18:23,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_06-model_00-model_states.pt. 
+0: [2023-03-17 06:18:23,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_07-model_00-model_states.pt... +0: [2023-03-17 06:18:23,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_07-model_00-model_states.pt. +0: [2023-03-17 06:18:23,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_08-model_00-model_states.pt... +0: [2023-03-17 06:18:23,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_08-model_00-model_states.pt. +0: [2023-03-17 06:18:23,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_09-model_00-model_states.pt... +0: [2023-03-17 06:18:23,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_09-model_00-model_states.pt. +0: [2023-03-17 06:18:23,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_10-model_00-model_states.pt... +0: [2023-03-17 06:18:23,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_10-model_00-model_states.pt. +0: [2023-03-17 06:18:23,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_11-model_00-model_states.pt... +0: [2023-03-17 06:18:23,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_11-model_00-model_states.pt. +0: [2023-03-17 06:18:23,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_12-model_00-model_states.pt... +0: [2023-03-17 06:18:23,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_12-model_00-model_states.pt. 
+0: [2023-03-17 06:18:23,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_13-model_00-model_states.pt... +0: [2023-03-17 06:18:24,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_13-model_00-model_states.pt. +0: [2023-03-17 06:18:24,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_14-model_00-model_states.pt... +0: [2023-03-17 06:18:24,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_14-model_00-model_states.pt. +0: [2023-03-17 06:18:24,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_15-model_00-model_states.pt... +0: [2023-03-17 06:18:24,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_15-model_00-model_states.pt. +0: [2023-03-17 06:18:24,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_16-model_00-model_states.pt... +0: [2023-03-17 06:18:24,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_16-model_00-model_states.pt. +0: [2023-03-17 06:18:24,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_17-model_00-model_states.pt... +0: [2023-03-17 06:18:24,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_17-model_00-model_states.pt. +0: [2023-03-17 06:18:24,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_18-model_00-model_states.pt... +0: [2023-03-17 06:18:24,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_18-model_00-model_states.pt. 
+0: [2023-03-17 06:18:24,150] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_19-model_00-model_states.pt... +0: [2023-03-17 06:18:24,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_19-model_00-model_states.pt. +0: [2023-03-17 06:18:24,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_20-model_00-model_states.pt... +0: [2023-03-17 06:18:24,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_20-model_00-model_states.pt. +0: [2023-03-17 06:18:24,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/layer_22-model_00-model_states.pt... +0: [2023-03-17 06:18:24,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/layer_22-model_00-model_states.pt. +0: [2023-03-17 06:18:24,204] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step60000/mp_rank_00_model_states.pt +0: [2023-03-17 06:18:24,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/mp_rank_00_model_states.pt... +0: [2023-03-17 06:18:24,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/mp_rank_00_model_states.pt. +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +4: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 06:18:24,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 06:18:24,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 06:18:24,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 06:18:24,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +2: [2023-03-17 06:18:24,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +5: [2023-03-17 06:18:24,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 06:18:24,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 06:18:24,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+0: [2023-03-17 06:18:24,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 06:18:24,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 06:18:24,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 06:18:24,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 06:18:24,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 06:18:24,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 06:18:24,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +6: [2023-03-17 06:18:24,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +7: [2023-03-17 06:18:24,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 06:18:24,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +3: [2023-03-17 06:18:24,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +0: [2023-03-17 06:18:24,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 06:18:24,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! +4: [2023-03-17 06:18:24,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
+0: successfully saved checkpoint at iteration 60000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 764.30 +7: iteration 60010/ 173500 | consumed samples: 15362560 | consumed tokens: 31462522880 | elapsed time per iteration (s): 0.52 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 3.008154E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 492.166 | TFLOPs: 25.82 | +7: iteration 60020/ 173500 | consumed samples: 15365120 | consumed tokens: 31467765760 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.988671E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.161 | TFLOPs: 31.44 | +7: iteration 60030/ 173500 | consumed samples: 15367680 | consumed tokens: 31473008640 | elapsed time per iteration (s): 0.42 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.997441E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.927 | TFLOPs: 31.84 | +7: iteration 60040/ 173500 | consumed samples: 15370240 | consumed tokens: 31478251520 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.997661E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.066 | TFLOPs: 31.27 | +7: iteration 60050/ 173500 | consumed samples: 15372800 | consumed tokens: 31483494400 | elapsed time per iteration (s): 0.42 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 3.001747E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.283 | TFLOPs: 31.65 | +7: iteration 60060/ 173500 | consumed samples: 15375360 | consumed tokens: 31488737280 | elapsed time per iteration (s): 
0.42 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.968195E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.101 | TFLOPs: 31.70 | +7: iteration 60070/ 173500 | consumed samples: 15377920 | consumed tokens: 31493980160 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 3.000923E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.711 | TFLOPs: 31.41 | +7: iteration 60080/ 173500 | consumed samples: 15380480 | consumed tokens: 31499223040 | elapsed time per iteration (s): 0.42 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.989657E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.576 | TFLOPs: 31.83 | +7: iteration 60090/ 173500 | consumed samples: 15383040 | consumed tokens: 31504465920 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.999512E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.385 | TFLOPs: 31.34 | +7: iteration 60100/ 173500 | consumed samples: 15385600 | consumed tokens: 31509708800 | elapsed time per iteration (s): 0.42 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.975193E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.672 | TFLOPs: 31.94 | +7: iteration 60110/ 173500 | consumed samples: 15388160 | consumed tokens: 31514951680 | elapsed time per iteration (s): 0.44 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.990228E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.719 | TFLOPs: 30.84 | +7: iteration 
60120/ 173500 | consumed samples: 15390720 | consumed tokens: 31520194560 | elapsed time per iteration (s): 0.44 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.989356E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.742 | TFLOPs: 30.52 | +7: iteration 60130/ 173500 | consumed samples: 15393280 | consumed tokens: 31525437440 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.991094E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.179 | TFLOPs: 30.91 | +7: iteration 60140/ 173500 | consumed samples: 15395840 | consumed tokens: 31530680320 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 3.003370E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.909 | TFLOPs: 31.27 | +7: iteration 60150/ 173500 | consumed samples: 15398400 | consumed tokens: 31535923200 | elapsed time per iteration (s): 0.42 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.985504E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.622 | TFLOPs: 32.04 | +7: iteration 60160/ 173500 | consumed samples: 15400960 | consumed tokens: 31541166080 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.994987E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.844 | TFLOPs: 30.58 | +7: iteration 60170/ 173500 | consumed samples: 15403520 | consumed tokens: 31546408960 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.982793E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 587.667 | TFLOPs: 30.83 | +7: iteration 60180/ 173500 | consumed samples: 15406080 | consumed tokens: 31551651840 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.995367E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.164 | TFLOPs: 30.81 | +7: iteration 60190/ 173500 | consumed samples: 15408640 | consumed tokens: 31556894720 | elapsed time per iteration (s): 0.43 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.994363E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.762 | TFLOPs: 31.52 | +7: iteration 60200/ 173500 | consumed samples: 15411200 | consumed tokens: 31562137600 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 3.011287E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.108 | TFLOPs: 30.38 | +7: iteration 60210/ 173500 | consumed samples: 15413760 | consumed tokens: 31567380480 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.997626E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.478 | TFLOPs: 30.61 | +7: iteration 60220/ 173500 | consumed samples: 15416320 | consumed tokens: 31572623360 | elapsed time per iteration (s): 0.42 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.993324E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.447 | TFLOPs: 31.66 | +7: iteration 60230/ 173500 | consumed samples: 15418880 | consumed tokens: 31577866240 | elapsed time per iteration (s): 0.44 | learning rate: 
1.532E-04 | global batch size: 256 | lm loss: 2.989472E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.164 | TFLOPs: 30.65 | +7: iteration 60240/ 173500 | consumed samples: 15421440 | consumed tokens: 31583109120 | elapsed time per iteration (s): 0.43 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.976203E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.743 | TFLOPs: 31.05 | +7: iteration 60250/ 173500 | consumed samples: 15424000 | consumed tokens: 31588352000 | elapsed time per iteration (s): 0.45 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.996391E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.917 | TFLOPs: 29.54 | +7: iteration 60260/ 173500 | consumed samples: 15426560 | consumed tokens: 31593594880 | elapsed time per iteration (s): 0.45 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.991850E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.976 | TFLOPs: 29.64 | +7: iteration 60270/ 173500 | consumed samples: 15429120 | consumed tokens: 31598837760 | elapsed time per iteration (s): 0.46 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.993315E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.877 | TFLOPs: 29.38 | +7: iteration 60280/ 173500 | consumed samples: 15431680 | consumed tokens: 31604080640 | elapsed time per iteration (s): 0.43 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.999183E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.047 | TFLOPs: 30.91 | +7: iteration 60290/ 173500 | 
consumed samples: 15434240 | consumed tokens: 31609323520 | elapsed time per iteration (s): 0.44 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.998195E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.290 | TFLOPs: 30.18 | +7: iteration 60300/ 173500 | consumed samples: 15436800 | consumed tokens: 31614566400 | elapsed time per iteration (s): 0.46 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.990070E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.244 | TFLOPs: 29.40 | +7: iteration 60310/ 173500 | consumed samples: 15439360 | consumed tokens: 31619809280 | elapsed time per iteration (s): 0.45 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.989610E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.119 | TFLOPs: 30.12 | +7: iteration 60320/ 173500 | consumed samples: 15441920 | consumed tokens: 31625052160 | elapsed time per iteration (s): 0.42 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 3.007294E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.383 | TFLOPs: 31.92 | +7: iteration 60330/ 173500 | consumed samples: 15444480 | consumed tokens: 31630295040 | elapsed time per iteration (s): 0.43 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.983284E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.323 | TFLOPs: 31.29 | +7: iteration 60340/ 173500 | consumed samples: 15447040 | consumed tokens: 31635537920 | elapsed time per iteration (s): 0.45 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 3.000375E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 573.749 | TFLOPs: 30.10 | +7: iteration 60350/ 173500 | consumed samples: 15449600 | consumed tokens: 31640780800 | elapsed time per iteration (s): 0.44 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.994297E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.045 | TFLOPs: 30.75 | +7: iteration 60360/ 173500 | consumed samples: 15452160 | consumed tokens: 31646023680 | elapsed time per iteration (s): 0.42 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.987115E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.945 | TFLOPs: 32.16 | +7: iteration 60370/ 173500 | consumed samples: 15454720 | consumed tokens: 31651266560 | elapsed time per iteration (s): 0.42 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.989459E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.022 | TFLOPs: 32.11 | +7: iteration 60380/ 173500 | consumed samples: 15457280 | consumed tokens: 31656509440 | elapsed time per iteration (s): 0.42 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.987076E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.119 | TFLOPs: 31.64 | +7: iteration 60390/ 173500 | consumed samples: 15459840 | consumed tokens: 31661752320 | elapsed time per iteration (s): 0.42 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 3.001104E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.134 | TFLOPs: 31.86 | +7: iteration 60400/ 173500 | consumed samples: 15462400 | consumed tokens: 31666995200 | elapsed time per iteration (s): 0.43 | learning rate: 1.530E-04 | global batch 
size: 256 | lm loss: 2.990464E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.749 | TFLOPs: 31.05 | +7: iteration 60410/ 173500 | consumed samples: 15464960 | consumed tokens: 31672238080 | elapsed time per iteration (s): 0.42 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.986096E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.353 | TFLOPs: 31.66 | +7: iteration 60420/ 173500 | consumed samples: 15467520 | consumed tokens: 31677480960 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.999150E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.350 | TFLOPs: 31.81 | +7: iteration 60430/ 173500 | consumed samples: 15470080 | consumed tokens: 31682723840 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.989474E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.252 | TFLOPs: 31.81 | +7: iteration 60440/ 173500 | consumed samples: 15472640 | consumed tokens: 31687966720 | elapsed time per iteration (s): 0.43 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.993358E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.996 | TFLOPs: 31.59 | +7: iteration 60450/ 173500 | consumed samples: 15475200 | consumed tokens: 31693209600 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.979739E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.442 | TFLOPs: 31.77 | +7: iteration 60460/ 173500 | consumed samples: 15477760 | 
consumed tokens: 31698452480 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.985268E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.052 | TFLOPs: 31.80 | +7: iteration 60470/ 173500 | consumed samples: 15480320 | consumed tokens: 31703695360 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 3.003201E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.442 | TFLOPs: 31.82 | +7: iteration 60480/ 173500 | consumed samples: 15482880 | consumed tokens: 31708938240 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.979984E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.222 | TFLOPs: 31.81 | +7: iteration 60490/ 173500 | consumed samples: 15485440 | consumed tokens: 31714181120 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.992163E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.523 | TFLOPs: 31.35 | +7: iteration 60500/ 173500 | consumed samples: 15488000 | consumed tokens: 31719424000 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.997522E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.929 | TFLOPs: 31.42 | +7: iteration 60510/ 173500 | consumed samples: 15490560 | consumed tokens: 31724666880 | elapsed time per iteration (s): 0.42 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.985464E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.147 | TFLOPs: 31.65 | +7: iteration 60520/ 173500 | consumed samples: 15493120 | consumed tokens: 31729909760 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.992850E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.338 | TFLOPs: 31.08 | +7: iteration 60530/ 173500 | consumed samples: 15495680 | consumed tokens: 31735152640 | elapsed time per iteration (s): 0.42 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 3.002237E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.009 | TFLOPs: 31.85 | +7: iteration 60540/ 173500 | consumed samples: 15498240 | consumed tokens: 31740395520 | elapsed time per iteration (s): 0.42 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.984049E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.271 | TFLOPs: 31.81 | +7: iteration 60550/ 173500 | consumed samples: 15500800 | consumed tokens: 31745638400 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.995144E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.566 | TFLOPs: 31.56 | +7: iteration 60560/ 173500 | consumed samples: 15503360 | consumed tokens: 31750881280 | elapsed time per iteration (s): 0.43 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.989473E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.239 | TFLOPs: 31.23 | +7: iteration 60570/ 173500 | consumed samples: 15505920 | consumed tokens: 31756124160 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 
2.997575E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.458 | TFLOPs: 31.66 | +7: iteration 60580/ 173500 | consumed samples: 15508480 | consumed tokens: 31761367040 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.992509E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.043 | TFLOPs: 31.80 | +7: iteration 60590/ 173500 | consumed samples: 15511040 | consumed tokens: 31766609920 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 3.004459E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.037 | TFLOPs: 31.90 | +7: iteration 60600/ 173500 | consumed samples: 15513600 | consumed tokens: 31771852800 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 3.001398E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.774 | TFLOPs: 31.73 | +7: iteration 60610/ 173500 | consumed samples: 15516160 | consumed tokens: 31777095680 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.996490E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.069 | TFLOPs: 31.69 | +7: iteration 60620/ 173500 | consumed samples: 15518720 | consumed tokens: 31782338560 | elapsed time per iteration (s): 0.42 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 3.000397E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.131 | TFLOPs: 31.80 | +7: iteration 60630/ 173500 | consumed samples: 15521280 | consumed tokens: 
31787581440 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.007067E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.330 | TFLOPs: 31.50 | +7: iteration 60640/ 173500 | consumed samples: 15523840 | consumed tokens: 31792824320 | elapsed time per iteration (s): 0.42 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.979424E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.870 | TFLOPs: 31.79 | +7: iteration 60650/ 173500 | consumed samples: 15526400 | consumed tokens: 31798067200 | elapsed time per iteration (s): 0.42 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.984646E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.267 | TFLOPs: 31.76 | +7: iteration 60660/ 173500 | consumed samples: 15528960 | consumed tokens: 31803310080 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.003065E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.302 | TFLOPs: 31.50 | +7: iteration 60670/ 173500 | consumed samples: 15531520 | consumed tokens: 31808552960 | elapsed time per iteration (s): 0.42 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.995315E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.531 | TFLOPs: 31.77 | +7: iteration 60680/ 173500 | consumed samples: 15534080 | consumed tokens: 31813795840 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.002601E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.845 | TFLOPs: 31.47 | +7: iteration 60690/ 173500 | consumed samples: 15536640 | consumed tokens: 31819038720 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.993790E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.050 | TFLOPs: 31.59 | +7: iteration 60700/ 173500 | consumed samples: 15539200 | consumed tokens: 31824281600 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.989306E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.394 | TFLOPs: 31.76 | +7: iteration 60710/ 173500 | consumed samples: 15541760 | consumed tokens: 31829524480 | elapsed time per iteration (s): 0.43 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 3.002129E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.835 | TFLOPs: 31.31 | +7: iteration 60720/ 173500 | consumed samples: 15544320 | consumed tokens: 31834767360 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.996037E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.325 | TFLOPs: 31.81 | +7: iteration 60730/ 173500 | consumed samples: 15546880 | consumed tokens: 31840010240 | elapsed time per iteration (s): 0.43 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.984434E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.089 | TFLOPs: 31.49 | +7: iteration 60740/ 173500 | consumed samples: 15549440 | consumed tokens: 31845253120 | elapsed time per iteration (s): 0.43 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.993049E+00 | grad 
norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.390 | TFLOPs: 31.03 | +7: iteration 60750/ 173500 | consumed samples: 15552000 | consumed tokens: 31850496000 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.988292E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.674 | TFLOPs: 32.04 | +7: iteration 60760/ 173500 | consumed samples: 15554560 | consumed tokens: 31855738880 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.998581E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.860 | TFLOPs: 32.00 | +7: iteration 60770/ 173500 | consumed samples: 15557120 | consumed tokens: 31860981760 | elapsed time per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 3.000574E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.107 | TFLOPs: 31.64 | +7: iteration 60780/ 173500 | consumed samples: 15559680 | consumed tokens: 31866224640 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.992995E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.534 | TFLOPs: 31.56 | +7: iteration 60790/ 173500 | consumed samples: 15562240 | consumed tokens: 31871467520 | elapsed time per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.996050E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.711 | TFLOPs: 31.73 | +7: iteration 60800/ 173500 | consumed samples: 15564800 | consumed tokens: 31876710400 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.995177E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.083 | TFLOPs: 31.80 | +7: iteration 60810/ 173500 | consumed samples: 15567360 | consumed tokens: 31881953280 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.995927E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.076 | TFLOPs: 31.38 | +7: iteration 60820/ 173500 | consumed samples: 15569920 | consumed tokens: 31887196160 | elapsed time per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.997476E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.104 | TFLOPs: 32.01 | +7: iteration 60830/ 173500 | consumed samples: 15572480 | consumed tokens: 31892439040 | elapsed time per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.994537E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.003 | TFLOPs: 31.69 | +7: iteration 60840/ 173500 | consumed samples: 15575040 | consumed tokens: 31897681920 | elapsed time per iteration (s): 0.42 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.997768E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.881 | TFLOPs: 31.63 | +7: iteration 60850/ 173500 | consumed samples: 15577600 | consumed tokens: 31902924800 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.976275E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.823 | TFLOPs: 
31.47 | +7: iteration 60860/ 173500 | consumed samples: 15580160 | consumed tokens: 31908167680 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.994526E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.860 | TFLOPs: 31.53 | +7: iteration 60870/ 173500 | consumed samples: 15582720 | consumed tokens: 31913410560 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.994919E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.210 | TFLOPs: 31.28 | +7: iteration 60880/ 173500 | consumed samples: 15585280 | consumed tokens: 31918653440 | elapsed time per iteration (s): 0.42 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.990965E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.293 | TFLOPs: 31.81 | +7: iteration 60890/ 173500 | consumed samples: 15587840 | consumed tokens: 31923896320 | elapsed time per iteration (s): 0.42 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.967605E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.554 | TFLOPs: 31.98 | +7: iteration 60900/ 173500 | consumed samples: 15590400 | consumed tokens: 31929139200 | elapsed time per iteration (s): 0.42 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 3.008301E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.586 | TFLOPs: 31.77 | +7: iteration 60910/ 173500 | consumed samples: 15592960 | consumed tokens: 31934382080 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.987981E+00 | grad norm: 0.302 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.129 | TFLOPs: 31.33 | +7: iteration 60920/ 173500 | consumed samples: 15595520 | consumed tokens: 31939624960 | elapsed time per iteration (s): 0.42 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 3.002530E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.122 | TFLOPs: 31.64 | +7: iteration 60930/ 173500 | consumed samples: 15598080 | consumed tokens: 31944867840 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.987039E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.963 | TFLOPs: 31.48 | +7: iteration 60940/ 173500 | consumed samples: 15600640 | consumed tokens: 31950110720 | elapsed time per iteration (s): 0.44 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.984981E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.918 | TFLOPs: 30.69 | +7: iteration 60950/ 173500 | consumed samples: 15603200 | consumed tokens: 31955353600 | elapsed time per iteration (s): 0.42 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.984148E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.628 | TFLOPs: 32.04 | +7: iteration 60960/ 173500 | consumed samples: 15605760 | consumed tokens: 31960596480 | elapsed time per iteration (s): 0.42 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.993075E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.109 | TFLOPs: 31.80 | +7: iteration 60970/ 173500 | consumed samples: 15608320 | consumed tokens: 31965839360 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.989268E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.121 | TFLOPs: 31.49 | +7: iteration 60980/ 173500 | consumed samples: 15610880 | consumed tokens: 31971082240 | elapsed time per iteration (s): 0.42 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 3.004110E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.646 | TFLOPs: 31.62 | +7: iteration 60990/ 173500 | consumed samples: 15613440 | consumed tokens: 31976325120 | elapsed time per iteration (s): 0.42 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.984522E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.811 | TFLOPs: 31.73 | +7: iteration 61000/ 173500 | consumed samples: 15616000 | consumed tokens: 31981568000 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.986870E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.195 | TFLOPs: 31.28 | +7: iteration 61010/ 173500 | consumed samples: 15618560 | consumed tokens: 31986810880 | elapsed time per iteration (s): 0.42 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.984683E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.011 | TFLOPs: 31.90 | +7: iteration 61020/ 173500 | consumed samples: 15621120 | consumed tokens: 31992053760 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.986664E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.900 | TFLOPs: 30.95 | +7: iteration 61030/ 
173500 | consumed samples: 15623680 | consumed tokens: 31997296640 | elapsed time per iteration (s): 0.45 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.997677E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.090 | TFLOPs: 30.07 | +7: iteration 61040/ 173500 | consumed samples: 15626240 | consumed tokens: 32002539520 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.995817E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.198 | TFLOPs: 31.44 | +7: iteration 61050/ 173500 | consumed samples: 15628800 | consumed tokens: 32007782400 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.990091E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.735 | TFLOPs: 31.31 | +7: iteration 61060/ 173500 | consumed samples: 15631360 | consumed tokens: 32013025280 | elapsed time per iteration (s): 0.42 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 3.002460E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.295 | TFLOPs: 32.02 | +7: iteration 61070/ 173500 | consumed samples: 15633920 | consumed tokens: 32018268160 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.989639E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.499 | TFLOPs: 31.51 | +7: iteration 61080/ 173500 | consumed samples: 15636480 | consumed tokens: 32023511040 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.985033E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.966 | TFLOPs: 31.48 | +7: iteration 61090/ 173500 | consumed samples: 15639040 | consumed tokens: 32028753920 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.981810E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.177 | TFLOPs: 31.54 | +7: iteration 61100/ 173500 | consumed samples: 15641600 | consumed tokens: 32033996800 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 3.004227E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.137 | TFLOPs: 31.02 | +7: iteration 61110/ 173500 | consumed samples: 15644160 | consumed tokens: 32039239680 | elapsed time per iteration (s): 0.42 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.008389E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.879 | TFLOPs: 31.74 | +7: iteration 61120/ 173500 | consumed samples: 15646720 | consumed tokens: 32044482560 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.982920E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.918 | TFLOPs: 31.21 | +7: iteration 61130/ 173500 | consumed samples: 15649280 | consumed tokens: 32049725440 | elapsed time per iteration (s): 0.42 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.006389E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.728 | TFLOPs: 31.89 | +7: iteration 61140/ 173500 | consumed samples: 15651840 | consumed tokens: 32054968320 | elapsed time per iteration (s): 0.42 | learning rate: 
1.519E-04 | global batch size: 256 | lm loss: 2.995408E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.875 | TFLOPs: 31.68 | +7: iteration 61150/ 173500 | consumed samples: 15654400 | consumed tokens: 32060211200 | elapsed time per iteration (s): 0.42 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.974888E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.800 | TFLOPs: 31.73 | +7: iteration 61160/ 173500 | consumed samples: 15656960 | consumed tokens: 32065454080 | elapsed time per iteration (s): 0.42 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.979272E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.892 | TFLOPs: 31.69 | +7: iteration 61170/ 173500 | consumed samples: 15659520 | consumed tokens: 32070696960 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.004440E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.340 | TFLOPs: 31.29 | +7: iteration 61180/ 173500 | consumed samples: 15662080 | consumed tokens: 32075939840 | elapsed time per iteration (s): 0.42 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 3.012111E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.251 | TFLOPs: 31.81 | +7: iteration 61190/ 173500 | consumed samples: 15664640 | consumed tokens: 32081182720 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.984327E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.893 | TFLOPs: 31.42 | +7: iteration 61200/ 173500 | 
consumed samples: 15667200 | consumed tokens: 32086425600 | elapsed time per iteration (s): 0.42 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.994177E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.215 | TFLOPs: 32.02 | +7: iteration 61210/ 173500 | consumed samples: 15669760 | consumed tokens: 32091668480 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.993819E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.836 | TFLOPs: 31.47 | +7: iteration 61220/ 173500 | consumed samples: 15672320 | consumed tokens: 32096911360 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.990272E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.560 | TFLOPs: 31.41 | +7: iteration 61230/ 173500 | consumed samples: 15674880 | consumed tokens: 32102154240 | elapsed time per iteration (s): 0.42 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 3.008858E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.541 | TFLOPs: 31.88 | +7: iteration 61240/ 173500 | consumed samples: 15677440 | consumed tokens: 32107397120 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.989248E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.797 | TFLOPs: 31.58 | +7: iteration 61250/ 173500 | consumed samples: 15680000 | consumed tokens: 32112640000 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.986901E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 601.758 | TFLOPs: 31.57 | +7: iteration 61260/ 173500 | consumed samples: 15682560 | consumed tokens: 32117882880 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 3.009241E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.819 | TFLOPs: 31.00 | +7: iteration 61270/ 173500 | consumed samples: 15685120 | consumed tokens: 32123125760 | elapsed time per iteration (s): 0.42 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.999432E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.224 | TFLOPs: 31.70 | +7: iteration 61280/ 173500 | consumed samples: 15687680 | consumed tokens: 32128368640 | elapsed time per iteration (s): 0.42 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.984301E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.744 | TFLOPs: 31.89 | +7: iteration 61290/ 173500 | consumed samples: 15690240 | consumed tokens: 32133611520 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.993579E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.439 | TFLOPs: 31.35 | +7: iteration 61300/ 173500 | consumed samples: 15692800 | consumed tokens: 32138854400 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 3.007626E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.570 | TFLOPs: 31.46 | +7: iteration 61310/ 173500 | consumed samples: 15695360 | consumed tokens: 32144097280 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch 
size: 256 | lm loss: 2.980745E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.118 | TFLOPs: 31.33 | +7: iteration 61320/ 173500 | consumed samples: 15697920 | consumed tokens: 32149340160 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.988062E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.805 | TFLOPs: 31.79 | +7: iteration 61330/ 173500 | consumed samples: 15700480 | consumed tokens: 32154583040 | elapsed time per iteration (s): 0.43 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.980559E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.579 | TFLOPs: 31.56 | +7: iteration 61340/ 173500 | consumed samples: 15703040 | consumed tokens: 32159825920 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.977022E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.330 | TFLOPs: 31.71 | +7: iteration 61350/ 173500 | consumed samples: 15705600 | consumed tokens: 32165068800 | elapsed time per iteration (s): 0.43 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.982124E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.888 | TFLOPs: 31.37 | +7: iteration 61360/ 173500 | consumed samples: 15708160 | consumed tokens: 32170311680 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 3.011157E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.126 | TFLOPs: 31.70 | +7: iteration 61370/ 173500 | consumed samples: 15710720 | 
consumed tokens: 32175554560 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.985997E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.628 | TFLOPs: 31.93 | +7: iteration 61380/ 173500 | consumed samples: 15713280 | consumed tokens: 32180797440 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.999549E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.519 | TFLOPs: 31.67 | +7: iteration 61390/ 173500 | consumed samples: 15715840 | consumed tokens: 32186040320 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.999409E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.095 | TFLOPs: 31.38 | +7: iteration 61400/ 173500 | consumed samples: 15718400 | consumed tokens: 32191283200 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.980180E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.154 | TFLOPs: 31.54 | +7: iteration 61410/ 173500 | consumed samples: 15720960 | consumed tokens: 32196526080 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.997756E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.415 | TFLOPs: 31.56 | +7: iteration 61420/ 173500 | consumed samples: 15723520 | consumed tokens: 32201768960 | elapsed time per iteration (s): 0.42 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 3.001297E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 606.468 | TFLOPs: 31.82 | +7: iteration 61430/ 173500 | consumed samples: 15726080 | consumed tokens: 32207011840 | elapsed time per iteration (s): 0.42 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.979224E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.841 | TFLOPs: 32.00 | +7: iteration 61440/ 173500 | consumed samples: 15728640 | consumed tokens: 32212254720 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.979426E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.613 | TFLOPs: 31.57 | +7: iteration 61450/ 173500 | consumed samples: 15731200 | consumed tokens: 32217497600 | elapsed time per iteration (s): 0.43 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.994767E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.944 | TFLOPs: 31.58 | +7: iteration 61460/ 173500 | consumed samples: 15733760 | consumed tokens: 32222740480 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.995853E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.761 | TFLOPs: 31.73 | +7: iteration 61470/ 173500 | consumed samples: 15736320 | consumed tokens: 32227983360 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.993127E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.020 | TFLOPs: 31.69 | +7: iteration 61480/ 173500 | consumed samples: 15738880 | consumed tokens: 32233226240 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 
2.985699E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.583 | TFLOPs: 31.72 | +7: iteration 61490/ 173500 | consumed samples: 15741440 | consumed tokens: 32238469120 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.980436E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.937 | TFLOPs: 31.64 | +7: iteration 61500/ 173500 | consumed samples: 15744000 | consumed tokens: 32243712000 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 3.001628E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.090 | TFLOPs: 31.85 | +7: iteration 61510/ 173500 | consumed samples: 15746560 | consumed tokens: 32248954880 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.979124E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.306 | TFLOPs: 31.76 | +7: iteration 61520/ 173500 | consumed samples: 15749120 | consumed tokens: 32254197760 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.991544E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.108 | TFLOPs: 31.75 | +7: iteration 61530/ 173500 | consumed samples: 15751680 | consumed tokens: 32259440640 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.981665E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.611 | TFLOPs: 31.67 | +7: iteration 61540/ 173500 | consumed samples: 15754240 | consumed tokens: 
32264683520 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 3.005305E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.170 | TFLOPs: 31.80 | +7: iteration 61550/ 173500 | consumed samples: 15756800 | consumed tokens: 32269926400 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.993870E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.822 | TFLOPs: 31.73 | +7: iteration 61560/ 173500 | consumed samples: 15759360 | consumed tokens: 32275169280 | elapsed time per iteration (s): 0.43 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.992855E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.158 | TFLOPs: 31.33 | +7: iteration 61570/ 173500 | consumed samples: 15761920 | consumed tokens: 32280412160 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.993809E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.909 | TFLOPs: 32.00 | +7: iteration 61580/ 173500 | consumed samples: 15764480 | consumed tokens: 32285655040 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.996566E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.325 | TFLOPs: 31.76 | +7: iteration 61590/ 173500 | consumed samples: 15767040 | consumed tokens: 32290897920 | elapsed time per iteration (s): 0.42 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.984318E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 603.794 | TFLOPs: 31.68 | +7: iteration 61600/ 173500 | consumed samples: 15769600 | consumed tokens: 32296140800 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.987059E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.067 | TFLOPs: 31.33 | +7: iteration 61610/ 173500 | consumed samples: 15772160 | consumed tokens: 32301383680 | elapsed time per iteration (s): 0.42 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 3.004812E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.031 | TFLOPs: 31.75 | +7: iteration 61620/ 173500 | consumed samples: 15774720 | consumed tokens: 32306626560 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.975400E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.189 | TFLOPs: 31.18 | +7: iteration 61630/ 173500 | consumed samples: 15777280 | consumed tokens: 32311869440 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.984125E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.606 | TFLOPs: 31.46 | +7: iteration 61640/ 173500 | consumed samples: 15779840 | consumed tokens: 32317112320 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 3.004257E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.100 | TFLOPs: 31.12 | +7: iteration 61650/ 173500 | consumed samples: 15782400 | consumed tokens: 32322355200 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 3.002664E+00 | grad 
norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.267 | TFLOPs: 31.55 | +7: iteration 61660/ 173500 | consumed samples: 15784960 | consumed tokens: 32327598080 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.982669E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.271 | TFLOPs: 31.34 | +7: iteration 61670/ 173500 | consumed samples: 15787520 | consumed tokens: 32332840960 | elapsed time per iteration (s): 0.42 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.988017E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.343 | TFLOPs: 31.76 | +7: iteration 61680/ 173500 | consumed samples: 15790080 | consumed tokens: 32338083840 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.985494E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.488 | TFLOPs: 31.35 | +7: iteration 61690/ 173500 | consumed samples: 15792640 | consumed tokens: 32343326720 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.993646E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.046 | TFLOPs: 31.54 | +7: iteration 61700/ 173500 | consumed samples: 15795200 | consumed tokens: 32348569600 | elapsed time per iteration (s): 0.42 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.994017E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.869 | TFLOPs: 32.00 | +7: iteration 61710/ 173500 | consumed samples: 15797760 | consumed tokens: 32353812480 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.992566E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.280 | TFLOPs: 31.86 | +7: iteration 61720/ 173500 | consumed samples: 15800320 | consumed tokens: 32359055360 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.995288E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.940 | TFLOPs: 31.16 | +7: iteration 61730/ 173500 | consumed samples: 15802880 | consumed tokens: 32364298240 | elapsed time per iteration (s): 0.43 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.994496E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.158 | TFLOPs: 30.91 | +7: iteration 61740/ 173500 | consumed samples: 15805440 | consumed tokens: 32369541120 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.984407E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.480 | TFLOPs: 31.77 | +7: iteration 61750/ 173500 | consumed samples: 15808000 | consumed tokens: 32374784000 | elapsed time per iteration (s): 0.43 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.996389E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.416 | TFLOPs: 31.29 | +7: iteration 61760/ 173500 | consumed samples: 15810560 | consumed tokens: 32380026880 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.993266E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.859 | TFLOPs: 
32.00 | +7: iteration 61770/ 173500 | consumed samples: 15813120 | consumed tokens: 32385269760 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.987338E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.569 | TFLOPs: 31.77 | +7: iteration 61780/ 173500 | consumed samples: 15815680 | consumed tokens: 32390512640 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.995760E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.954 | TFLOPs: 31.79 | +7: iteration 61790/ 173500 | consumed samples: 15818240 | consumed tokens: 32395755520 | elapsed time per iteration (s): 0.43 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.977265E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.686 | TFLOPs: 31.46 | +7: iteration 61800/ 173500 | consumed samples: 15820800 | consumed tokens: 32400998400 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.993982E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.669 | TFLOPs: 31.99 | +7: iteration 61810/ 173500 | consumed samples: 15823360 | consumed tokens: 32406241280 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.999825E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.475 | TFLOPs: 31.98 | +7: iteration 61820/ 173500 | consumed samples: 15825920 | consumed tokens: 32411484160 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.995225E+00 | grad norm: 0.304 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.501 | TFLOPs: 31.98 | +7: iteration 61830/ 173500 | consumed samples: 15828480 | consumed tokens: 32416727040 | elapsed time per iteration (s): 0.43 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.980568E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.013 | TFLOPs: 31.48 | +7: iteration 61840/ 173500 | consumed samples: 15831040 | consumed tokens: 32421969920 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.979688E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.201 | TFLOPs: 31.75 | +7: iteration 61850/ 173500 | consumed samples: 15833600 | consumed tokens: 32427212800 | elapsed time per iteration (s): 0.43 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.985436E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.412 | TFLOPs: 31.56 | +7: iteration 61860/ 173500 | consumed samples: 15836160 | consumed tokens: 32432455680 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.987223E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.704 | TFLOPs: 31.57 | +7: iteration 61870/ 173500 | consumed samples: 15838720 | consumed tokens: 32437698560 | elapsed time per iteration (s): 0.42 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.998597E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.253 | TFLOPs: 31.70 | +7: iteration 61880/ 173500 | consumed samples: 15841280 | consumed tokens: 32442941440 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.974527E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.560 | TFLOPs: 31.98 | +7: iteration 61890/ 173500 | consumed samples: 15843840 | consumed tokens: 32448184320 | elapsed time per iteration (s): 0.42 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.986520E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.346 | TFLOPs: 31.66 | +7: iteration 61900/ 173500 | consumed samples: 15846400 | consumed tokens: 32453427200 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.988396E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.499 | TFLOPs: 31.45 | +7: iteration 61910/ 173500 | consumed samples: 15848960 | consumed tokens: 32458670080 | elapsed time per iteration (s): 0.42 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 3.013198E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.476 | TFLOPs: 31.98 | +7: iteration 61920/ 173500 | consumed samples: 15851520 | consumed tokens: 32463912960 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.987900E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.331 | TFLOPs: 31.45 | +7: iteration 61930/ 173500 | consumed samples: 15854080 | consumed tokens: 32469155840 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.994165E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.741 | TFLOPs: 31.99 | +7: iteration 61940/ 
173500 | consumed samples: 15856640 | consumed tokens: 32474398720 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.993193E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.204 | TFLOPs: 31.75 | +7: iteration 61950/ 173500 | consumed samples: 15859200 | consumed tokens: 32479641600 | elapsed time per iteration (s): 0.43 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.981606E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.019 | TFLOPs: 31.11 | +7: iteration 61960/ 173500 | consumed samples: 15861760 | consumed tokens: 32484884480 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 3.001399E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.704 | TFLOPs: 31.78 | +7: iteration 61970/ 173500 | consumed samples: 15864320 | consumed tokens: 32490127360 | elapsed time per iteration (s): 0.43 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.984528E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.719 | TFLOPs: 31.57 | +7: iteration 61980/ 173500 | consumed samples: 15866880 | consumed tokens: 32495370240 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.989004E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.720 | TFLOPs: 31.99 | +7: iteration 61990/ 173500 | consumed samples: 15869440 | consumed tokens: 32500613120 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.978458E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.283 | TFLOPs: 31.76 | +0: [2023-03-17 06:32:37,478] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=0, lr=[0.00015064331838981058, 0.00015064331838981058, 0.00015064331838981058], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 62000/ 173500 | consumed samples: 15872000 | consumed tokens: 32505856000 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.997001E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.746 | TFLOPs: 31.47 | +0: steps: 62000 loss: 3.0121 iter time (s): 0.425 samples/sec: 602.648 +7: iteration 62010/ 173500 | consumed samples: 15874560 | consumed tokens: 32511098880 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.981755E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.041 | TFLOPs: 31.59 | +7: iteration 62020/ 173500 | consumed samples: 15877120 | consumed tokens: 32516341760 | elapsed time per iteration (s): 0.42 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.994209E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.271 | TFLOPs: 31.76 | +7: iteration 62030/ 173500 | consumed samples: 15879680 | consumed tokens: 32521584640 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.988225E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.476 | TFLOPs: 31.51 | +7: iteration 62040/ 173500 | consumed samples: 15882240 | consumed tokens: 32526827520 | elapsed time per iteration (s): 0.42 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.990185E+00 | grad 
norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.333 | TFLOPs: 31.97 | +7: iteration 62050/ 173500 | consumed samples: 15884800 | consumed tokens: 32532070400 | elapsed time per iteration (s): 0.42 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.984255E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.333 | TFLOPs: 31.92 | +7: iteration 62060/ 173500 | consumed samples: 15887360 | consumed tokens: 32537313280 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.980125E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.700 | TFLOPs: 31.36 | +7: iteration 62070/ 173500 | consumed samples: 15889920 | consumed tokens: 32542556160 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.986194E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.337 | TFLOPs: 31.60 | +7: iteration 62080/ 173500 | consumed samples: 15892480 | consumed tokens: 32547799040 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.985490E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.827 | TFLOPs: 31.21 | +7: iteration 62090/ 173500 | consumed samples: 15895040 | consumed tokens: 32553041920 | elapsed time per iteration (s): 0.42 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.990180E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.694 | TFLOPs: 31.73 | +7: iteration 62100/ 173500 | consumed samples: 15897600 | consumed tokens: 32558284800 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.986932E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.850 | TFLOPs: 31.63 | +7: iteration 62110/ 173500 | consumed samples: 15900160 | consumed tokens: 32563527680 | elapsed time per iteration (s): 0.42 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 3.006896E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.074 | TFLOPs: 31.80 | +7: iteration 62120/ 173500 | consumed samples: 15902720 | consumed tokens: 32568770560 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.994596E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.079 | TFLOPs: 31.38 | +7: iteration 62130/ 173500 | consumed samples: 15905280 | consumed tokens: 32574013440 | elapsed time per iteration (s): 0.42 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.989718E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.423 | TFLOPs: 31.98 | +7: iteration 62140/ 173500 | consumed samples: 15907840 | consumed tokens: 32579256320 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.992885E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.123 | TFLOPs: 31.59 | +7: iteration 62150/ 173500 | consumed samples: 15910400 | consumed tokens: 32584499200 | elapsed time per iteration (s): 0.42 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.984671E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.748 | TFLOPs: 
31.84 | +7: iteration 62160/ 173500 | consumed samples: 15912960 | consumed tokens: 32589742080 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.988841E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.718 | TFLOPs: 31.52 | +7: iteration 62170/ 173500 | consumed samples: 15915520 | consumed tokens: 32594984960 | elapsed time per iteration (s): 0.42 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.991110E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.108 | TFLOPs: 31.80 | +7: iteration 62180/ 173500 | consumed samples: 15918080 | consumed tokens: 32600227840 | elapsed time per iteration (s): 0.42 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.985837E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.790 | TFLOPs: 31.63 | +7: iteration 62190/ 173500 | consumed samples: 15920640 | consumed tokens: 32605470720 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.979154E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.909 | TFLOPs: 30.90 | +7: iteration 62200/ 173500 | consumed samples: 15923200 | consumed tokens: 32610713600 | elapsed time per iteration (s): 0.43 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.993219E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.400 | TFLOPs: 31.50 | +7: iteration 62210/ 173500 | consumed samples: 15925760 | consumed tokens: 32615956480 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.997168E+00 | grad norm: 0.311 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.661 | TFLOPs: 31.99 | +7: iteration 62220/ 173500 | consumed samples: 15928320 | consumed tokens: 32621199360 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.981468E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.392 | TFLOPs: 31.76 | +7: iteration 62230/ 173500 | consumed samples: 15930880 | consumed tokens: 32626442240 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.981875E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.203 | TFLOPs: 31.96 | +7: iteration 62240/ 173500 | consumed samples: 15933440 | consumed tokens: 32631685120 | elapsed time per iteration (s): 0.43 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 3.000314E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.805 | TFLOPs: 31.47 | +7: iteration 62250/ 173500 | consumed samples: 15936000 | consumed tokens: 32636928000 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 3.003762E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.379 | TFLOPs: 31.76 | +7: iteration 62260/ 173500 | consumed samples: 15938560 | consumed tokens: 32642170880 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.990138E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.678 | TFLOPs: 31.67 | +7: iteration 62270/ 173500 | consumed samples: 15941120 | consumed tokens: 32647413760 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.987982E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.309 | TFLOPs: 31.97 | +7: iteration 62280/ 173500 | consumed samples: 15943680 | consumed tokens: 32652656640 | elapsed time per iteration (s): 0.42 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.993152E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.062 | TFLOPs: 31.69 | +7: iteration 62290/ 173500 | consumed samples: 15946240 | consumed tokens: 32657899520 | elapsed time per iteration (s): 0.43 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.980561E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.750 | TFLOPs: 31.36 | +7: iteration 62300/ 173500 | consumed samples: 15948800 | consumed tokens: 32663142400 | elapsed time per iteration (s): 0.42 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.995213E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.308 | TFLOPs: 31.71 | +7: iteration 62310/ 173500 | consumed samples: 15951360 | consumed tokens: 32668385280 | elapsed time per iteration (s): 0.42 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.999372E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.510 | TFLOPs: 31.67 | +7: iteration 62320/ 173500 | consumed samples: 15953920 | consumed tokens: 32673628160 | elapsed time per iteration (s): 0.42 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 3.003161E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.488 | TFLOPs: 31.66 | +7: iteration 62330/ 
173500 | consumed samples: 15956480 | consumed tokens: 32678871040 | elapsed time per iteration (s): 0.42 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.989430E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.899 | TFLOPs: 32.00 | +7: iteration 62340/ 173500 | consumed samples: 15959040 | consumed tokens: 32684113920 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.992396E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.858 | TFLOPs: 31.47 | +7: iteration 62350/ 173500 | consumed samples: 15961600 | consumed tokens: 32689356800 | elapsed time per iteration (s): 0.42 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.987508E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.715 | TFLOPs: 31.73 | +7: iteration 62360/ 173500 | consumed samples: 15964160 | consumed tokens: 32694599680 | elapsed time per iteration (s): 0.42 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.976682E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.264 | TFLOPs: 31.76 | +7: iteration 62370/ 173500 | consumed samples: 15966720 | consumed tokens: 32699842560 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.994396E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.857 | TFLOPs: 31.53 | +7: iteration 62380/ 173500 | consumed samples: 15969280 | consumed tokens: 32705085440 | elapsed time per iteration (s): 0.42 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.999927E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.848 | TFLOPs: 31.89 | +7: iteration 62390/ 173500 | consumed samples: 15971840 | consumed tokens: 32710328320 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.991438E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.484 | TFLOPs: 31.51 | +7: iteration 62400/ 173500 | consumed samples: 15974400 | consumed tokens: 32715571200 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.979537E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.507 | TFLOPs: 31.46 | +7: iteration 62410/ 173500 | consumed samples: 15976960 | consumed tokens: 32720814080 | elapsed time per iteration (s): 0.42 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.970463E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.582 | TFLOPs: 31.72 | +7: iteration 62420/ 173500 | consumed samples: 15979520 | consumed tokens: 32726056960 | elapsed time per iteration (s): 0.42 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.996021E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.796 | TFLOPs: 31.79 | +7: iteration 62430/ 173500 | consumed samples: 15982080 | consumed tokens: 32731299840 | elapsed time per iteration (s): 0.42 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.990153E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.358 | TFLOPs: 31.76 | +7: iteration 62440/ 173500 | consumed samples: 15984640 | consumed tokens: 32736542720 | elapsed time per iteration (s): 0.43 | learning rate: 
1.500E-04 | global batch size: 256 | lm loss: 2.987380E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.242 | TFLOPs: 31.44 | +7: iteration 62450/ 173500 | consumed samples: 15987200 | consumed tokens: 32741785600 | elapsed time per iteration (s): 0.43 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.984302E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.917 | TFLOPs: 31.42 | +7: iteration 62460/ 173500 | consumed samples: 15989760 | consumed tokens: 32747028480 | elapsed time per iteration (s): 0.42 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.997466E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.504 | TFLOPs: 31.61 | +7: iteration 62470/ 173500 | consumed samples: 15992320 | consumed tokens: 32752271360 | elapsed time per iteration (s): 0.42 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.984194E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.962 | TFLOPs: 31.69 | +7: iteration 62480/ 173500 | consumed samples: 15994880 | consumed tokens: 32757514240 | elapsed time per iteration (s): 0.42 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.981065E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.479 | TFLOPs: 31.98 | +7: iteration 62490/ 173500 | consumed samples: 15997440 | consumed tokens: 32762757120 | elapsed time per iteration (s): 0.44 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.984745E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.349 | TFLOPs: 30.45 | +7: iteration 62500/ 173500 | 
consumed samples: 16000000 | consumed tokens: 32768000000 | elapsed time per iteration (s): 0.43 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.992676E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.401 | TFLOPs: 31.55 | +7: iteration 62510/ 173500 | consumed samples: 16002560 | consumed tokens: 32773242880 | elapsed time per iteration (s): 0.43 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.976374E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.501 | TFLOPs: 31.45 | +7: iteration 62520/ 173500 | consumed samples: 16005120 | consumed tokens: 32778485760 | elapsed time per iteration (s): 0.42 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.998971E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.444 | TFLOPs: 31.87 | +7: iteration 62530/ 173500 | consumed samples: 16007680 | consumed tokens: 32783728640 | elapsed time per iteration (s): 0.43 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.995119E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.737 | TFLOPs: 31.52 | +7: iteration 62540/ 173500 | consumed samples: 16010240 | consumed tokens: 32788971520 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.976669E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.034 | TFLOPs: 31.64 | +7: iteration 62550/ 173500 | consumed samples: 16012800 | consumed tokens: 32794214400 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.993766E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.699 | TFLOPs: 31.94 | +7: iteration 62560/ 173500 | consumed samples: 16015360 | consumed tokens: 32799457280 | elapsed time per iteration (s): 0.43 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.994566E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.913 | TFLOPs: 31.21 | +7: iteration 62570/ 173500 | consumed samples: 16017920 | consumed tokens: 32804700160 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.994653E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.005 | TFLOPs: 31.69 | +7: iteration 62580/ 173500 | consumed samples: 16020480 | consumed tokens: 32809943040 | elapsed time per iteration (s): 0.43 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.986492E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.295 | TFLOPs: 31.44 | +7: iteration 62590/ 173500 | consumed samples: 16023040 | consumed tokens: 32815185920 | elapsed time per iteration (s): 0.43 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.996623E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.970 | TFLOPs: 31.48 | +7: iteration 62600/ 173500 | consumed samples: 16025600 | consumed tokens: 32820428800 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.996895E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.642 | TFLOPs: 31.78 | +7: iteration 62610/ 173500 | consumed samples: 16028160 | consumed tokens: 32825671680 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch 
size: 256 | lm loss: 2.994346E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.768 | TFLOPs: 31.73 | +7: iteration 62620/ 173500 | consumed samples: 16030720 | consumed tokens: 32830914560 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.987657E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.858 | TFLOPs: 31.79 | +7: iteration 62630/ 173500 | consumed samples: 16033280 | consumed tokens: 32836157440 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.993089E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.942 | TFLOPs: 31.90 | +7: iteration 62640/ 173500 | consumed samples: 16035840 | consumed tokens: 32841400320 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.990268E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.653 | TFLOPs: 31.62 | +7: iteration 62650/ 173500 | consumed samples: 16038400 | consumed tokens: 32846643200 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.986561E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.510 | TFLOPs: 31.77 | +7: iteration 62660/ 173500 | consumed samples: 16040960 | consumed tokens: 32851886080 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.978126E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.934 | TFLOPs: 31.74 | +7: iteration 62670/ 173500 | consumed samples: 16043520 | 
consumed tokens: 32857128960 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.992666E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.781 | TFLOPs: 31.68 | +7: iteration 62680/ 173500 | consumed samples: 16046080 | consumed tokens: 32862371840 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.981840E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.712 | TFLOPs: 31.99 | +7: iteration 62690/ 173500 | consumed samples: 16048640 | consumed tokens: 32867614720 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.981818E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.612 | TFLOPs: 31.78 | +7: iteration 62700/ 173500 | consumed samples: 16051200 | consumed tokens: 32872857600 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.983425E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.789 | TFLOPs: 31.99 | +7: iteration 62710/ 173500 | consumed samples: 16053760 | consumed tokens: 32878100480 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.987636E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.610 | TFLOPs: 31.67 | +7: iteration 62720/ 173500 | consumed samples: 16056320 | consumed tokens: 32883343360 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.988082E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.403 | TFLOPs: 31.71 | +7: iteration 62730/ 173500 | consumed samples: 16058880 | consumed tokens: 32888586240 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.968639E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.082 | TFLOPs: 31.85 | +7: iteration 62740/ 173500 | consumed samples: 16061440 | consumed tokens: 32893829120 | elapsed time per iteration (s): 0.43 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.995837E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.329 | TFLOPs: 31.39 | +7: iteration 62750/ 173500 | consumed samples: 16064000 | consumed tokens: 32899072000 | elapsed time per iteration (s): 0.43 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.988561E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.613 | TFLOPs: 31.57 | +7: iteration 62760/ 173500 | consumed samples: 16066560 | consumed tokens: 32904314880 | elapsed time per iteration (s): 0.42 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.978847E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.236 | TFLOPs: 31.65 | +7: iteration 62770/ 173500 | consumed samples: 16069120 | consumed tokens: 32909557760 | elapsed time per iteration (s): 0.42 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.999144E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.868 | TFLOPs: 31.63 | +7: iteration 62780/ 173500 | consumed samples: 16071680 | consumed tokens: 32914800640 | elapsed time per iteration (s): 0.43 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 
2.973079E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.545 | TFLOPs: 31.56 | +7: iteration 62790/ 173500 | consumed samples: 16074240 | consumed tokens: 32920043520 | elapsed time per iteration (s): 0.42 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.972955E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.987 | TFLOPs: 31.64 | +7: iteration 62800/ 173500 | consumed samples: 16076800 | consumed tokens: 32925286400 | elapsed time per iteration (s): 0.42 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.997873E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.762 | TFLOPs: 31.84 | +7: iteration 62810/ 173500 | consumed samples: 16079360 | consumed tokens: 32930529280 | elapsed time per iteration (s): 0.43 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.989731E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.923 | TFLOPs: 31.37 | +7: iteration 62820/ 173500 | consumed samples: 16081920 | consumed tokens: 32935772160 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.977808E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.176 | TFLOPs: 31.96 | +7: iteration 62830/ 173500 | consumed samples: 16084480 | consumed tokens: 32941015040 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.983210E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.933 | TFLOPs: 31.74 | +7: iteration 62840/ 173500 | consumed samples: 16087040 | consumed tokens: 
32946257920 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.994256E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.446 | TFLOPs: 31.77 | +7: iteration 62850/ 173500 | consumed samples: 16089600 | consumed tokens: 32951500800 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.997399E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.250 | TFLOPs: 31.76 | +7: iteration 62860/ 173500 | consumed samples: 16092160 | consumed tokens: 32956743680 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.992403E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.405 | TFLOPs: 31.97 | +7: iteration 62870/ 173500 | consumed samples: 16094720 | consumed tokens: 32961986560 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.989449E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.680 | TFLOPs: 31.67 | +7: iteration 62880/ 173500 | consumed samples: 16097280 | consumed tokens: 32967229440 | elapsed time per iteration (s): 0.42 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.991951E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.863 | TFLOPs: 31.74 | +7: iteration 62890/ 173500 | consumed samples: 16099840 | consumed tokens: 32972472320 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.990627E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.220 | TFLOPs: 31.60 | +7: iteration 62900/ 173500 | consumed samples: 16102400 | consumed tokens: 32977715200 | elapsed time per iteration (s): 0.42 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.992387E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.773 | TFLOPs: 31.63 | +7: iteration 62910/ 173500 | consumed samples: 16104960 | consumed tokens: 32982958080 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.989807E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.639 | TFLOPs: 31.36 | +7: iteration 62920/ 173500 | consumed samples: 16107520 | consumed tokens: 32988200960 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.994534E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.398 | TFLOPs: 31.45 | +7: iteration 62930/ 173500 | consumed samples: 16110080 | consumed tokens: 32993443840 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.978006E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.216 | TFLOPs: 31.49 | +7: iteration 62940/ 173500 | consumed samples: 16112640 | consumed tokens: 32998686720 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.983115E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.781 | TFLOPs: 31.47 | +7: iteration 62950/ 173500 | consumed samples: 16115200 | consumed tokens: 33003929600 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.979422E+00 | grad 
norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.394 | TFLOPs: 31.66 | +7: iteration 62960/ 173500 | consumed samples: 16117760 | consumed tokens: 33009172480 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.992499E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.579 | TFLOPs: 31.93 | +7: iteration 62970/ 173500 | consumed samples: 16120320 | consumed tokens: 33014415360 | elapsed time per iteration (s): 0.43 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.996129E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.982 | TFLOPs: 31.48 | +7: iteration 62980/ 173500 | consumed samples: 16122880 | consumed tokens: 33019658240 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.997084E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.947 | TFLOPs: 31.74 | +7: iteration 62990/ 173500 | consumed samples: 16125440 | consumed tokens: 33024901120 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.985004E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.325 | TFLOPs: 31.76 | +7: iteration 63000/ 173500 | consumed samples: 16128000 | consumed tokens: 33030144000 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.989814E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.709 | TFLOPs: 31.68 | +7: iteration 63010/ 173500 | consumed samples: 16130560 | consumed tokens: 33035386880 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.987035E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.426 | TFLOPs: 31.56 | +7: iteration 63020/ 173500 | consumed samples: 16133120 | consumed tokens: 33040629760 | elapsed time per iteration (s): 0.42 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.977725E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.340 | TFLOPs: 31.71 | +7: iteration 63030/ 173500 | consumed samples: 16135680 | consumed tokens: 33045872640 | elapsed time per iteration (s): 0.42 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.975410E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.154 | TFLOPs: 31.75 | +7: iteration 63040/ 173500 | consumed samples: 16138240 | consumed tokens: 33051115520 | elapsed time per iteration (s): 0.42 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.992296E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.541 | TFLOPs: 31.61 | +7: iteration 63050/ 173500 | consumed samples: 16140800 | consumed tokens: 33056358400 | elapsed time per iteration (s): 0.43 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.987164E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.936 | TFLOPs: 30.95 | +7: iteration 63060/ 173500 | consumed samples: 16143360 | consumed tokens: 33061601280 | elapsed time per iteration (s): 0.43 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.985658E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.512 | TFLOPs: 
31.19 | +7: iteration 63070/ 173500 | consumed samples: 16145920 | consumed tokens: 33066844160 | elapsed time per iteration (s): 0.42 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.988293E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.138 | TFLOPs: 31.86 | +7: iteration 63080/ 173500 | consumed samples: 16148480 | consumed tokens: 33072087040 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.983190E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.412 | TFLOPs: 31.97 | +7: iteration 63090/ 173500 | consumed samples: 16151040 | consumed tokens: 33077329920 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.990527E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.545 | TFLOPs: 31.72 | +7: iteration 63100/ 173500 | consumed samples: 16153600 | consumed tokens: 33082572800 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.993992E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.301 | TFLOPs: 31.97 | +7: iteration 63110/ 173500 | consumed samples: 16156160 | consumed tokens: 33087815680 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.991616E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.683 | TFLOPs: 31.73 | +7: iteration 63120/ 173500 | consumed samples: 16158720 | consumed tokens: 33093058560 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.991666E+00 | grad norm: 0.299 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.200 | TFLOPs: 31.65 | +7: iteration 63130/ 173500 | consumed samples: 16161280 | consumed tokens: 33098301440 | elapsed time per iteration (s): 0.44 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.984896E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.516 | TFLOPs: 30.51 | +7: iteration 63140/ 173500 | consumed samples: 16163840 | consumed tokens: 33103544320 | elapsed time per iteration (s): 0.43 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.977647E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.292 | TFLOPs: 31.55 | +7: iteration 63150/ 173500 | consumed samples: 16166400 | consumed tokens: 33108787200 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.999671E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.892 | TFLOPs: 31.58 | +7: iteration 63160/ 173500 | consumed samples: 16168960 | consumed tokens: 33114030080 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 3.002343E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.992 | TFLOPs: 31.43 | +7: iteration 63170/ 173500 | consumed samples: 16171520 | consumed tokens: 33119272960 | elapsed time per iteration (s): 0.42 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.991271E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.841 | TFLOPs: 31.68 | +7: iteration 63180/ 173500 | consumed samples: 16174080 | consumed tokens: 33124515840 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.987580E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.182 | TFLOPs: 31.54 | +7: iteration 63190/ 173500 | consumed samples: 16176640 | consumed tokens: 33129758720 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.982852E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.402 | TFLOPs: 31.45 | +7: iteration 63200/ 173500 | consumed samples: 16179200 | consumed tokens: 33135001600 | elapsed time per iteration (s): 0.42 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.989851E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.601 | TFLOPs: 31.62 | +7: iteration 63210/ 173500 | consumed samples: 16181760 | consumed tokens: 33140244480 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.994571E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.157 | TFLOPs: 31.28 | +7: iteration 63220/ 173500 | consumed samples: 16184320 | consumed tokens: 33145487360 | elapsed time per iteration (s): 0.45 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.982857E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.396 | TFLOPs: 30.09 | +7: iteration 63230/ 173500 | consumed samples: 16186880 | consumed tokens: 33150730240 | elapsed time per iteration (s): 0.43 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.998195E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.300 | TFLOPs: 31.55 | +7: iteration 63240/ 
173500 | consumed samples: 16189440 | consumed tokens: 33155973120 | elapsed time per iteration (s): 0.43 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.990309E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.187 | TFLOPs: 31.23 | +7: iteration 63250/ 173500 | consumed samples: 16192000 | consumed tokens: 33161216000 | elapsed time per iteration (s): 0.43 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.974751E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.357 | TFLOPs: 31.29 | +7: iteration 63260/ 173500 | consumed samples: 16194560 | consumed tokens: 33166458880 | elapsed time per iteration (s): 0.44 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.987028E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.662 | TFLOPs: 30.73 | +7: iteration 63270/ 173500 | consumed samples: 16197120 | consumed tokens: 33171701760 | elapsed time per iteration (s): 0.42 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.974441E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.393 | TFLOPs: 31.66 | +7: iteration 63280/ 173500 | consumed samples: 16199680 | consumed tokens: 33176944640 | elapsed time per iteration (s): 0.44 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 3.005424E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.230 | TFLOPs: 30.23 | +7: iteration 63290/ 173500 | consumed samples: 16202240 | consumed tokens: 33182187520 | elapsed time per iteration (s): 0.46 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.988680E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 558.627 | TFLOPs: 29.31 | +7: iteration 63300/ 173500 | consumed samples: 16204800 | consumed tokens: 33187430400 | elapsed time per iteration (s): 0.42 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.989992E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.205 | TFLOPs: 32.07 | +7: iteration 63310/ 173500 | consumed samples: 16207360 | consumed tokens: 33192673280 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.992374E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.745 | TFLOPs: 31.47 | +7: iteration 63320/ 173500 | consumed samples: 16209920 | consumed tokens: 33197916160 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.975961E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.735 | TFLOPs: 31.41 | +7: iteration 63330/ 173500 | consumed samples: 16212480 | consumed tokens: 33203159040 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.996057E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.191 | TFLOPs: 30.97 | +7: iteration 63340/ 173500 | consumed samples: 16215040 | consumed tokens: 33208401920 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.976618E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.891 | TFLOPs: 31.27 | +7: iteration 63350/ 173500 | consumed samples: 16217600 | consumed tokens: 33213644800 | elapsed time per iteration (s): 0.43 | learning rate: 
1.486E-04 | global batch size: 256 | lm loss: 2.989985E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.209 | TFLOPs: 31.44 | +7: iteration 63360/ 173500 | consumed samples: 16220160 | consumed tokens: 33218887680 | elapsed time per iteration (s): 0.44 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 3.008550E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.885 | TFLOPs: 30.85 | +7: iteration 63370/ 173500 | consumed samples: 16222720 | consumed tokens: 33224130560 | elapsed time per iteration (s): 0.47 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.982211E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 544.798 | TFLOPs: 28.58 | +7: iteration 63380/ 173500 | consumed samples: 16225280 | consumed tokens: 33229373440 | elapsed time per iteration (s): 0.48 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.979326E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.633 | TFLOPs: 28.00 | +7: iteration 63390/ 173500 | consumed samples: 16227840 | consumed tokens: 33234616320 | elapsed time per iteration (s): 0.45 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.982300E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.039 | TFLOPs: 30.12 | +7: iteration 63400/ 173500 | consumed samples: 16230400 | consumed tokens: 33239859200 | elapsed time per iteration (s): 0.45 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.983313E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.994 | TFLOPs: 29.54 | +7: iteration 63410/ 173500 | 
consumed samples: 16232960 | consumed tokens: 33245102080 | elapsed time per iteration (s): 0.44 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.992000E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.508 | TFLOPs: 30.51 | +7: iteration 63420/ 173500 | consumed samples: 16235520 | consumed tokens: 33250344960 | elapsed time per iteration (s): 0.44 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 3.004506E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.500 | TFLOPs: 30.72 | +7: iteration 63430/ 173500 | consumed samples: 16238080 | consumed tokens: 33255587840 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 3.002457E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.664 | TFLOPs: 32.04 | +7: iteration 63440/ 173500 | consumed samples: 16240640 | consumed tokens: 33260830720 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.986144E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.934 | TFLOPs: 31.90 | +7: iteration 63450/ 173500 | consumed samples: 16243200 | consumed tokens: 33266073600 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.984306E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.433 | TFLOPs: 31.87 | +7: iteration 63460/ 173500 | consumed samples: 16245760 | consumed tokens: 33271316480 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.993384E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 611.629 | TFLOPs: 32.09 | +7: iteration 63470/ 173500 | consumed samples: 16248320 | consumed tokens: 33276559360 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.973964E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.701 | TFLOPs: 31.89 | +7: iteration 63480/ 173500 | consumed samples: 16250880 | consumed tokens: 33281802240 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.980829E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.807 | TFLOPs: 31.89 | +7: iteration 63490/ 173500 | consumed samples: 16253440 | consumed tokens: 33287045120 | elapsed time per iteration (s): 0.43 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.987426E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.949 | TFLOPs: 31.53 | +7: iteration 63500/ 173500 | consumed samples: 16256000 | consumed tokens: 33292288000 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.992958E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.588 | TFLOPs: 31.83 | +7: iteration 63510/ 173500 | consumed samples: 16258560 | consumed tokens: 33297530880 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.985364E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.717 | TFLOPs: 32.04 | +7: iteration 63520/ 173500 | consumed samples: 16261120 | consumed tokens: 33302773760 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch 
size: 256 | lm loss: 2.976588E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.290 | TFLOPs: 32.02 | +7: iteration 63530/ 173500 | consumed samples: 16263680 | consumed tokens: 33308016640 | elapsed time per iteration (s): 0.43 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.982993E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.763 | TFLOPs: 31.57 | +7: iteration 63540/ 173500 | consumed samples: 16266240 | consumed tokens: 33313259520 | elapsed time per iteration (s): 0.43 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.988808E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.572 | TFLOPs: 31.46 | +7: iteration 63550/ 173500 | consumed samples: 16268800 | consumed tokens: 33318502400 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.984833E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.233 | TFLOPs: 32.02 | +7: iteration 63560/ 173500 | consumed samples: 16271360 | consumed tokens: 33323745280 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.985638E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.318 | TFLOPs: 31.71 | +7: iteration 63570/ 173500 | consumed samples: 16273920 | consumed tokens: 33328988160 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.989307E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.015 | TFLOPs: 31.64 | +7: iteration 63580/ 173500 | consumed samples: 16276480 | 
consumed tokens: 33334231040 | elapsed time per iteration (s): 0.43 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.981808E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.595 | TFLOPs: 30.99 | +7: iteration 63590/ 173500 | consumed samples: 16279040 | consumed tokens: 33339473920 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.985859E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.808 | TFLOPs: 32.00 | +7: iteration 63600/ 173500 | consumed samples: 16281600 | consumed tokens: 33344716800 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.999134E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.456 | TFLOPs: 31.98 | +7: iteration 63610/ 173500 | consumed samples: 16284160 | consumed tokens: 33349959680 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 3.002604E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.577 | TFLOPs: 31.67 | +7: iteration 63620/ 173500 | consumed samples: 16286720 | consumed tokens: 33355202560 | elapsed time per iteration (s): 0.43 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.997077E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.931 | TFLOPs: 31.37 | +7: iteration 63630/ 173500 | consumed samples: 16289280 | consumed tokens: 33360445440 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.980659E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 606.936 | TFLOPs: 31.84 | +7: iteration 63640/ 173500 | consumed samples: 16291840 | consumed tokens: 33365688320 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.979282E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.230 | TFLOPs: 31.86 | +7: iteration 63650/ 173500 | consumed samples: 16294400 | consumed tokens: 33370931200 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.979474E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.159 | TFLOPs: 31.80 | +7: iteration 63660/ 173500 | consumed samples: 16296960 | consumed tokens: 33376174080 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.997151E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.129 | TFLOPs: 31.86 | +7: iteration 63670/ 173500 | consumed samples: 16299520 | consumed tokens: 33381416960 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.986976E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.888 | TFLOPs: 31.79 | +7: iteration 63680/ 173500 | consumed samples: 16302080 | consumed tokens: 33386659840 | elapsed time per iteration (s): 0.42 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.992574E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.049 | TFLOPs: 31.80 | +7: iteration 63690/ 173500 | consumed samples: 16304640 | consumed tokens: 33391902720 | elapsed time per iteration (s): 0.42 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 
3.001740E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.473 | TFLOPs: 31.82 | +7: iteration 63700/ 173500 | consumed samples: 16307200 | consumed tokens: 33397145600 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.977280E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.851 | TFLOPs: 31.42 | +7: iteration 63710/ 173500 | consumed samples: 16309760 | consumed tokens: 33402388480 | elapsed time per iteration (s): 0.42 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.981662E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.198 | TFLOPs: 31.81 | +7: iteration 63720/ 173500 | consumed samples: 16312320 | consumed tokens: 33407631360 | elapsed time per iteration (s): 0.42 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.991711E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.443 | TFLOPs: 31.92 | +7: iteration 63730/ 173500 | consumed samples: 16314880 | consumed tokens: 33412874240 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.977206E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.971 | TFLOPs: 31.27 | +7: iteration 63740/ 173500 | consumed samples: 16317440 | consumed tokens: 33418117120 | elapsed time per iteration (s): 0.42 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.987966E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.573 | TFLOPs: 31.98 | +7: iteration 63750/ 173500 | consumed samples: 16320000 | consumed tokens: 
33423360000 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 3.001366E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.930 | TFLOPs: 31.48 | +7: iteration 63760/ 173500 | consumed samples: 16322560 | consumed tokens: 33428602880 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.999908E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.179 | TFLOPs: 31.96 | +7: iteration 63770/ 173500 | consumed samples: 16325120 | consumed tokens: 33433845760 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.995148E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.029 | TFLOPs: 31.64 | +7: iteration 63780/ 173500 | consumed samples: 16327680 | consumed tokens: 33439088640 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.991186E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.171 | TFLOPs: 31.96 | +7: iteration 63790/ 173500 | consumed samples: 16330240 | consumed tokens: 33444331520 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.992380E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.038 | TFLOPs: 31.64 | +7: iteration 63800/ 173500 | consumed samples: 16332800 | consumed tokens: 33449574400 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.990438E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.812 | TFLOPs: 31.84 | +7: iteration 63810/ 173500 | consumed samples: 16335360 | consumed tokens: 33454817280 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.979594E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.107 | TFLOPs: 31.75 | +7: iteration 63820/ 173500 | consumed samples: 16337920 | consumed tokens: 33460060160 | elapsed time per iteration (s): 0.42 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 3.003877E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.068 | TFLOPs: 31.80 | +7: iteration 63830/ 173500 | consumed samples: 16340480 | consumed tokens: 33465303040 | elapsed time per iteration (s): 0.42 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.992547E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.122 | TFLOPs: 31.85 | +7: iteration 63840/ 173500 | consumed samples: 16343040 | consumed tokens: 33470545920 | elapsed time per iteration (s): 0.42 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.980146E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.852 | TFLOPs: 31.63 | +7: iteration 63850/ 173500 | consumed samples: 16345600 | consumed tokens: 33475788800 | elapsed time per iteration (s): 0.45 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.977066E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.681 | TFLOPs: 29.84 | +7: iteration 63860/ 173500 | consumed samples: 16348160 | consumed tokens: 33481031680 | elapsed time per iteration (s): 0.44 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.989245E+00 | grad 
norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.984 | TFLOPs: 30.54 | +7: iteration 63870/ 173500 | consumed samples: 16350720 | consumed tokens: 33486274560 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.983367E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.054 | TFLOPs: 31.17 | +7: iteration 63880/ 173500 | consumed samples: 16353280 | consumed tokens: 33491517440 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.986056E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.898 | TFLOPs: 31.58 | +7: iteration 63890/ 173500 | consumed samples: 16355840 | consumed tokens: 33496760320 | elapsed time per iteration (s): 0.44 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.982920E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.418 | TFLOPs: 30.66 | +7: iteration 63900/ 173500 | consumed samples: 16358400 | consumed tokens: 33502003200 | elapsed time per iteration (s): 0.42 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.975172E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.532 | TFLOPs: 31.72 | +7: iteration 63910/ 173500 | consumed samples: 16360960 | consumed tokens: 33507246080 | elapsed time per iteration (s): 0.42 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.990132E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.917 | TFLOPs: 31.74 | +7: iteration 63920/ 173500 | consumed samples: 16363520 | consumed tokens: 33512488960 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 3.004939E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.808 | TFLOPs: 31.52 | +7: iteration 63930/ 173500 | consumed samples: 16366080 | consumed tokens: 33517731840 | elapsed time per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.986368E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.198 | TFLOPs: 31.54 | +7: iteration 63940/ 173500 | consumed samples: 16368640 | consumed tokens: 33522974720 | elapsed time per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.987320E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.393 | TFLOPs: 31.13 | +7: iteration 63950/ 173500 | consumed samples: 16371200 | consumed tokens: 33528217600 | elapsed time per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.988296E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.446 | TFLOPs: 30.98 | +7: iteration 63960/ 173500 | consumed samples: 16373760 | consumed tokens: 33533460480 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.987892E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.688 | TFLOPs: 30.99 | +7: iteration 63970/ 173500 | consumed samples: 16376320 | consumed tokens: 33538703360 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.996167E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.901 | TFLOPs: 
31.42 | +7: iteration 63980/ 173500 | consumed samples: 16378880 | consumed tokens: 33543946240 | elapsed time per iteration (s): 0.46 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.987381E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.512 | TFLOPs: 29.46 | +7: iteration 63990/ 173500 | consumed samples: 16381440 | consumed tokens: 33549189120 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.980062E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.491 | TFLOPs: 31.03 | +0: [2023-03-17 06:46:50,385] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=0, lr=[0.0001476794025098283, 0.0001476794025098283, 0.0001476794025098283], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 64000/ 173500 | consumed samples: 16384000 | consumed tokens: 33554432000 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.984485E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.728 | TFLOPs: 31.20 | +0: steps: 64000 loss: 3.0168 iter time (s): 0.425 samples/sec: 602.822 +7: iteration 64010/ 173500 | consumed samples: 16386560 | consumed tokens: 33559674880 | elapsed time per iteration (s): 0.44 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.999743E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.088 | TFLOPs: 30.86 | +7: iteration 64020/ 173500 | consumed samples: 16389120 | consumed tokens: 33564917760 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.993042E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 594.930 | TFLOPs: 31.22 | +7: iteration 64030/ 173500 | consumed samples: 16391680 | consumed tokens: 33570160640 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.982298E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.855 | TFLOPs: 31.21 | +7: iteration 64040/ 173500 | consumed samples: 16394240 | consumed tokens: 33575403520 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.991660E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.039 | TFLOPs: 31.27 | +7: iteration 64050/ 173500 | consumed samples: 16396800 | consumed tokens: 33580646400 | elapsed time per iteration (s): 0.45 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.993141E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.608 | TFLOPs: 29.78 | +7: iteration 64060/ 173500 | consumed samples: 16399360 | consumed tokens: 33585889280 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.981552E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.433 | TFLOPs: 30.98 | +7: iteration 64070/ 173500 | consumed samples: 16401920 | consumed tokens: 33591132160 | elapsed time per iteration (s): 0.44 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.977888E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.793 | TFLOPs: 30.58 | +7: iteration 64080/ 173500 | consumed samples: 16404480 | consumed tokens: 33596375040 | elapsed time per iteration (s): 0.45 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 
2.978639E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.496 | TFLOPs: 29.67 | +7: iteration 64090/ 173500 | consumed samples: 16407040 | consumed tokens: 33601617920 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.982804E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.524 | TFLOPs: 31.14 | +7: iteration 64100/ 173500 | consumed samples: 16409600 | consumed tokens: 33606860800 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.979991E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.880 | TFLOPs: 31.37 | +7: iteration 64110/ 173500 | consumed samples: 16412160 | consumed tokens: 33612103680 | elapsed time per iteration (s): 0.42 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.993189E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.517 | TFLOPs: 31.61 | +7: iteration 64120/ 173500 | consumed samples: 16414720 | consumed tokens: 33617346560 | elapsed time per iteration (s): 0.44 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.996525E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.860 | TFLOPs: 30.84 | +7: iteration 64130/ 173500 | consumed samples: 16417280 | consumed tokens: 33622589440 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.987322E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.617 | TFLOPs: 31.46 | +7: iteration 64140/ 173500 | consumed samples: 16419840 | consumed tokens: 
33627832320 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.995847E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.327 | TFLOPs: 31.39 | +7: iteration 64150/ 173500 | consumed samples: 16422400 | consumed tokens: 33633075200 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.997618E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.877 | TFLOPs: 31.05 | +7: iteration 64160/ 173500 | consumed samples: 16424960 | consumed tokens: 33638318080 | elapsed time per iteration (s): 0.43 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.989476E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.878 | TFLOPs: 31.05 | +7: iteration 64170/ 173500 | consumed samples: 16427520 | consumed tokens: 33643560960 | elapsed time per iteration (s): 0.43 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.981662E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.440 | TFLOPs: 31.03 | +7: iteration 64180/ 173500 | consumed samples: 16430080 | consumed tokens: 33648803840 | elapsed time per iteration (s): 0.42 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.997939E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.084 | TFLOPs: 31.64 | +7: iteration 64190/ 173500 | consumed samples: 16432640 | consumed tokens: 33654046720 | elapsed time per iteration (s): 0.44 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.976550E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 583.885 | TFLOPs: 30.64 | +7: iteration 64200/ 173500 | consumed samples: 16435200 | consumed tokens: 33659289600 | elapsed time per iteration (s): 0.42 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.980674E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.889 | TFLOPs: 31.69 | +7: iteration 64210/ 173500 | consumed samples: 16437760 | consumed tokens: 33664532480 | elapsed time per iteration (s): 0.43 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.973496E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.302 | TFLOPs: 31.23 | +7: iteration 64220/ 173500 | consumed samples: 16440320 | consumed tokens: 33669775360 | elapsed time per iteration (s): 0.43 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.988955E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.039 | TFLOPs: 31.54 | +7: iteration 64230/ 173500 | consumed samples: 16442880 | consumed tokens: 33675018240 | elapsed time per iteration (s): 0.44 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.979431E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.905 | TFLOPs: 30.79 | +7: iteration 64240/ 173500 | consumed samples: 16445440 | consumed tokens: 33680261120 | elapsed time per iteration (s): 0.45 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.974592E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.211 | TFLOPs: 30.18 | +7: iteration 64250/ 173500 | consumed samples: 16448000 | consumed tokens: 33685504000 | elapsed time per iteration (s): 0.45 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.986936E+00 | grad 
norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.822 | TFLOPs: 29.90 | +7: iteration 64260/ 173500 | consumed samples: 16450560 | consumed tokens: 33690746880 | elapsed time per iteration (s): 0.44 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.992733E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.603 | TFLOPs: 30.78 | +7: iteration 64270/ 173500 | consumed samples: 16453120 | consumed tokens: 33695989760 | elapsed time per iteration (s): 0.43 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.982110E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.051 | TFLOPs: 31.01 | +7: iteration 64280/ 173500 | consumed samples: 16455680 | consumed tokens: 33701232640 | elapsed time per iteration (s): 0.43 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.986169E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.411 | TFLOPs: 30.98 | +7: iteration 64290/ 173500 | consumed samples: 16458240 | consumed tokens: 33706475520 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.985194E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.674 | TFLOPs: 31.52 | +7: iteration 64300/ 173500 | consumed samples: 16460800 | consumed tokens: 33711718400 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.992590E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.283 | TFLOPs: 31.13 | +7: iteration 64310/ 173500 | consumed samples: 16463360 | consumed tokens: 33716961280 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.990552E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.904 | TFLOPs: 31.11 | +7: iteration 64320/ 173500 | consumed samples: 16465920 | consumed tokens: 33722204160 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.972543E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.003 | TFLOPs: 31.43 | +7: iteration 64330/ 173500 | consumed samples: 16468480 | consumed tokens: 33727447040 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.981170E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.739 | TFLOPs: 31.41 | +7: iteration 64340/ 173500 | consumed samples: 16471040 | consumed tokens: 33732689920 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.995141E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.996 | TFLOPs: 31.17 | +7: iteration 64350/ 173500 | consumed samples: 16473600 | consumed tokens: 33737932800 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 3.002851E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.646 | TFLOPs: 31.15 | +7: iteration 64360/ 173500 | consumed samples: 16476160 | consumed tokens: 33743175680 | elapsed time per iteration (s): 0.44 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.983636E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.685 | TFLOPs: 
30.78 | +7: iteration 64370/ 173500 | consumed samples: 16478720 | consumed tokens: 33748418560 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.991436E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.609 | TFLOPs: 31.09 | +7: iteration 64380/ 173500 | consumed samples: 16481280 | consumed tokens: 33753661440 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.977213E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.880 | TFLOPs: 31.47 | +7: iteration 64390/ 173500 | consumed samples: 16483840 | consumed tokens: 33758904320 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.983390E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.290 | TFLOPs: 31.23 | +7: iteration 64400/ 173500 | consumed samples: 16486400 | consumed tokens: 33764147200 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.982561E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.573 | TFLOPs: 31.14 | +7: iteration 64410/ 173500 | consumed samples: 16488960 | consumed tokens: 33769390080 | elapsed time per iteration (s): 0.42 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.990402E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.043 | TFLOPs: 32.01 | +7: iteration 64420/ 173500 | consumed samples: 16491520 | consumed tokens: 33774632960 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.985481E+00 | grad norm: 0.294 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.723 | TFLOPs: 30.99 | +7: iteration 64430/ 173500 | consumed samples: 16494080 | consumed tokens: 33779875840 | elapsed time per iteration (s): 0.43 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.978500E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.095 | TFLOPs: 31.17 | +7: iteration 64440/ 173500 | consumed samples: 16496640 | consumed tokens: 33785118720 | elapsed time per iteration (s): 0.43 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.983518E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.707 | TFLOPs: 30.99 | +7: iteration 64450/ 173500 | consumed samples: 16499200 | consumed tokens: 33790361600 | elapsed time per iteration (s): 0.43 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.979306E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.912 | TFLOPs: 30.95 | +7: iteration 64460/ 173500 | consumed samples: 16501760 | consumed tokens: 33795604480 | elapsed time per iteration (s): 0.44 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.985439E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.199 | TFLOPs: 30.65 | +7: iteration 64470/ 173500 | consumed samples: 16504320 | consumed tokens: 33800847360 | elapsed time per iteration (s): 0.44 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.971803E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.006 | TFLOPs: 30.43 | +7: iteration 64480/ 173500 | consumed samples: 16506880 | consumed tokens: 33806090240 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.985769E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.304 | TFLOPs: 30.76 | +7: iteration 64490/ 173500 | consumed samples: 16509440 | consumed tokens: 33811333120 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.990540E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | +7: iteration 64500/ 173500 | consumed samples: 16512000 | consumed tokens: 33816576000 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.987878E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.498 | TFLOPs: 31.14 | +7: iteration 64510/ 173500 | consumed samples: 16514560 | consumed tokens: 33821818880 | elapsed time per iteration (s): 0.42 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.982995E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.489 | TFLOPs: 31.61 | +7: iteration 64520/ 173500 | consumed samples: 16517120 | consumed tokens: 33827061760 | elapsed time per iteration (s): 0.42 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.981684E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.445 | TFLOPs: 31.66 | +7: iteration 64530/ 173500 | consumed samples: 16519680 | consumed tokens: 33832304640 | elapsed time per iteration (s): 0.44 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.998536E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.949 | TFLOPs: 30.69 | +7: iteration 64540/ 
173500 | consumed samples: 16522240 | consumed tokens: 33837547520 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.989158E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.381 | TFLOPs: 31.13 | +7: iteration 64550/ 173500 | consumed samples: 16524800 | consumed tokens: 33842790400 | elapsed time per iteration (s): 0.44 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.993949E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.578 | TFLOPs: 30.57 | +7: iteration 64560/ 173500 | consumed samples: 16527360 | consumed tokens: 33848033280 | elapsed time per iteration (s): 0.44 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.984438E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.954 | TFLOPs: 30.80 | +7: iteration 64570/ 173500 | consumed samples: 16529920 | consumed tokens: 33853276160 | elapsed time per iteration (s): 0.43 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.991762E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.246 | TFLOPs: 31.60 | +7: iteration 64580/ 173500 | consumed samples: 16532480 | consumed tokens: 33858519040 | elapsed time per iteration (s): 0.43 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.986911E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.433 | TFLOPs: 30.93 | +7: iteration 64590/ 173500 | consumed samples: 16535040 | consumed tokens: 33863761920 | elapsed time per iteration (s): 0.45 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.996266E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 572.274 | TFLOPs: 30.03 | +7: iteration 64600/ 173500 | consumed samples: 16537600 | consumed tokens: 33869004800 | elapsed time per iteration (s): 0.44 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.976738E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.156 | TFLOPs: 30.60 | +7: iteration 64610/ 173500 | consumed samples: 16540160 | consumed tokens: 33874247680 | elapsed time per iteration (s): 0.43 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.978926E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.094 | TFLOPs: 31.22 | +7: iteration 64620/ 173500 | consumed samples: 16542720 | consumed tokens: 33879490560 | elapsed time per iteration (s): 0.43 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.997211E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.048 | TFLOPs: 31.12 | +7: iteration 64630/ 173500 | consumed samples: 16545280 | consumed tokens: 33884733440 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.985485E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.962 | TFLOPs: 31.06 | +7: iteration 64640/ 173500 | consumed samples: 16547840 | consumed tokens: 33889976320 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.972944E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.763 | TFLOPs: 31.26 | +7: iteration 64650/ 173500 | consumed samples: 16550400 | consumed tokens: 33895219200 | elapsed time per iteration (s): 0.43 | learning rate: 
1.467E-04 | global batch size: 256 | lm loss: 2.981557E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.812 | TFLOPs: 31.52 | +7: iteration 64660/ 173500 | consumed samples: 16552960 | consumed tokens: 33900462080 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.980145E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.875 | TFLOPs: 31.21 | +7: iteration 64670/ 173500 | consumed samples: 16555520 | consumed tokens: 33905704960 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.986287E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.445 | TFLOPs: 31.29 | +7: iteration 64680/ 173500 | consumed samples: 16558080 | consumed tokens: 33910947840 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.979464E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.435 | TFLOPs: 31.08 | +7: iteration 64690/ 173500 | consumed samples: 16560640 | consumed tokens: 33916190720 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.971070E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.898 | TFLOPs: 31.53 | +7: iteration 64700/ 173500 | consumed samples: 16563200 | consumed tokens: 33921433600 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.985395E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.680 | TFLOPs: 30.94 | +7: iteration 64710/ 173500 | 
consumed samples: 16565760 | consumed tokens: 33926676480 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.996181E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.733 | TFLOPs: 31.05 | +7: iteration 64720/ 173500 | consumed samples: 16568320 | consumed tokens: 33931919360 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.988858E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.153 | TFLOPs: 31.28 | +7: iteration 64730/ 173500 | consumed samples: 16570880 | consumed tokens: 33937162240 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.993795E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.445 | TFLOPs: 31.24 | +7: iteration 64740/ 173500 | consumed samples: 16573440 | consumed tokens: 33942405120 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.999396E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.564 | TFLOPs: 30.99 | +7: iteration 64750/ 173500 | consumed samples: 16576000 | consumed tokens: 33947648000 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.976533E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.797 | TFLOPs: 31.16 | +7: iteration 64760/ 173500 | consumed samples: 16578560 | consumed tokens: 33952890880 | elapsed time per iteration (s): 0.44 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.987081E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 586.994 | TFLOPs: 30.80 | +7: iteration 64770/ 173500 | consumed samples: 16581120 | consumed tokens: 33958133760 | elapsed time per iteration (s): 0.44 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.983197E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.700 | TFLOPs: 30.84 | +7: iteration 64780/ 173500 | consumed samples: 16583680 | consumed tokens: 33963376640 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.988331E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.287 | TFLOPs: 31.65 | +7: iteration 64790/ 173500 | consumed samples: 16586240 | consumed tokens: 33968619520 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.990323E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.331 | TFLOPs: 31.87 | +7: iteration 64800/ 173500 | consumed samples: 16588800 | consumed tokens: 33973862400 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.993289E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.405 | TFLOPs: 31.61 | +7: iteration 64810/ 173500 | consumed samples: 16591360 | consumed tokens: 33979105280 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.978715E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.612 | TFLOPs: 31.72 | +7: iteration 64820/ 173500 | consumed samples: 16593920 | consumed tokens: 33984348160 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch 
size: 256 | lm loss: 3.001150E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.373 | TFLOPs: 31.76 | +7: iteration 64830/ 173500 | consumed samples: 16596480 | consumed tokens: 33989591040 | elapsed time per iteration (s): 0.43 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.982342E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.162 | TFLOPs: 31.02 | +7: iteration 64840/ 173500 | consumed samples: 16599040 | consumed tokens: 33994833920 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.986484E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.577 | TFLOPs: 31.77 | +7: iteration 64850/ 173500 | consumed samples: 16601600 | consumed tokens: 34000076800 | elapsed time per iteration (s): 0.43 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.990455E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.973 | TFLOPs: 31.06 | +7: iteration 64860/ 173500 | consumed samples: 16604160 | consumed tokens: 34005319680 | elapsed time per iteration (s): 0.43 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.973450E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.031 | TFLOPs: 31.43 | +7: iteration 64870/ 173500 | consumed samples: 16606720 | consumed tokens: 34010562560 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.987430E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.080 | TFLOPs: 31.85 | +7: iteration 64880/ 173500 | consumed samples: 16609280 | 
consumed tokens: 34015805440 | elapsed time per iteration (s): 0.43 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.981692E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.692 | TFLOPs: 31.52 | +7: iteration 64890/ 173500 | consumed samples: 16611840 | consumed tokens: 34021048320 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.985009E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.692 | TFLOPs: 31.78 | +7: iteration 64900/ 173500 | consumed samples: 16614400 | consumed tokens: 34026291200 | elapsed time per iteration (s): 0.43 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.992025E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.516 | TFLOPs: 31.19 | +7: iteration 64910/ 173500 | consumed samples: 16616960 | consumed tokens: 34031534080 | elapsed time per iteration (s): 0.44 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.980165E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.620 | TFLOPs: 30.52 | +7: iteration 64920/ 173500 | consumed samples: 16619520 | consumed tokens: 34036776960 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.987945E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.384 | TFLOPs: 31.87 | +7: iteration 64930/ 173500 | consumed samples: 16622080 | consumed tokens: 34042019840 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.984330E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.270 | TFLOPs: 31.71 | +7: iteration 64940/ 173500 | consumed samples: 16624640 | consumed tokens: 34047262720 | elapsed time per iteration (s): 0.43 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.996956E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.402 | TFLOPs: 30.98 | +7: iteration 64950/ 173500 | consumed samples: 16627200 | consumed tokens: 34052505600 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.990230E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.551 | TFLOPs: 31.82 | +7: iteration 64960/ 173500 | consumed samples: 16629760 | consumed tokens: 34057748480 | elapsed time per iteration (s): 0.42 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.984523E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.315 | TFLOPs: 31.71 | +7: iteration 64970/ 173500 | consumed samples: 16632320 | consumed tokens: 34062991360 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.981989E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.199 | TFLOPs: 31.28 | +7: iteration 64980/ 173500 | consumed samples: 16634880 | consumed tokens: 34068234240 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.982389E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.563 | TFLOPs: 31.30 | +7: iteration 64990/ 173500 | consumed samples: 16637440 | consumed tokens: 34073477120 | elapsed time per iteration (s): 0.45 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 
2.974330E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.562 | TFLOPs: 29.94 | +7: iteration 65000/ 173500 | consumed samples: 16640000 | consumed tokens: 34078720000 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 3.005009E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.550 | TFLOPs: 31.46 | +7: iteration 65010/ 173500 | consumed samples: 16642560 | consumed tokens: 34083962880 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 3.001576E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.182 | TFLOPs: 31.49 | +7: iteration 65020/ 173500 | consumed samples: 16645120 | consumed tokens: 34089205760 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.995784E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.786 | TFLOPs: 31.68 | +7: iteration 65030/ 173500 | consumed samples: 16647680 | consumed tokens: 34094448640 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.983830E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.700 | TFLOPs: 32.04 | +7: iteration 65040/ 173500 | consumed samples: 16650240 | consumed tokens: 34099691520 | elapsed time per iteration (s): 0.43 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.998865E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.432 | TFLOPs: 31.56 | +7: iteration 65050/ 173500 | consumed samples: 16652800 | consumed tokens: 
34104934400 | elapsed time per iteration (s): 0.43 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.975197E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.681 | TFLOPs: 31.31 | +7: iteration 65060/ 173500 | consumed samples: 16655360 | consumed tokens: 34110177280 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.974223E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.102 | TFLOPs: 31.80 | +7: iteration 65070/ 173500 | consumed samples: 16657920 | consumed tokens: 34115420160 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 3.001831E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.508 | TFLOPs: 31.82 | +7: iteration 65080/ 173500 | consumed samples: 16660480 | consumed tokens: 34120663040 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.987677E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.271 | TFLOPs: 31.65 | +7: iteration 65090/ 173500 | consumed samples: 16663040 | consumed tokens: 34125905920 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.980789E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.309 | TFLOPs: 31.18 | +7: iteration 65100/ 173500 | consumed samples: 16665600 | consumed tokens: 34131148800 | elapsed time per iteration (s): 0.42 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.976630E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 603.217 | TFLOPs: 31.65 | +7: iteration 65110/ 173500 | consumed samples: 16668160 | consumed tokens: 34136391680 | elapsed time per iteration (s): 0.42 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.977899E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.709 | TFLOPs: 31.73 | +7: iteration 65120/ 173500 | consumed samples: 16670720 | consumed tokens: 34141634560 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.981018E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.938 | TFLOPs: 30.95 | +7: iteration 65130/ 173500 | consumed samples: 16673280 | consumed tokens: 34146877440 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.967800E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.424 | TFLOPs: 31.14 | +7: iteration 65140/ 173500 | consumed samples: 16675840 | consumed tokens: 34152120320 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.984194E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.219 | TFLOPs: 31.18 | +7: iteration 65150/ 173500 | consumed samples: 16678400 | consumed tokens: 34157363200 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.983335E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.738 | TFLOPs: 31.10 | +7: iteration 65160/ 173500 | consumed samples: 16680960 | consumed tokens: 34162606080 | elapsed time per iteration (s): 0.43 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.979159E+00 | grad 
norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.614 | TFLOPs: 31.41 | +7: iteration 65170/ 173500 | consumed samples: 16683520 | consumed tokens: 34167848960 | elapsed time per iteration (s): 0.43 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.985030E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.343 | TFLOPs: 31.60 | +7: iteration 65180/ 173500 | consumed samples: 16686080 | consumed tokens: 34173091840 | elapsed time per iteration (s): 0.43 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.979681E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.722 | TFLOPs: 31.41 | +7: iteration 65190/ 173500 | consumed samples: 16688640 | consumed tokens: 34178334720 | elapsed time per iteration (s): 0.43 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.986028E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.373 | TFLOPs: 31.03 | +7: iteration 65200/ 173500 | consumed samples: 16691200 | consumed tokens: 34183577600 | elapsed time per iteration (s): 0.42 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.988317E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.899 | TFLOPs: 31.90 | +7: iteration 65210/ 173500 | consumed samples: 16693760 | consumed tokens: 34188820480 | elapsed time per iteration (s): 0.43 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.976300E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.082 | TFLOPs: 31.59 | +7: iteration 65220/ 173500 | consumed samples: 16696320 | consumed tokens: 34194063360 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.979045E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.894 | TFLOPs: 31.27 | +7: iteration 65230/ 173500 | consumed samples: 16698880 | consumed tokens: 34199306240 | elapsed time per iteration (s): 0.43 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.990813E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.096 | TFLOPs: 31.59 | +7: iteration 65240/ 173500 | consumed samples: 16701440 | consumed tokens: 34204549120 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.989665E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.350 | TFLOPs: 31.66 | +7: iteration 65250/ 173500 | consumed samples: 16704000 | consumed tokens: 34209792000 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.980047E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.144 | TFLOPs: 31.70 | +7: iteration 65260/ 173500 | consumed samples: 16706560 | consumed tokens: 34215034880 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.985137E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.637 | TFLOPs: 31.72 | +7: iteration 65270/ 173500 | consumed samples: 16709120 | consumed tokens: 34220277760 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.974340E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.018 | TFLOPs: 
31.85 | +7: iteration 65280/ 173500 | consumed samples: 16711680 | consumed tokens: 34225520640 | elapsed time per iteration (s): 0.43 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.981809E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.056 | TFLOPs: 31.38 | +7: iteration 65290/ 173500 | consumed samples: 16714240 | consumed tokens: 34230763520 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.977895E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.542 | TFLOPs: 31.51 | +7: iteration 65300/ 173500 | consumed samples: 16716800 | consumed tokens: 34236006400 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.972559E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.795 | TFLOPs: 31.37 | +7: iteration 65310/ 173500 | consumed samples: 16719360 | consumed tokens: 34241249280 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.987553E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.970 | TFLOPs: 31.58 | +7: iteration 65320/ 173500 | consumed samples: 16721920 | consumed tokens: 34246492160 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.992690E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.573 | TFLOPs: 31.30 | +7: iteration 65330/ 173500 | consumed samples: 16724480 | consumed tokens: 34251735040 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.985853E+00 | grad norm: 0.305 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.054 | TFLOPs: 31.17 | +7: iteration 65340/ 173500 | consumed samples: 16727040 | consumed tokens: 34256977920 | elapsed time per iteration (s): 0.42 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.984086E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.612 | TFLOPs: 31.99 | +7: iteration 65350/ 173500 | consumed samples: 16729600 | consumed tokens: 34262220800 | elapsed time per iteration (s): 0.42 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.983028E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.719 | TFLOPs: 31.99 | +7: iteration 65360/ 173500 | consumed samples: 16732160 | consumed tokens: 34267463680 | elapsed time per iteration (s): 0.43 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.986390E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.655 | TFLOPs: 31.41 | +7: iteration 65370/ 173500 | consumed samples: 16734720 | consumed tokens: 34272706560 | elapsed time per iteration (s): 0.42 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.988885E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.779 | TFLOPs: 31.99 | +7: iteration 65380/ 173500 | consumed samples: 16737280 | consumed tokens: 34277949440 | elapsed time per iteration (s): 0.44 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.988623E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.334 | TFLOPs: 30.24 | +7: iteration 65390/ 173500 | consumed samples: 16739840 | consumed tokens: 34283192320 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.977014E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.758 | TFLOPs: 32.05 | +7: iteration 65400/ 173500 | consumed samples: 16742400 | consumed tokens: 34288435200 | elapsed time per iteration (s): 0.42 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.987746E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.394 | TFLOPs: 31.71 | +7: iteration 65410/ 173500 | consumed samples: 16744960 | consumed tokens: 34293678080 | elapsed time per iteration (s): 0.43 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.988949E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.729 | TFLOPs: 31.57 | +7: iteration 65420/ 173500 | consumed samples: 16747520 | consumed tokens: 34298920960 | elapsed time per iteration (s): 0.43 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.982187E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.140 | TFLOPs: 31.28 | +7: iteration 65430/ 173500 | consumed samples: 16750080 | consumed tokens: 34304163840 | elapsed time per iteration (s): 0.43 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.980439E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.502 | TFLOPs: 30.98 | +7: iteration 65440/ 173500 | consumed samples: 16752640 | consumed tokens: 34309406720 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.995906E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.524 | TFLOPs: 31.88 | +7: iteration 65450/ 
173500 | consumed samples: 16755200 | consumed tokens: 34314649600 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.984447E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.081 | TFLOPs: 32.01 | +7: iteration 65460/ 173500 | consumed samples: 16757760 | consumed tokens: 34319892480 | elapsed time per iteration (s): 0.43 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.993592E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.887 | TFLOPs: 31.58 | +7: iteration 65470/ 173500 | consumed samples: 16760320 | consumed tokens: 34325135360 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.981953E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.689 | TFLOPs: 31.62 | +7: iteration 65480/ 173500 | consumed samples: 16762880 | consumed tokens: 34330378240 | elapsed time per iteration (s): 0.43 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.971996E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.851 | TFLOPs: 31.37 | +7: iteration 65490/ 173500 | consumed samples: 16765440 | consumed tokens: 34335621120 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.968255E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.249 | TFLOPs: 31.86 | +7: iteration 65500/ 173500 | consumed samples: 16768000 | consumed tokens: 34340864000 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.975322E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.189 | TFLOPs: 31.81 | +7: iteration 65510/ 173500 | consumed samples: 16770560 | consumed tokens: 34346106880 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.984970E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.528 | TFLOPs: 31.98 | +7: iteration 65520/ 173500 | consumed samples: 16773120 | consumed tokens: 34351349760 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.973097E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.967 | TFLOPs: 31.85 | +7: iteration 65530/ 173500 | consumed samples: 16775680 | consumed tokens: 34356592640 | elapsed time per iteration (s): 0.43 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.987340E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.981 | TFLOPs: 30.96 | +7: iteration 65540/ 173500 | consumed samples: 16778240 | consumed tokens: 34361835520 | elapsed time per iteration (s): 0.44 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.988594E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.826 | TFLOPs: 30.53 | +7: iteration 65550/ 173500 | consumed samples: 16780800 | consumed tokens: 34367078400 | elapsed time per iteration (s): 0.42 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.984211E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.851 | TFLOPs: 31.63 | +7: iteration 65560/ 173500 | consumed samples: 16783360 | consumed tokens: 34372321280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.453E-04 | global batch size: 256 | lm loss: 2.971287E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.296 | TFLOPs: 31.29 | +7: iteration 65570/ 173500 | consumed samples: 16785920 | consumed tokens: 34377564160 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.990222E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.750 | TFLOPs: 31.57 | +7: iteration 65580/ 173500 | consumed samples: 16788480 | consumed tokens: 34382807040 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.967493E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.221 | TFLOPs: 31.18 | +7: iteration 65590/ 173500 | consumed samples: 16791040 | consumed tokens: 34388049920 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.999320E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.968 | TFLOPs: 31.43 | +7: iteration 65600/ 173500 | consumed samples: 16793600 | consumed tokens: 34393292800 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.978302E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.161 | TFLOPs: 31.59 | +7: iteration 65610/ 173500 | consumed samples: 16796160 | consumed tokens: 34398535680 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.994264E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.945 | TFLOPs: 31.22 | +7: iteration 65620/ 173500 | 
consumed samples: 16798720 | consumed tokens: 34403778560 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.984180E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.245 | TFLOPs: 31.44 | +7: iteration 65630/ 173500 | consumed samples: 16801280 | consumed tokens: 34409021440 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.990683E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.071 | TFLOPs: 31.38 | +7: iteration 65640/ 173500 | consumed samples: 16803840 | consumed tokens: 34414264320 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.985462E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.743 | TFLOPs: 31.26 | +7: iteration 65650/ 173500 | consumed samples: 16806400 | consumed tokens: 34419507200 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.980050E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.657 | TFLOPs: 30.99 | +7: iteration 65660/ 173500 | consumed samples: 16808960 | consumed tokens: 34424750080 | elapsed time per iteration (s): 0.44 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.975510E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.860 | TFLOPs: 30.21 | +7: iteration 65670/ 173500 | consumed samples: 16811520 | consumed tokens: 34429992960 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.972331E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.297 | TFLOPs: 31.34 | +7: iteration 65680/ 173500 | consumed samples: 16814080 | consumed tokens: 34435235840 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.970173E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.360 | TFLOPs: 31.34 | +7: iteration 65690/ 173500 | consumed samples: 16816640 | consumed tokens: 34440478720 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 3.005807E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.663 | TFLOPs: 31.04 | +7: iteration 65700/ 173500 | consumed samples: 16819200 | consumed tokens: 34445721600 | elapsed time per iteration (s): 0.42 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.989848E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.536 | TFLOPs: 31.72 | +7: iteration 65710/ 173500 | consumed samples: 16821760 | consumed tokens: 34450964480 | elapsed time per iteration (s): 0.42 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.969776E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.299 | TFLOPs: 31.81 | +7: iteration 65720/ 173500 | consumed samples: 16824320 | consumed tokens: 34456207360 | elapsed time per iteration (s): 0.42 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.984009E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.198 | TFLOPs: 31.81 | +7: iteration 65730/ 173500 | consumed samples: 16826880 | consumed tokens: 34461450240 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch 
size: 256 | lm loss: 2.977064E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.234 | TFLOPs: 30.97 | +7: iteration 65740/ 173500 | consumed samples: 16829440 | consumed tokens: 34466693120 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.973467E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.991 | TFLOPs: 31.27 | +7: iteration 65750/ 173500 | consumed samples: 16832000 | consumed tokens: 34471936000 | elapsed time per iteration (s): 0.42 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.999274E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.612 | TFLOPs: 31.83 | +7: iteration 65760/ 173500 | consumed samples: 16834560 | consumed tokens: 34477178880 | elapsed time per iteration (s): 0.44 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.991757E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.099 | TFLOPs: 30.80 | +7: iteration 65770/ 173500 | consumed samples: 16837120 | consumed tokens: 34482421760 | elapsed time per iteration (s): 0.42 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.979873E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.528 | TFLOPs: 31.61 | +7: iteration 65780/ 173500 | consumed samples: 16839680 | consumed tokens: 34487664640 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.977112E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.356 | TFLOPs: 31.55 | +7: iteration 65790/ 173500 | consumed samples: 16842240 | 
consumed tokens: 34492907520 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.983805E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.379 | TFLOPs: 31.40 | +7: iteration 65800/ 173500 | consumed samples: 16844800 | consumed tokens: 34498150400 | elapsed time per iteration (s): 0.42 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.991833E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.211 | TFLOPs: 31.81 | +7: iteration 65810/ 173500 | consumed samples: 16847360 | consumed tokens: 34503393280 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.972699E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.630 | TFLOPs: 31.46 | +7: iteration 65820/ 173500 | consumed samples: 16849920 | consumed tokens: 34508636160 | elapsed time per iteration (s): 0.42 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.969119E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.573 | TFLOPs: 31.88 | +7: iteration 65830/ 173500 | consumed samples: 16852480 | consumed tokens: 34513879040 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.977042E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.052 | TFLOPs: 31.59 | +7: iteration 65840/ 173500 | consumed samples: 16855040 | consumed tokens: 34519121920 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.973043E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 592.798 | TFLOPs: 31.10 | +7: iteration 65850/ 173500 | consumed samples: 16857600 | consumed tokens: 34524364800 | elapsed time per iteration (s): 0.42 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.981372E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.040 | TFLOPs: 31.64 | +7: iteration 65860/ 173500 | consumed samples: 16860160 | consumed tokens: 34529607680 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.966348E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.361 | TFLOPs: 31.24 | +7: iteration 65870/ 173500 | consumed samples: 16862720 | consumed tokens: 34534850560 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.979815E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.885 | TFLOPs: 31.16 | +7: iteration 65880/ 173500 | consumed samples: 16865280 | consumed tokens: 34540093440 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.982512E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.327 | TFLOPs: 31.66 | +7: iteration 65890/ 173500 | consumed samples: 16867840 | consumed tokens: 34545336320 | elapsed time per iteration (s): 0.43 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.968940E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.442 | TFLOPs: 31.03 | +7: iteration 65900/ 173500 | consumed samples: 16870400 | consumed tokens: 34550579200 | elapsed time per iteration (s): 0.44 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 
2.979947E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.160 | TFLOPs: 30.60 | +7: iteration 65910/ 173500 | consumed samples: 16872960 | consumed tokens: 34555822080 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.975599E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.448 | TFLOPs: 31.77 | +7: iteration 65920/ 173500 | consumed samples: 16875520 | consumed tokens: 34561064960 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.989671E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.223 | TFLOPs: 31.86 | +7: iteration 65930/ 173500 | consumed samples: 16878080 | consumed tokens: 34566307840 | elapsed time per iteration (s): 0.43 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.977544E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.309 | TFLOPs: 31.55 | +7: iteration 65940/ 173500 | consumed samples: 16880640 | consumed tokens: 34571550720 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.972243E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.727 | TFLOPs: 31.68 | +7: iteration 65950/ 173500 | consumed samples: 16883200 | consumed tokens: 34576793600 | elapsed time per iteration (s): 0.43 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.995012E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.108 | TFLOPs: 31.07 | +7: iteration 65960/ 173500 | consumed samples: 16885760 | consumed tokens: 
34582036480 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.971851E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.973 | TFLOPs: 31.79 | +7: iteration 65970/ 173500 | consumed samples: 16888320 | consumed tokens: 34587279360 | elapsed time per iteration (s): 0.43 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.988849E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.051 | TFLOPs: 31.54 | +7: iteration 65980/ 173500 | consumed samples: 16890880 | consumed tokens: 34592522240 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.986331E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.112 | TFLOPs: 31.80 | +7: iteration 65990/ 173500 | consumed samples: 16893440 | consumed tokens: 34597765120 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.990280E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.812 | TFLOPs: 31.68 | +0: [2023-03-17 07:01:08,685] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=0, lr=[0.00014466507355770288, 0.00014466507355770288, 0.00014466507355770288], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 66000/ 173500 | consumed samples: 16896000 | consumed tokens: 34603008000 | elapsed time per iteration (s): 0.43 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.978994E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.115 | TFLOPs: 31.33 | +0: steps: 66000 loss: 2.9943 iter time (s): 0.427 samples/sec: 599.039 +7: iteration 66010/ 173500 | 
consumed samples: 16898560 | consumed tokens: 34608250880 | elapsed time per iteration (s): 0.42 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.987493E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.726 | TFLOPs: 31.78 | +7: iteration 66020/ 173500 | consumed samples: 16901120 | consumed tokens: 34613493760 | elapsed time per iteration (s): 0.42 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.981707E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.202 | TFLOPs: 31.91 | +7: iteration 66030/ 173500 | consumed samples: 16903680 | consumed tokens: 34618736640 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.985021E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.256 | TFLOPs: 31.44 | +7: iteration 66040/ 173500 | consumed samples: 16906240 | consumed tokens: 34623979520 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.972183E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.888 | TFLOPs: 31.58 | +7: iteration 66050/ 173500 | consumed samples: 16908800 | consumed tokens: 34629222400 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.974225E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.087 | TFLOPs: 31.43 | +7: iteration 66060/ 173500 | consumed samples: 16911360 | consumed tokens: 34634465280 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.980677E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.388 | TFLOPs: 31.50 | +7: iteration 66070/ 173500 | consumed samples: 16913920 | consumed tokens: 34639708160 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.978697E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.656 | TFLOPs: 31.41 | +7: iteration 66080/ 173500 | consumed samples: 16916480 | consumed tokens: 34644951040 | elapsed time per iteration (s): 0.43 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.988415E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.502 | TFLOPs: 31.51 | +7: iteration 66090/ 173500 | consumed samples: 16919040 | consumed tokens: 34650193920 | elapsed time per iteration (s): 0.43 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.967852E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.656 | TFLOPs: 31.57 | +7: iteration 66100/ 173500 | consumed samples: 16921600 | consumed tokens: 34655436800 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.989953E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.689 | TFLOPs: 31.99 | +7: iteration 66110/ 173500 | consumed samples: 16924160 | consumed tokens: 34660679680 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.973270E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.344 | TFLOPs: 31.71 | +7: iteration 66120/ 173500 | consumed samples: 16926720 | consumed tokens: 34665922560 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch 
size: 256 | lm loss: 2.990701E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.544 | TFLOPs: 31.61 | +7: iteration 66130/ 173500 | consumed samples: 16929280 | consumed tokens: 34671165440 | elapsed time per iteration (s): 0.43 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.978725E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.146 | TFLOPs: 31.59 | +7: iteration 66140/ 173500 | consumed samples: 16931840 | consumed tokens: 34676408320 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.988387E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.045 | TFLOPs: 31.64 | +7: iteration 66150/ 173500 | consumed samples: 16934400 | consumed tokens: 34681651200 | elapsed time per iteration (s): 0.42 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.978734E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.712 | TFLOPs: 31.62 | +7: iteration 66160/ 173500 | consumed samples: 16936960 | consumed tokens: 34686894080 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.973543E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.986 | TFLOPs: 31.48 | +7: iteration 66170/ 173500 | consumed samples: 16939520 | consumed tokens: 34692136960 | elapsed time per iteration (s): 0.44 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.985160E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.117 | TFLOPs: 30.60 | +7: iteration 66180/ 173500 | consumed samples: 16942080 | 
consumed tokens: 34697379840 | elapsed time per iteration (s): 0.42 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.968236E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.684 | TFLOPs: 31.62 | +7: iteration 66190/ 173500 | consumed samples: 16944640 | consumed tokens: 34702622720 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.981667E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.695 | TFLOPs: 31.57 | +7: iteration 66200/ 173500 | consumed samples: 16947200 | consumed tokens: 34707865600 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.990940E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.625 | TFLOPs: 31.15 | +7: iteration 66210/ 173500 | consumed samples: 16949760 | consumed tokens: 34713108480 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.983505E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.339 | TFLOPs: 31.34 | +7: iteration 66220/ 173500 | consumed samples: 16952320 | consumed tokens: 34718351360 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.985811E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.248 | TFLOPs: 30.92 | +7: iteration 66230/ 173500 | consumed samples: 16954880 | consumed tokens: 34723594240 | elapsed time per iteration (s): 0.42 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.992670E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 602.514 | TFLOPs: 31.61 | +7: iteration 66240/ 173500 | consumed samples: 16957440 | consumed tokens: 34728837120 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.974284E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.127 | TFLOPs: 31.28 | +7: iteration 66250/ 173500 | consumed samples: 16960000 | consumed tokens: 34734080000 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.973552E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.393 | TFLOPs: 31.13 | +7: iteration 66260/ 173500 | consumed samples: 16962560 | consumed tokens: 34739322880 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.982251E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.798 | TFLOPs: 31.37 | +7: iteration 66270/ 173500 | consumed samples: 16965120 | consumed tokens: 34744565760 | elapsed time per iteration (s): 0.44 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.994756E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.785 | TFLOPs: 30.84 | +7: iteration 66280/ 173500 | consumed samples: 16967680 | consumed tokens: 34749808640 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.982737E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.785 | TFLOPs: 31.21 | +7: iteration 66290/ 173500 | consumed samples: 16970240 | consumed tokens: 34755051520 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 
2.973105E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.115 | TFLOPs: 31.59 | +7: iteration 66300/ 173500 | consumed samples: 16972800 | consumed tokens: 34760294400 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.973985E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.255 | TFLOPs: 30.92 | +7: iteration 66310/ 173500 | consumed samples: 16975360 | consumed tokens: 34765537280 | elapsed time per iteration (s): 0.42 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.990207E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.579 | TFLOPs: 31.72 | +7: iteration 66320/ 173500 | consumed samples: 16977920 | consumed tokens: 34770780160 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.989822E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.691 | TFLOPs: 30.89 | +7: iteration 66330/ 173500 | consumed samples: 16980480 | consumed tokens: 34776023040 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.993335E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.835 | TFLOPs: 31.11 | +7: iteration 66340/ 173500 | consumed samples: 16983040 | consumed tokens: 34781265920 | elapsed time per iteration (s): 0.43 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.974894E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.065 | TFLOPs: 31.38 | +7: iteration 66350/ 173500 | consumed samples: 16985600 | consumed tokens: 
34786508800 | elapsed time per iteration (s): 0.43 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.978728E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.513 | TFLOPs: 31.40 | +7: iteration 66360/ 173500 | consumed samples: 16988160 | consumed tokens: 34791751680 | elapsed time per iteration (s): 0.43 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.993019E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.192 | TFLOPs: 31.28 | +7: iteration 66370/ 173500 | consumed samples: 16990720 | consumed tokens: 34796994560 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.992322E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.015 | TFLOPs: 31.80 | +7: iteration 66380/ 173500 | consumed samples: 16993280 | consumed tokens: 34802237440 | elapsed time per iteration (s): 0.44 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.975060E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.110 | TFLOPs: 30.49 | +7: iteration 66390/ 173500 | consumed samples: 16995840 | consumed tokens: 34807480320 | elapsed time per iteration (s): 0.47 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.989158E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.997 | TFLOPs: 28.54 | +7: iteration 66400/ 173500 | consumed samples: 16998400 | consumed tokens: 34812723200 | elapsed time per iteration (s): 0.46 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.990495E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 562.410 | TFLOPs: 29.51 | +7: iteration 66410/ 173500 | consumed samples: 17000960 | consumed tokens: 34817966080 | elapsed time per iteration (s): 0.44 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.970474E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.867 | TFLOPs: 30.27 | +7: iteration 66420/ 173500 | consumed samples: 17003520 | consumed tokens: 34823208960 | elapsed time per iteration (s): 0.43 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.971993E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.130 | TFLOPs: 30.96 | +7: iteration 66430/ 173500 | consumed samples: 17006080 | consumed tokens: 34828451840 | elapsed time per iteration (s): 0.45 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.988763E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.223 | TFLOPs: 29.76 | +7: iteration 66440/ 173500 | consumed samples: 17008640 | consumed tokens: 34833694720 | elapsed time per iteration (s): 0.45 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.984624E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.154 | TFLOPs: 29.71 | +7: iteration 66450/ 173500 | consumed samples: 17011200 | consumed tokens: 34838937600 | elapsed time per iteration (s): 0.44 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.980494E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.690 | TFLOPs: 30.78 | +7: iteration 66460/ 173500 | consumed samples: 17013760 | consumed tokens: 34844180480 | elapsed time per iteration (s): 0.43 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.979647E+00 | grad 
norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.052 | TFLOPs: 31.33 | +7: iteration 66470/ 173500 | consumed samples: 17016320 | consumed tokens: 34849423360 | elapsed time per iteration (s): 0.42 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.977522E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.789 | TFLOPs: 31.63 | +7: iteration 66480/ 173500 | consumed samples: 17018880 | consumed tokens: 34854666240 | elapsed time per iteration (s): 0.44 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.974543E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.460 | TFLOPs: 30.56 | +7: iteration 66490/ 173500 | consumed samples: 17021440 | consumed tokens: 34859909120 | elapsed time per iteration (s): 0.46 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.993172E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.355 | TFLOPs: 29.35 | +7: iteration 66500/ 173500 | consumed samples: 17024000 | consumed tokens: 34865152000 | elapsed time per iteration (s): 0.44 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.962053E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.644 | TFLOPs: 30.73 | +7: iteration 66510/ 173500 | consumed samples: 17026560 | consumed tokens: 34870394880 | elapsed time per iteration (s): 0.45 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.969157E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.093 | TFLOPs: 29.60 | +7: iteration 66520/ 173500 | consumed samples: 17029120 | consumed tokens: 34875637760 | elapsed time 
per iteration (s): 0.46 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.980297E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.423 | TFLOPs: 28.98 | +7: iteration 66530/ 173500 | consumed samples: 17031680 | consumed tokens: 34880880640 | elapsed time per iteration (s): 0.47 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.992374E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.334 | TFLOPs: 28.88 | +7: iteration 66540/ 173500 | consumed samples: 17034240 | consumed tokens: 34886123520 | elapsed time per iteration (s): 0.48 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.978238E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.731 | TFLOPs: 28.27 | +7: iteration 66550/ 173500 | consumed samples: 17036800 | consumed tokens: 34891366400 | elapsed time per iteration (s): 0.48 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.979536E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.881 | TFLOPs: 27.80 | +7: iteration 66560/ 173500 | consumed samples: 17039360 | consumed tokens: 34896609280 | elapsed time per iteration (s): 0.44 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.978227E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.147 | TFLOPs: 30.49 | +7: iteration 66570/ 173500 | consumed samples: 17041920 | consumed tokens: 34901852160 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.968534E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.538 | TFLOPs: 
31.35 | +7: iteration 66580/ 173500 | consumed samples: 17044480 | consumed tokens: 34907095040 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.988807E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.693 | TFLOPs: 31.31 | +7: iteration 66590/ 173500 | consumed samples: 17047040 | consumed tokens: 34912337920 | elapsed time per iteration (s): 0.42 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.963897E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.505 | TFLOPs: 31.82 | +7: iteration 66600/ 173500 | consumed samples: 17049600 | consumed tokens: 34917580800 | elapsed time per iteration (s): 0.44 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.971985E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.345 | TFLOPs: 30.87 | +7: iteration 66610/ 173500 | consumed samples: 17052160 | consumed tokens: 34922823680 | elapsed time per iteration (s): 0.43 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.966310E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.063 | TFLOPs: 31.59 | +7: iteration 66620/ 173500 | consumed samples: 17054720 | consumed tokens: 34928066560 | elapsed time per iteration (s): 0.43 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.981257E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.649 | TFLOPs: 31.41 | +7: iteration 66630/ 173500 | consumed samples: 17057280 | consumed tokens: 34933309440 | elapsed time per iteration (s): 0.43 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.968589E+00 | grad norm: 0.310 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.354 | TFLOPs: 31.50 | +7: iteration 66640/ 173500 | consumed samples: 17059840 | consumed tokens: 34938552320 | elapsed time per iteration (s): 0.42 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.980368E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.698 | TFLOPs: 31.78 | +7: iteration 66650/ 173500 | consumed samples: 17062400 | consumed tokens: 34943795200 | elapsed time per iteration (s): 0.44 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.970535E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.878 | TFLOPs: 30.43 | +7: iteration 66660/ 173500 | consumed samples: 17064960 | consumed tokens: 34949038080 | elapsed time per iteration (s): 0.42 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.986649E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.033 | TFLOPs: 31.64 | +7: iteration 66670/ 173500 | consumed samples: 17067520 | consumed tokens: 34954280960 | elapsed time per iteration (s): 0.44 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.986893E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.835 | TFLOPs: 30.63 | +7: iteration 66680/ 173500 | consumed samples: 17070080 | consumed tokens: 34959523840 | elapsed time per iteration (s): 0.43 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.970498E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.016 | TFLOPs: 31.32 | +7: iteration 66690/ 173500 | consumed samples: 17072640 | consumed tokens: 34964766720 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.974389E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.871 | TFLOPs: 31.16 | +7: iteration 66700/ 173500 | consumed samples: 17075200 | consumed tokens: 34970009600 | elapsed time per iteration (s): 0.42 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.970829E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.866 | TFLOPs: 31.79 | +7: iteration 66710/ 173500 | consumed samples: 17077760 | consumed tokens: 34975252480 | elapsed time per iteration (s): 0.45 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.974222E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.134 | TFLOPs: 29.81 | +7: iteration 66720/ 173500 | consumed samples: 17080320 | consumed tokens: 34980495360 | elapsed time per iteration (s): 0.43 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.979109E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.955 | TFLOPs: 31.22 | +7: iteration 66730/ 173500 | consumed samples: 17082880 | consumed tokens: 34985738240 | elapsed time per iteration (s): 0.42 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.975420E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.787 | TFLOPs: 31.68 | +7: iteration 66740/ 173500 | consumed samples: 17085440 | consumed tokens: 34990981120 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.983902E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.670 | TFLOPs: 31.62 | +7: iteration 66750/ 
173500 | consumed samples: 17088000 | consumed tokens: 34996224000 | elapsed time per iteration (s): 0.43 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.978441E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.727 | TFLOPs: 31.47 | +7: iteration 66760/ 173500 | consumed samples: 17090560 | consumed tokens: 35001466880 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.980249E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.207 | TFLOPs: 31.70 | +7: iteration 66770/ 173500 | consumed samples: 17093120 | consumed tokens: 35006709760 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.967789E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.396 | TFLOPs: 31.82 | +7: iteration 66780/ 173500 | consumed samples: 17095680 | consumed tokens: 35011952640 | elapsed time per iteration (s): 0.43 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.993595E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.035 | TFLOPs: 30.91 | +7: iteration 66790/ 173500 | consumed samples: 17098240 | consumed tokens: 35017195520 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.987458E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.473 | TFLOPs: 31.82 | +7: iteration 66800/ 173500 | consumed samples: 17100800 | consumed tokens: 35022438400 | elapsed time per iteration (s): 0.42 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.976966E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.846 | TFLOPs: 31.79 | +7: iteration 66810/ 173500 | consumed samples: 17103360 | consumed tokens: 35027681280 | elapsed time per iteration (s): 0.42 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.983594E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.406 | TFLOPs: 31.76 | +7: iteration 66820/ 173500 | consumed samples: 17105920 | consumed tokens: 35032924160 | elapsed time per iteration (s): 0.43 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.977404E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.345 | TFLOPs: 31.34 | +7: iteration 66830/ 173500 | consumed samples: 17108480 | consumed tokens: 35038167040 | elapsed time per iteration (s): 0.43 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.986724E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.955 | TFLOPs: 31.58 | +7: iteration 66840/ 173500 | consumed samples: 17111040 | consumed tokens: 35043409920 | elapsed time per iteration (s): 0.43 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.961720E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.395 | TFLOPs: 31.50 | +7: iteration 66850/ 173500 | consumed samples: 17113600 | consumed tokens: 35048652800 | elapsed time per iteration (s): 0.43 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.963581E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.539 | TFLOPs: 31.14 | +7: iteration 66860/ 173500 | consumed samples: 17116160 | consumed tokens: 35053895680 | elapsed time per iteration (s): 0.43 | learning rate: 
1.434E-04 | global batch size: 256 | lm loss: 2.985641E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.411 | TFLOPs: 31.45 | +7: iteration 66870/ 173500 | consumed samples: 17118720 | consumed tokens: 35059138560 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.970135E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.055 | TFLOPs: 31.38 | +7: iteration 66880/ 173500 | consumed samples: 17121280 | consumed tokens: 35064381440 | elapsed time per iteration (s): 0.42 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.973505E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.428 | TFLOPs: 31.82 | +7: iteration 66890/ 173500 | consumed samples: 17123840 | consumed tokens: 35069624320 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.969722E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.157 | TFLOPs: 31.23 | +7: iteration 66900/ 173500 | consumed samples: 17126400 | consumed tokens: 35074867200 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.994528E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.829 | TFLOPs: 31.47 | +7: iteration 66910/ 173500 | consumed samples: 17128960 | consumed tokens: 35080110080 | elapsed time per iteration (s): 0.42 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.981794E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.559 | TFLOPs: 31.83 | +7: iteration 66920/ 173500 | 
consumed samples: 17131520 | consumed tokens: 35085352960 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.979875E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.049 | TFLOPs: 31.38 | +7: iteration 66930/ 173500 | consumed samples: 17134080 | consumed tokens: 35090595840 | elapsed time per iteration (s): 0.42 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.971202E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.088 | TFLOPs: 32.01 | +7: iteration 66940/ 173500 | consumed samples: 17136640 | consumed tokens: 35095838720 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.966556E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.301 | TFLOPs: 31.55 | +7: iteration 66950/ 173500 | consumed samples: 17139200 | consumed tokens: 35101081600 | elapsed time per iteration (s): 0.42 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.973785E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.278 | TFLOPs: 31.71 | +7: iteration 66960/ 173500 | consumed samples: 17141760 | consumed tokens: 35106324480 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.986039E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.813 | TFLOPs: 31.47 | +7: iteration 66970/ 173500 | consumed samples: 17144320 | consumed tokens: 35111567360 | elapsed time per iteration (s): 0.42 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.971131E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.655 | TFLOPs: 31.62 | +7: iteration 66980/ 173500 | consumed samples: 17146880 | consumed tokens: 35116810240 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.975946E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.813 | TFLOPs: 31.42 | +7: iteration 66990/ 173500 | consumed samples: 17149440 | consumed tokens: 35122053120 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.984242E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.209 | TFLOPs: 30.91 | +7: iteration 67000/ 173500 | consumed samples: 17152000 | consumed tokens: 35127296000 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.990803E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.037 | TFLOPs: 31.59 | +7: iteration 67010/ 173500 | consumed samples: 17154560 | consumed tokens: 35132538880 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.977291E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.796 | TFLOPs: 31.58 | +7: iteration 67020/ 173500 | consumed samples: 17157120 | consumed tokens: 35137781760 | elapsed time per iteration (s): 0.42 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.980251E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.750 | TFLOPs: 31.63 | +7: iteration 67030/ 173500 | consumed samples: 17159680 | consumed tokens: 35143024640 | elapsed time per iteration (s): 0.42 | learning rate: 1.431E-04 | global batch 
size: 256 | lm loss: 2.977530E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.973 | TFLOPs: 32.00 | +7: iteration 67040/ 173500 | consumed samples: 17162240 | consumed tokens: 35148267520 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.985752E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.423 | TFLOPs: 31.03 | +7: iteration 67050/ 173500 | consumed samples: 17164800 | consumed tokens: 35153510400 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.976199E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.744 | TFLOPs: 31.21 | +7: iteration 67060/ 173500 | consumed samples: 17167360 | consumed tokens: 35158753280 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.970891E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.944 | TFLOPs: 31.32 | +7: iteration 67070/ 173500 | consumed samples: 17169920 | consumed tokens: 35163996160 | elapsed time per iteration (s): 0.45 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.987092E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.483 | TFLOPs: 29.93 | +7: iteration 67080/ 173500 | consumed samples: 17172480 | consumed tokens: 35169239040 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.982533E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.669 | TFLOPs: 31.10 | +7: iteration 67090/ 173500 | consumed samples: 17175040 | 
consumed tokens: 35174481920 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.982367E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.066 | TFLOPs: 31.59 | +7: iteration 67100/ 173500 | consumed samples: 17177600 | consumed tokens: 35179724800 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.980186E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.425 | TFLOPs: 31.40 | +7: iteration 67110/ 173500 | consumed samples: 17180160 | consumed tokens: 35184967680 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.980965E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.383 | TFLOPs: 31.40 | +7: iteration 67120/ 173500 | consumed samples: 17182720 | consumed tokens: 35190210560 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.980676E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.884 | TFLOPs: 31.47 | +7: iteration 67130/ 173500 | consumed samples: 17185280 | consumed tokens: 35195453440 | elapsed time per iteration (s): 0.43 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.970686E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.394 | TFLOPs: 31.55 | +7: iteration 67140/ 173500 | consumed samples: 17187840 | consumed tokens: 35200696320 | elapsed time per iteration (s): 0.44 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.978755E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 583.099 | TFLOPs: 30.59 | +7: iteration 67150/ 173500 | consumed samples: 17190400 | consumed tokens: 35205939200 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.982025E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.262 | TFLOPs: 31.81 | +7: iteration 67160/ 173500 | consumed samples: 17192960 | consumed tokens: 35211182080 | elapsed time per iteration (s): 0.43 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.986381E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.223 | TFLOPs: 31.23 | +7: iteration 67170/ 173500 | consumed samples: 17195520 | consumed tokens: 35216424960 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.975705E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.508 | TFLOPs: 31.77 | +7: iteration 67180/ 173500 | consumed samples: 17198080 | consumed tokens: 35221667840 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.994779E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.527 | TFLOPs: 31.61 | +7: iteration 67190/ 173500 | consumed samples: 17200640 | consumed tokens: 35226910720 | elapsed time per iteration (s): 0.42 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.973680E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.382 | TFLOPs: 31.71 | +7: iteration 67200/ 173500 | consumed samples: 17203200 | consumed tokens: 35232153600 | elapsed time per iteration (s): 0.42 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 
2.980342E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.588 | TFLOPs: 31.83 | +7: iteration 67210/ 173500 | consumed samples: 17205760 | consumed tokens: 35237396480 | elapsed time per iteration (s): 0.42 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.977419E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.633 | TFLOPs: 31.62 | +7: iteration 67220/ 173500 | consumed samples: 17208320 | consumed tokens: 35242639360 | elapsed time per iteration (s): 0.43 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.969928E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.463 | TFLOPs: 31.30 | +7: iteration 67230/ 173500 | consumed samples: 17210880 | consumed tokens: 35247882240 | elapsed time per iteration (s): 0.42 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.979005E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.116 | TFLOPs: 31.64 | +7: iteration 67240/ 173500 | consumed samples: 17213440 | consumed tokens: 35253125120 | elapsed time per iteration (s): 0.43 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.987358E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.742 | TFLOPs: 31.57 | +7: iteration 67250/ 173500 | consumed samples: 17216000 | consumed tokens: 35258368000 | elapsed time per iteration (s): 0.44 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.976211E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.315 | TFLOPs: 30.40 | +7: iteration 67260/ 173500 | consumed samples: 17218560 | consumed tokens: 
35263610880 | elapsed time per iteration (s): 0.44 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.978682E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.500 | TFLOPs: 30.83 | +7: iteration 67270/ 173500 | consumed samples: 17221120 | consumed tokens: 35268853760 | elapsed time per iteration (s): 0.42 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.979056E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.000 | TFLOPs: 31.64 | +7: iteration 67280/ 173500 | consumed samples: 17223680 | consumed tokens: 35274096640 | elapsed time per iteration (s): 0.42 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.985270E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.177 | TFLOPs: 31.75 | +7: iteration 67290/ 173500 | consumed samples: 17226240 | consumed tokens: 35279339520 | elapsed time per iteration (s): 0.42 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.989107E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.427 | TFLOPs: 31.98 | +7: iteration 67300/ 173500 | consumed samples: 17228800 | consumed tokens: 35284582400 | elapsed time per iteration (s): 0.43 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.973612E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.321 | TFLOPs: 31.60 | +7: iteration 67310/ 173500 | consumed samples: 17231360 | consumed tokens: 35289825280 | elapsed time per iteration (s): 0.42 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.984324E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 605.122 | TFLOPs: 31.75 | +7: iteration 67320/ 173500 | consumed samples: 17233920 | consumed tokens: 35295068160 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.979641E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.951 | TFLOPs: 31.16 | +7: iteration 67330/ 173500 | consumed samples: 17236480 | consumed tokens: 35300311040 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.968965E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.661 | TFLOPs: 31.52 | +7: iteration 67340/ 173500 | consumed samples: 17239040 | consumed tokens: 35305553920 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.968528E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.221 | TFLOPs: 31.55 | +7: iteration 67350/ 173500 | consumed samples: 17241600 | consumed tokens: 35310796800 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.962941E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.300 | TFLOPs: 31.39 | +7: iteration 67360/ 173500 | consumed samples: 17244160 | consumed tokens: 35316039680 | elapsed time per iteration (s): 0.42 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.968056E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.087 | TFLOPs: 31.96 | +7: iteration 67370/ 173500 | consumed samples: 17246720 | consumed tokens: 35321282560 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.972291E+00 | grad 
norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.293 | TFLOPs: 31.55 | +7: iteration 67380/ 173500 | consumed samples: 17249280 | consumed tokens: 35326525440 | elapsed time per iteration (s): 0.42 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.973466E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.194 | TFLOPs: 31.81 | +7: iteration 67390/ 173500 | consumed samples: 17251840 | consumed tokens: 35331768320 | elapsed time per iteration (s): 0.42 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.978441E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.864 | TFLOPs: 31.74 | +7: iteration 67400/ 173500 | consumed samples: 17254400 | consumed tokens: 35337011200 | elapsed time per iteration (s): 0.42 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.971906E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.433 | TFLOPs: 31.66 | +7: iteration 67410/ 173500 | consumed samples: 17256960 | consumed tokens: 35342254080 | elapsed time per iteration (s): 0.42 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.977605E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.492 | TFLOPs: 31.77 | +7: iteration 67420/ 173500 | consumed samples: 17259520 | consumed tokens: 35347496960 | elapsed time per iteration (s): 0.42 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.971583E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.328 | TFLOPs: 31.97 | +7: iteration 67430/ 173500 | consumed samples: 17262080 | consumed tokens: 35352739840 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.977446E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.027 | TFLOPs: 31.69 | +7: iteration 67440/ 173500 | consumed samples: 17264640 | consumed tokens: 35357982720 | elapsed time per iteration (s): 0.43 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.969552E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.412 | TFLOPs: 31.19 | +7: iteration 67450/ 173500 | consumed samples: 17267200 | consumed tokens: 35363225600 | elapsed time per iteration (s): 0.43 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.976386E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.383 | TFLOPs: 31.29 | +7: iteration 67460/ 173500 | consumed samples: 17269760 | consumed tokens: 35368468480 | elapsed time per iteration (s): 0.42 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.991104E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.849 | TFLOPs: 32.00 | +7: iteration 67470/ 173500 | consumed samples: 17272320 | consumed tokens: 35373711360 | elapsed time per iteration (s): 0.42 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.983692E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.351 | TFLOPs: 31.97 | +7: iteration 67480/ 173500 | consumed samples: 17274880 | consumed tokens: 35378954240 | elapsed time per iteration (s): 0.42 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.987086E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.854 | TFLOPs: 
31.74 | +7: iteration 67490/ 173500 | consumed samples: 17277440 | consumed tokens: 35384197120 | elapsed time per iteration (s): 0.43 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.977519E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.571 | TFLOPs: 31.51 | +7: iteration 67500/ 173500 | consumed samples: 17280000 | consumed tokens: 35389440000 | elapsed time per iteration (s): 0.42 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.994587E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.713 | TFLOPs: 31.73 | +7: iteration 67510/ 173500 | consumed samples: 17282560 | consumed tokens: 35394682880 | elapsed time per iteration (s): 0.43 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.973405E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.925 | TFLOPs: 31.42 | +7: iteration 67520/ 173500 | consumed samples: 17285120 | consumed tokens: 35399925760 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.965933E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.601 | TFLOPs: 31.51 | +7: iteration 67530/ 173500 | consumed samples: 17287680 | consumed tokens: 35405168640 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.988779E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.676 | TFLOPs: 31.41 | +7: iteration 67540/ 173500 | consumed samples: 17290240 | consumed tokens: 35410411520 | elapsed time per iteration (s): 0.42 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.969776E+00 | grad norm: 0.307 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.560 | TFLOPs: 31.77 | +7: iteration 67550/ 173500 | consumed samples: 17292800 | consumed tokens: 35415654400 | elapsed time per iteration (s): 0.42 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.975709E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.712 | TFLOPs: 31.78 | +7: iteration 67560/ 173500 | consumed samples: 17295360 | consumed tokens: 35420897280 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.978448E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.288 | TFLOPs: 31.13 | +7: iteration 67570/ 173500 | consumed samples: 17297920 | consumed tokens: 35426140160 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.978057E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.926 | TFLOPs: 31.42 | +7: iteration 67580/ 173500 | consumed samples: 17300480 | consumed tokens: 35431383040 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.987877E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.388 | TFLOPs: 31.13 | +7: iteration 67590/ 173500 | consumed samples: 17303040 | consumed tokens: 35436625920 | elapsed time per iteration (s): 0.43 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.982599E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.080 | TFLOPs: 31.49 | +7: iteration 67600/ 173500 | consumed samples: 17305600 | consumed tokens: 35441868800 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.954698E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.462 | TFLOPs: 31.72 | +7: iteration 67610/ 173500 | consumed samples: 17308160 | consumed tokens: 35447111680 | elapsed time per iteration (s): 0.46 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.977945E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.987 | TFLOPs: 28.91 | +7: iteration 67620/ 173500 | consumed samples: 17310720 | consumed tokens: 35452354560 | elapsed time per iteration (s): 0.44 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.969618E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.930 | TFLOPs: 30.74 | +7: iteration 67630/ 173500 | consumed samples: 17313280 | consumed tokens: 35457597440 | elapsed time per iteration (s): 0.42 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.975874E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.702 | TFLOPs: 32.10 | +7: iteration 67640/ 173500 | consumed samples: 17315840 | consumed tokens: 35462840320 | elapsed time per iteration (s): 0.42 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.985079E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.572 | TFLOPs: 31.88 | +7: iteration 67650/ 173500 | consumed samples: 17318400 | consumed tokens: 35468083200 | elapsed time per iteration (s): 0.43 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.971610E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.045 | TFLOPs: 31.12 | +7: iteration 67660/ 
173500 | consumed samples: 17320960 | consumed tokens: 35473326080 | elapsed time per iteration (s): 0.42 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.989038E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.108 | TFLOPs: 31.64 | +7: iteration 67670/ 173500 | consumed samples: 17323520 | consumed tokens: 35478568960 | elapsed time per iteration (s): 0.42 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.983322E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.015 | TFLOPs: 31.74 | +7: iteration 67680/ 173500 | consumed samples: 17326080 | consumed tokens: 35483811840 | elapsed time per iteration (s): 0.43 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.975885E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.777 | TFLOPs: 31.36 | +7: iteration 67690/ 173500 | consumed samples: 17328640 | consumed tokens: 35489054720 | elapsed time per iteration (s): 0.42 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.977072E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.523 | TFLOPs: 31.82 | +7: iteration 67700/ 173500 | consumed samples: 17331200 | consumed tokens: 35494297600 | elapsed time per iteration (s): 0.43 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.971551E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.339 | TFLOPs: 31.60 | +7: iteration 67710/ 173500 | consumed samples: 17333760 | consumed tokens: 35499540480 | elapsed time per iteration (s): 0.43 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.983155E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 591.376 | TFLOPs: 31.03 | +7: iteration 67720/ 173500 | consumed samples: 17336320 | consumed tokens: 35504783360 | elapsed time per iteration (s): 0.44 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.962483E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.847 | TFLOPs: 30.27 | +7: iteration 67730/ 173500 | consumed samples: 17338880 | consumed tokens: 35510026240 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.972495E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.810 | TFLOPs: 31.42 | +7: iteration 67740/ 173500 | consumed samples: 17341440 | consumed tokens: 35515269120 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.974948E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.013 | TFLOPs: 31.53 | +7: iteration 67750/ 173500 | consumed samples: 17344000 | consumed tokens: 35520512000 | elapsed time per iteration (s): 0.42 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.979013E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.532 | TFLOPs: 32.03 | +7: iteration 67760/ 173500 | consumed samples: 17346560 | consumed tokens: 35525754880 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.968615E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.869 | TFLOPs: 31.37 | +7: iteration 67770/ 173500 | consumed samples: 17349120 | consumed tokens: 35530997760 | elapsed time per iteration (s): 0.42 | learning rate: 
1.420E-04 | global batch size: 256 | lm loss: 2.977868E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.493 | TFLOPs: 31.77 | +7: iteration 67780/ 173500 | consumed samples: 17351680 | consumed tokens: 35536240640 | elapsed time per iteration (s): 0.43 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.969178E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.745 | TFLOPs: 31.52 | +7: iteration 67790/ 173500 | consumed samples: 17354240 | consumed tokens: 35541483520 | elapsed time per iteration (s): 0.42 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.968781E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.909 | TFLOPs: 31.79 | +7: iteration 67800/ 173500 | consumed samples: 17356800 | consumed tokens: 35546726400 | elapsed time per iteration (s): 0.42 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.974378E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.796 | TFLOPs: 32.00 | +7: iteration 67810/ 173500 | consumed samples: 17359360 | consumed tokens: 35551969280 | elapsed time per iteration (s): 0.43 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.986070E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.989 | TFLOPs: 31.11 | +7: iteration 67820/ 173500 | consumed samples: 17361920 | consumed tokens: 35557212160 | elapsed time per iteration (s): 0.42 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.971455E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.762 | TFLOPs: 31.78 | +7: iteration 67830/ 173500 | 
consumed samples: 17364480 | consumed tokens: 35562455040 | elapsed time per iteration (s): 0.42 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.974524E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.394 | TFLOPs: 31.76 | +7: iteration 67840/ 173500 | consumed samples: 17367040 | consumed tokens: 35567697920 | elapsed time per iteration (s): 0.43 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.978505E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.190 | TFLOPs: 31.54 | +7: iteration 67850/ 173500 | consumed samples: 17369600 | consumed tokens: 35572940800 | elapsed time per iteration (s): 0.43 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.983561E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.312 | TFLOPs: 31.60 | +7: iteration 67860/ 173500 | consumed samples: 17372160 | consumed tokens: 35578183680 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.976739E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.538 | TFLOPs: 31.98 | +7: iteration 67870/ 173500 | consumed samples: 17374720 | consumed tokens: 35583426560 | elapsed time per iteration (s): 0.43 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.974820E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.029 | TFLOPs: 31.27 | +7: iteration 67880/ 173500 | consumed samples: 17377280 | consumed tokens: 35588669440 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.979170E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.738 | TFLOPs: 31.62 | +7: iteration 67890/ 173500 | consumed samples: 17379840 | consumed tokens: 35593912320 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.978134E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.857 | TFLOPs: 31.84 | +7: iteration 67900/ 173500 | consumed samples: 17382400 | consumed tokens: 35599155200 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.974493E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.574 | TFLOPs: 31.77 | +7: iteration 67910/ 173500 | consumed samples: 17384960 | consumed tokens: 35604398080 | elapsed time per iteration (s): 0.42 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.976312E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.368 | TFLOPs: 31.71 | +7: iteration 67920/ 173500 | consumed samples: 17387520 | consumed tokens: 35609640960 | elapsed time per iteration (s): 0.43 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.983654E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.519 | TFLOPs: 31.35 | +7: iteration 67930/ 173500 | consumed samples: 17390080 | consumed tokens: 35614883840 | elapsed time per iteration (s): 0.43 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.968998E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.765 | TFLOPs: 31.26 | +7: iteration 67940/ 173500 | consumed samples: 17392640 | consumed tokens: 35620126720 | elapsed time per iteration (s): 0.42 | learning rate: 1.417E-04 | global batch 
size: 256 | lm loss: 2.982609E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.910 | TFLOPs: 31.79 | +7: iteration 67950/ 173500 | consumed samples: 17395200 | consumed tokens: 35625369600 | elapsed time per iteration (s): 0.42 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.977690E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.071 | TFLOPs: 31.75 | +7: iteration 67960/ 173500 | consumed samples: 17397760 | consumed tokens: 35630612480 | elapsed time per iteration (s): 0.42 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.972459E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.822 | TFLOPs: 31.73 | +7: iteration 67970/ 173500 | consumed samples: 17400320 | consumed tokens: 35635855360 | elapsed time per iteration (s): 0.43 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.977579E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.729 | TFLOPs: 31.31 | +7: iteration 67980/ 173500 | consumed samples: 17402880 | consumed tokens: 35641098240 | elapsed time per iteration (s): 0.42 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.982502E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.528 | TFLOPs: 31.77 | +7: iteration 67990/ 173500 | consumed samples: 17405440 | consumed tokens: 35646341120 | elapsed time per iteration (s): 0.43 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.973470E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.838 | TFLOPs: 31.47 | +0: [2023-03-17 07:15:27,073] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=68000, skipped=0, lr=[0.00014160436454810027, 0.00014160436454810027, 0.00014160436454810027], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 68000/ 173500 | consumed samples: 17408000 | consumed tokens: 35651584000 | elapsed time per iteration (s): 0.42 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.972041E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.367 | TFLOPs: 31.76 | +0: steps: 68000 loss: 2.9720 iter time (s): 0.427 samples/sec: 598.917 +7: iteration 68010/ 173500 | consumed samples: 17410560 | consumed tokens: 35656826880 | elapsed time per iteration (s): 0.43 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.983253E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.121 | TFLOPs: 31.43 | +7: iteration 68020/ 173500 | consumed samples: 17413120 | consumed tokens: 35662069760 | elapsed time per iteration (s): 0.42 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.977105E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.696 | TFLOPs: 31.99 | +7: iteration 68030/ 173500 | consumed samples: 17415680 | consumed tokens: 35667312640 | elapsed time per iteration (s): 0.42 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.985589E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.390 | TFLOPs: 31.76 | +7: iteration 68040/ 173500 | consumed samples: 17418240 | consumed tokens: 35672555520 | elapsed time per iteration (s): 0.43 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.983393E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.409 | TFLOPs: 31.19 | +7: iteration 
68050/ 173500 | consumed samples: 17420800 | consumed tokens: 35677798400 | elapsed time per iteration (s): 0.42 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.982026E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.168 | TFLOPs: 32.01 | +7: iteration 68060/ 173500 | consumed samples: 17423360 | consumed tokens: 35683041280 | elapsed time per iteration (s): 0.42 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.975325E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.426 | TFLOPs: 31.61 | +7: iteration 68070/ 173500 | consumed samples: 17425920 | consumed tokens: 35688284160 | elapsed time per iteration (s): 0.43 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.968889E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.354 | TFLOPs: 31.50 | +7: iteration 68080/ 173500 | consumed samples: 17428480 | consumed tokens: 35693527040 | elapsed time per iteration (s): 0.42 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.972753E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.462 | TFLOPs: 31.61 | +7: iteration 68090/ 173500 | consumed samples: 17431040 | consumed tokens: 35698769920 | elapsed time per iteration (s): 0.43 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.979103E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.741 | TFLOPs: 31.47 | +7: iteration 68100/ 173500 | consumed samples: 17433600 | consumed tokens: 35704012800 | elapsed time per iteration (s): 0.43 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.988255E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.744 | TFLOPs: 31.47 | +7: iteration 68110/ 173500 | consumed samples: 17436160 | consumed tokens: 35709255680 | elapsed time per iteration (s): 0.43 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.972708E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.338 | TFLOPs: 31.45 | +7: iteration 68120/ 173500 | consumed samples: 17438720 | consumed tokens: 35714498560 | elapsed time per iteration (s): 0.42 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.973726E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.240 | TFLOPs: 31.97 | +7: iteration 68130/ 173500 | consumed samples: 17441280 | consumed tokens: 35719741440 | elapsed time per iteration (s): 0.42 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.967932E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.962 | TFLOPs: 31.95 | +7: iteration 68140/ 173500 | consumed samples: 17443840 | consumed tokens: 35724984320 | elapsed time per iteration (s): 0.43 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.975225E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.950 | TFLOPs: 31.32 | +7: iteration 68150/ 173500 | consumed samples: 17446400 | consumed tokens: 35730227200 | elapsed time per iteration (s): 0.42 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.978975E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.719 | TFLOPs: 31.73 | +7: iteration 68160/ 173500 | consumed samples: 17448960 | consumed tokens: 35735470080 | elapsed time per iteration (s): 0.42 | learning rate: 
1.414E-04 | global batch size: 256 | lm loss: 2.978394E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.646 | TFLOPs: 31.93 | +7: iteration 68170/ 173500 | consumed samples: 17451520 | consumed tokens: 35740712960 | elapsed time per iteration (s): 0.42 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.979353E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.744 | TFLOPs: 31.73 | +7: iteration 68180/ 173500 | consumed samples: 17454080 | consumed tokens: 35745955840 | elapsed time per iteration (s): 0.43 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.986293E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.369 | TFLOPs: 31.55 | +7: iteration 68190/ 173500 | consumed samples: 17456640 | consumed tokens: 35751198720 | elapsed time per iteration (s): 0.43 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.974954E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.747 | TFLOPs: 31.26 | +7: iteration 68200/ 173500 | consumed samples: 17459200 | consumed tokens: 35756441600 | elapsed time per iteration (s): 0.42 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.966778E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.791 | TFLOPs: 31.84 | +7: iteration 68210/ 173500 | consumed samples: 17461760 | consumed tokens: 35761684480 | elapsed time per iteration (s): 0.43 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.966265E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.324 | TFLOPs: 31.55 | +7: iteration 68220/ 173500 | 
consumed samples: 17464320 | consumed tokens: 35766927360 | elapsed time per iteration (s): 0.42 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.974848E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.298 | TFLOPs: 31.97 | +7: iteration 68230/ 173500 | consumed samples: 17466880 | consumed tokens: 35772170240 | elapsed time per iteration (s): 0.42 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.978118E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.041 | TFLOPs: 31.96 | +7: iteration 68240/ 173500 | consumed samples: 17469440 | consumed tokens: 35777413120 | elapsed time per iteration (s): 0.43 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.983268E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.970 | TFLOPs: 31.53 | +7: iteration 68250/ 173500 | consumed samples: 17472000 | consumed tokens: 35782656000 | elapsed time per iteration (s): 0.43 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.990099E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.198 | TFLOPs: 31.39 | +7: iteration 68260/ 173500 | consumed samples: 17474560 | consumed tokens: 35787898880 | elapsed time per iteration (s): 0.42 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.977261E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.275 | TFLOPs: 31.71 | +7: iteration 68270/ 173500 | consumed samples: 17477120 | consumed tokens: 35793141760 | elapsed time per iteration (s): 0.42 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.981997E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.651 | TFLOPs: 31.62 | +7: iteration 68280/ 173500 | consumed samples: 17479680 | consumed tokens: 35798384640 | elapsed time per iteration (s): 0.42 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.986379E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.732 | TFLOPs: 31.62 | +7: iteration 68290/ 173500 | consumed samples: 17482240 | consumed tokens: 35803627520 | elapsed time per iteration (s): 0.42 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.990784E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.346 | TFLOPs: 31.92 | +7: iteration 68300/ 173500 | consumed samples: 17484800 | consumed tokens: 35808870400 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.972535E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.305 | TFLOPs: 31.13 | +7: iteration 68310/ 173500 | consumed samples: 17487360 | consumed tokens: 35814113280 | elapsed time per iteration (s): 0.42 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.976324E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.225 | TFLOPs: 31.97 | +7: iteration 68320/ 173500 | consumed samples: 17489920 | consumed tokens: 35819356160 | elapsed time per iteration (s): 0.42 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.976886E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.014 | TFLOPs: 31.95 | +7: iteration 68330/ 173500 | consumed samples: 17492480 | consumed tokens: 35824599040 | elapsed time per iteration (s): 0.42 | learning rate: 1.411E-04 | global batch 
size: 256 | lm loss: 2.974588E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.163 | TFLOPs: 31.96 | +7: iteration 68340/ 173500 | consumed samples: 17495040 | consumed tokens: 35829841920 | elapsed time per iteration (s): 0.42 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.987733E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.780 | TFLOPs: 31.73 | +7: iteration 68350/ 173500 | consumed samples: 17497600 | consumed tokens: 35835084800 | elapsed time per iteration (s): 0.42 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.979240E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.908 | TFLOPs: 31.74 | +7: iteration 68360/ 173500 | consumed samples: 17500160 | consumed tokens: 35840327680 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.978048E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.687 | TFLOPs: 31.52 | +7: iteration 68370/ 173500 | consumed samples: 17502720 | consumed tokens: 35845570560 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.976076E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.355 | TFLOPs: 31.55 | +7: iteration 68380/ 173500 | consumed samples: 17505280 | consumed tokens: 35850813440 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.969150E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.571 | TFLOPs: 31.56 | +7: iteration 68390/ 173500 | consumed samples: 17507840 | 
consumed tokens: 35856056320 | elapsed time per iteration (s): 0.42 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.953779E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.782 | TFLOPs: 31.73 | +7: iteration 68400/ 173500 | consumed samples: 17510400 | consumed tokens: 35861299200 | elapsed time per iteration (s): 0.42 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.970082E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.021 | TFLOPs: 31.95 | +7: iteration 68410/ 173500 | consumed samples: 17512960 | consumed tokens: 35866542080 | elapsed time per iteration (s): 0.42 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.977168E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.956 | TFLOPs: 31.95 | +7: iteration 68420/ 173500 | consumed samples: 17515520 | consumed tokens: 35871784960 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.976916E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.038 | TFLOPs: 31.43 | +7: iteration 68430/ 173500 | consumed samples: 17518080 | consumed tokens: 35877027840 | elapsed time per iteration (s): 0.42 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.987175E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.475 | TFLOPs: 31.93 | +7: iteration 68440/ 173500 | consumed samples: 17520640 | consumed tokens: 35882270720 | elapsed time per iteration (s): 0.42 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.994137E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.458 | TFLOPs: 31.71 | +7: iteration 68450/ 173500 | consumed samples: 17523200 | consumed tokens: 35887513600 | elapsed time per iteration (s): 0.42 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.958214E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.373 | TFLOPs: 31.97 | +7: iteration 68460/ 173500 | consumed samples: 17525760 | consumed tokens: 35892756480 | elapsed time per iteration (s): 0.43 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.988824E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.128 | TFLOPs: 31.59 | +7: iteration 68470/ 173500 | consumed samples: 17528320 | consumed tokens: 35897999360 | elapsed time per iteration (s): 0.42 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.987023E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.889 | TFLOPs: 31.74 | +7: iteration 68480/ 173500 | consumed samples: 17530880 | consumed tokens: 35903242240 | elapsed time per iteration (s): 0.42 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.978271E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.190 | TFLOPs: 31.75 | +7: iteration 68490/ 173500 | consumed samples: 17533440 | consumed tokens: 35908485120 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.975684E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.159 | TFLOPs: 31.70 | +7: iteration 68500/ 173500 | consumed samples: 17536000 | consumed tokens: 35913728000 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 
2.975977E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.575 | TFLOPs: 31.72 | +7: iteration 68510/ 173500 | consumed samples: 17538560 | consumed tokens: 35918970880 | elapsed time per iteration (s): 0.43 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.982047E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.198 | TFLOPs: 31.18 | +7: iteration 68520/ 173500 | consumed samples: 17541120 | consumed tokens: 35924213760 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.971923E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.305 | TFLOPs: 31.81 | +7: iteration 68530/ 173500 | consumed samples: 17543680 | consumed tokens: 35929456640 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.992083E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.811 | TFLOPs: 31.94 | +7: iteration 68540/ 173500 | consumed samples: 17546240 | consumed tokens: 35934699520 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.983688E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.735 | TFLOPs: 31.94 | +7: iteration 68550/ 173500 | consumed samples: 17548800 | consumed tokens: 35939942400 | elapsed time per iteration (s): 0.42 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.985823E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.398 | TFLOPs: 31.82 | +7: iteration 68560/ 173500 | consumed samples: 17551360 | consumed tokens: 
35945185280 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.966490E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.715 | TFLOPs: 31.94 | +7: iteration 68570/ 173500 | consumed samples: 17553920 | consumed tokens: 35950428160 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.966635E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.782 | TFLOPs: 31.94 | +7: iteration 68580/ 173500 | consumed samples: 17556480 | consumed tokens: 35955671040 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.990212E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.553 | TFLOPs: 31.93 | +7: iteration 68590/ 173500 | consumed samples: 17559040 | consumed tokens: 35960913920 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.966334E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.853 | TFLOPs: 31.68 | +7: iteration 68600/ 173500 | consumed samples: 17561600 | consumed tokens: 35966156800 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.975083E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.105 | TFLOPs: 31.91 | +7: iteration 68610/ 173500 | consumed samples: 17564160 | consumed tokens: 35971399680 | elapsed time per iteration (s): 0.42 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.971488E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.217 | TFLOPs: 31.81 | +7: iteration 68620/ 173500 | consumed samples: 17566720 | consumed tokens: 35976642560 | elapsed time per iteration (s): 0.44 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.969514E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.645 | TFLOPs: 30.73 | +7: iteration 68630/ 173500 | consumed samples: 17569280 | consumed tokens: 35981885440 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.968773E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.821 | TFLOPs: 32.00 | +7: iteration 68640/ 173500 | consumed samples: 17571840 | consumed tokens: 35987128320 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.954152E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.075 | TFLOPs: 31.96 | +7: iteration 68650/ 173500 | consumed samples: 17574400 | consumed tokens: 35992371200 | elapsed time per iteration (s): 0.43 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.983426E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.408 | TFLOPs: 31.40 | +7: iteration 68660/ 173500 | consumed samples: 17576960 | consumed tokens: 35997614080 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.983769E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.855 | TFLOPs: 31.95 | +7: iteration 68670/ 173500 | consumed samples: 17579520 | consumed tokens: 36002856960 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.989608E+00 | grad 
norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.245 | TFLOPs: 31.81 | +7: iteration 68680/ 173500 | consumed samples: 17582080 | consumed tokens: 36008099840 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.977548E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.999 | TFLOPs: 31.95 | +7: iteration 68690/ 173500 | consumed samples: 17584640 | consumed tokens: 36013342720 | elapsed time per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.974700E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.611 | TFLOPs: 31.72 | +7: iteration 68700/ 173500 | consumed samples: 17587200 | consumed tokens: 36018585600 | elapsed time per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.973828E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.861 | TFLOPs: 31.74 | +7: iteration 68710/ 173500 | consumed samples: 17589760 | consumed tokens: 36023828480 | elapsed time per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.983033E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.479 | TFLOPs: 31.93 | +7: iteration 68720/ 173500 | consumed samples: 17592320 | consumed tokens: 36029071360 | elapsed time per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.989090E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.195 | TFLOPs: 31.91 | +7: iteration 68730/ 173500 | consumed samples: 17594880 | consumed tokens: 36034314240 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.969589E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.389 | TFLOPs: 31.92 | +7: iteration 68740/ 173500 | consumed samples: 17597440 | consumed tokens: 36039557120 | elapsed time per iteration (s): 0.42 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.981096E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.866 | TFLOPs: 31.95 | +7: iteration 68750/ 173500 | consumed samples: 17600000 | consumed tokens: 36044800000 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.997956E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.700 | TFLOPs: 31.94 | +7: iteration 68760/ 173500 | consumed samples: 17602560 | consumed tokens: 36050042880 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.998760E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.736 | TFLOPs: 31.68 | +7: iteration 68770/ 173500 | consumed samples: 17605120 | consumed tokens: 36055285760 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.980788E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.901 | TFLOPs: 31.95 | +7: iteration 68780/ 173500 | consumed samples: 17607680 | consumed tokens: 36060528640 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.967516E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.467 | TFLOPs: 
31.93 | +7: iteration 68790/ 173500 | consumed samples: 17610240 | consumed tokens: 36065771520 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.988667E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.615 | TFLOPs: 31.93 | +7: iteration 68800/ 173500 | consumed samples: 17612800 | consumed tokens: 36071014400 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.984868E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.286 | TFLOPs: 31.92 | +7: iteration 68810/ 173500 | consumed samples: 17615360 | consumed tokens: 36076257280 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 3.002479E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.325 | TFLOPs: 31.92 | +7: iteration 68820/ 173500 | consumed samples: 17617920 | consumed tokens: 36081500160 | elapsed time per iteration (s): 0.42 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.980841E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.085 | TFLOPs: 31.70 | +7: iteration 68830/ 173500 | consumed samples: 17620480 | consumed tokens: 36086743040 | elapsed time per iteration (s): 0.43 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.980272E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.143 | TFLOPs: 31.49 | +7: iteration 68840/ 173500 | consumed samples: 17623040 | consumed tokens: 36091985920 | elapsed time per iteration (s): 0.43 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.981048E+00 | grad norm: 0.311 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.793 | TFLOPs: 31.42 | +7: iteration 68850/ 173500 | consumed samples: 17625600 | consumed tokens: 36097228800 | elapsed time per iteration (s): 0.42 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.971437E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.562 | TFLOPs: 31.93 | +7: iteration 68860/ 173500 | consumed samples: 17628160 | consumed tokens: 36102471680 | elapsed time per iteration (s): 0.42 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.980165E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.748 | TFLOPs: 31.94 | +7: iteration 68870/ 173500 | consumed samples: 17630720 | consumed tokens: 36107714560 | elapsed time per iteration (s): 0.42 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.965729E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.418 | TFLOPs: 31.92 | +7: iteration 68880/ 173500 | consumed samples: 17633280 | consumed tokens: 36112957440 | elapsed time per iteration (s): 0.42 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.970993E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.819 | TFLOPs: 31.73 | +7: iteration 68890/ 173500 | consumed samples: 17635840 | consumed tokens: 36118200320 | elapsed time per iteration (s): 0.44 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.964444E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.488 | TFLOPs: 30.51 | +7: iteration 68900/ 173500 | consumed samples: 17638400 | consumed tokens: 36123443200 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.969308E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.036 | TFLOPs: 31.69 | +7: iteration 68910/ 173500 | consumed samples: 17640960 | consumed tokens: 36128686080 | elapsed time per iteration (s): 0.42 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.983788E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.233 | TFLOPs: 31.97 | +7: iteration 68920/ 173500 | consumed samples: 17643520 | consumed tokens: 36133928960 | elapsed time per iteration (s): 0.42 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.971641E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.169 | TFLOPs: 31.91 | +7: iteration 68930/ 173500 | consumed samples: 17646080 | consumed tokens: 36139171840 | elapsed time per iteration (s): 0.42 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.980409E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.545 | TFLOPs: 31.82 | +7: iteration 68940/ 173500 | consumed samples: 17648640 | consumed tokens: 36144414720 | elapsed time per iteration (s): 0.42 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.969640E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.281 | TFLOPs: 31.92 | +7: iteration 68950/ 173500 | consumed samples: 17651200 | consumed tokens: 36149657600 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.980295E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.395 | TFLOPs: 31.92 | +7: iteration 68960/ 
173500 | consumed samples: 17653760 | consumed tokens: 36154900480 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.974318E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.093 | TFLOPs: 31.91 | +7: iteration 68970/ 173500 | consumed samples: 17656320 | consumed tokens: 36160143360 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.967088E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.102 | TFLOPs: 31.85 | +7: iteration 68980/ 173500 | consumed samples: 17658880 | consumed tokens: 36165386240 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.975734E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.486 | TFLOPs: 31.93 | +7: iteration 68990/ 173500 | consumed samples: 17661440 | consumed tokens: 36170629120 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.965727E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.971 | TFLOPs: 31.69 | +7: iteration 69000/ 173500 | consumed samples: 17664000 | consumed tokens: 36175872000 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.975670E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.585 | TFLOPs: 31.93 | +7: iteration 69010/ 173500 | consumed samples: 17666560 | consumed tokens: 36181114880 | elapsed time per iteration (s): 0.42 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.985828E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.106 | TFLOPs: 31.91 | +7: iteration 69020/ 173500 | consumed samples: 17669120 | consumed tokens: 36186357760 | elapsed time per iteration (s): 0.42 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.970744E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.595 | TFLOPs: 31.93 | +7: iteration 69030/ 173500 | consumed samples: 17671680 | consumed tokens: 36191600640 | elapsed time per iteration (s): 0.42 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.972423E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.417 | TFLOPs: 31.92 | +7: iteration 69040/ 173500 | consumed samples: 17674240 | consumed tokens: 36196843520 | elapsed time per iteration (s): 0.42 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.974519E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.135 | TFLOPs: 31.70 | +7: iteration 69050/ 173500 | consumed samples: 17676800 | consumed tokens: 36202086400 | elapsed time per iteration (s): 0.42 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.977829E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.545 | TFLOPs: 31.93 | +7: iteration 69060/ 173500 | consumed samples: 17679360 | consumed tokens: 36207329280 | elapsed time per iteration (s): 0.43 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.979642E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.002 | TFLOPs: 31.06 | +7: iteration 69070/ 173500 | consumed samples: 17681920 | consumed tokens: 36212572160 | elapsed time per iteration (s): 0.44 | learning rate: 
1.399E-04 | global batch size: 256 | lm loss: 2.983171E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.807 | TFLOPs: 30.74 | +7: iteration 69080/ 173500 | consumed samples: 17684480 | consumed tokens: 36217815040 | elapsed time per iteration (s): 0.42 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.975583E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.848 | TFLOPs: 31.63 | +7: iteration 69090/ 173500 | consumed samples: 17687040 | consumed tokens: 36223057920 | elapsed time per iteration (s): 0.42 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.981682E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.687 | TFLOPs: 31.94 | +7: iteration 69100/ 173500 | consumed samples: 17689600 | consumed tokens: 36228300800 | elapsed time per iteration (s): 0.42 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.982015E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.465 | TFLOPs: 31.93 | +7: iteration 69110/ 173500 | consumed samples: 17692160 | consumed tokens: 36233543680 | elapsed time per iteration (s): 0.42 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.981792E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.341 | TFLOPs: 31.92 | +7: iteration 69120/ 173500 | consumed samples: 17694720 | consumed tokens: 36238786560 | elapsed time per iteration (s): 0.42 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.967660E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.838 | TFLOPs: 31.89 | +7: iteration 69130/ 173500 | 
consumed samples: 17697280 | consumed tokens: 36244029440 | elapsed time per iteration (s): 0.43 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.985527E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.814 | TFLOPs: 31.16 | +7: iteration 69140/ 173500 | consumed samples: 17699840 | consumed tokens: 36249272320 | elapsed time per iteration (s): 0.45 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.972469E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.280 | TFLOPs: 29.61 | +7: iteration 69150/ 173500 | consumed samples: 17702400 | consumed tokens: 36254515200 | elapsed time per iteration (s): 0.44 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.984026E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.232 | TFLOPs: 30.34 | +7: iteration 69160/ 173500 | consumed samples: 17704960 | consumed tokens: 36259758080 | elapsed time per iteration (s): 0.42 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.968282E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.880 | TFLOPs: 32.00 | +7: iteration 69170/ 173500 | consumed samples: 17707520 | consumed tokens: 36265000960 | elapsed time per iteration (s): 0.42 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.969720E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.570 | TFLOPs: 31.93 | +7: iteration 69180/ 173500 | consumed samples: 17710080 | consumed tokens: 36270243840 | elapsed time per iteration (s): 0.42 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.964311E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.739 | TFLOPs: 31.73 | +7: iteration 69190/ 173500 | consumed samples: 17712640 | consumed tokens: 36275486720 | elapsed time per iteration (s): 0.42 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.974903E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.896 | TFLOPs: 31.95 | +7: iteration 69200/ 173500 | consumed samples: 17715200 | consumed tokens: 36280729600 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.956338E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.343 | TFLOPs: 31.60 | +7: iteration 69210/ 173500 | consumed samples: 17717760 | consumed tokens: 36285972480 | elapsed time per iteration (s): 0.44 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.978465E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.157 | TFLOPs: 30.49 | +7: iteration 69220/ 173500 | consumed samples: 17720320 | consumed tokens: 36291215360 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.972535E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.923 | TFLOPs: 31.16 | +7: iteration 69230/ 173500 | consumed samples: 17722880 | consumed tokens: 36296458240 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.970805E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.128 | TFLOPs: 31.38 | +7: iteration 69240/ 173500 | consumed samples: 17725440 | consumed tokens: 36301701120 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch 
size: 256 | lm loss: 2.975517E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.101 | TFLOPs: 31.38 | +7: iteration 69250/ 173500 | consumed samples: 17728000 | consumed tokens: 36306944000 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.960627E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.575 | TFLOPs: 31.09 | +7: iteration 69260/ 173500 | consumed samples: 17730560 | consumed tokens: 36312186880 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.968643E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.713 | TFLOPs: 31.52 | +7: iteration 69270/ 173500 | consumed samples: 17733120 | consumed tokens: 36317429760 | elapsed time per iteration (s): 0.44 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.985426E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.941 | TFLOPs: 30.59 | +7: iteration 69280/ 173500 | consumed samples: 17735680 | consumed tokens: 36322672640 | elapsed time per iteration (s): 0.43 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.982179E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.209 | TFLOPs: 31.54 | +7: iteration 69290/ 173500 | consumed samples: 17738240 | consumed tokens: 36327915520 | elapsed time per iteration (s): 0.42 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.975786E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 | +7: iteration 69300/ 173500 | consumed samples: 17740800 | 
consumed tokens: 36333158400 | elapsed time per iteration (s): 0.43 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.978109E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.459 | TFLOPs: 31.56 | +7: iteration 69310/ 173500 | consumed samples: 17743360 | consumed tokens: 36338401280 | elapsed time per iteration (s): 0.44 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.960899E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.046 | TFLOPs: 30.75 | +7: iteration 69320/ 173500 | consumed samples: 17745920 | consumed tokens: 36343644160 | elapsed time per iteration (s): 0.43 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.957415E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.792 | TFLOPs: 31.52 | +7: iteration 69330/ 173500 | consumed samples: 17748480 | consumed tokens: 36348887040 | elapsed time per iteration (s): 0.43 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.952438E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.818 | TFLOPs: 31.52 | +7: iteration 69340/ 173500 | consumed samples: 17751040 | consumed tokens: 36354129920 | elapsed time per iteration (s): 0.43 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.958133E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.339 | TFLOPs: 30.92 | +7: iteration 69350/ 173500 | consumed samples: 17753600 | consumed tokens: 36359372800 | elapsed time per iteration (s): 0.43 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.972094E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 599.911 | TFLOPs: 31.48 | +7: iteration 69360/ 173500 | consumed samples: 17756160 | consumed tokens: 36364615680 | elapsed time per iteration (s): 0.43 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.969928E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.871 | TFLOPs: 31.42 | +7: iteration 69370/ 173500 | consumed samples: 17758720 | consumed tokens: 36369858560 | elapsed time per iteration (s): 0.42 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.979182E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.617 | TFLOPs: 31.78 | +7: iteration 69380/ 173500 | consumed samples: 17761280 | consumed tokens: 36375101440 | elapsed time per iteration (s): 0.42 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.979036E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.996 | TFLOPs: 31.69 | +7: iteration 69390/ 173500 | consumed samples: 17763840 | consumed tokens: 36380344320 | elapsed time per iteration (s): 0.42 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.986940E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.753 | TFLOPs: 31.78 | +7: iteration 69400/ 173500 | consumed samples: 17766400 | consumed tokens: 36385587200 | elapsed time per iteration (s): 0.42 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.980058E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.934 | TFLOPs: 31.79 | +7: iteration 69410/ 173500 | consumed samples: 17768960 | consumed tokens: 36390830080 | elapsed time per iteration (s): 0.42 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 
2.970570E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.633 | TFLOPs: 31.78 | +7: iteration 69420/ 173500 | consumed samples: 17771520 | consumed tokens: 36396072960 | elapsed time per iteration (s): 0.43 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.965656E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.377 | TFLOPs: 31.55 | +7: iteration 69430/ 173500 | consumed samples: 17774080 | consumed tokens: 36401315840 | elapsed time per iteration (s): 0.42 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.972494E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.859 | TFLOPs: 31.95 | +7: iteration 69440/ 173500 | consumed samples: 17776640 | consumed tokens: 36406558720 | elapsed time per iteration (s): 0.42 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.972919E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.795 | TFLOPs: 31.73 | +7: iteration 69450/ 173500 | consumed samples: 17779200 | consumed tokens: 36411801600 | elapsed time per iteration (s): 0.43 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.984180E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.060 | TFLOPs: 31.48 | +7: iteration 69460/ 173500 | consumed samples: 17781760 | consumed tokens: 36417044480 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.993252E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.022 | TFLOPs: 31.53 | +7: iteration 69470/ 173500 | consumed samples: 17784320 | consumed tokens: 
36422287360 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.973174E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.593 | TFLOPs: 31.41 | +7: iteration 69480/ 173500 | consumed samples: 17786880 | consumed tokens: 36427530240 | elapsed time per iteration (s): 0.42 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.984869E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.433 | TFLOPs: 31.66 | +7: iteration 69490/ 173500 | consumed samples: 17789440 | consumed tokens: 36432773120 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.977917E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.533 | TFLOPs: 31.09 | +7: iteration 69500/ 173500 | consumed samples: 17792000 | consumed tokens: 36438016000 | elapsed time per iteration (s): 0.44 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.981636E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.712 | TFLOPs: 30.42 | +7: iteration 69510/ 173500 | consumed samples: 17794560 | consumed tokens: 36443258880 | elapsed time per iteration (s): 0.42 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.965151E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.488 | TFLOPs: 31.87 | +7: iteration 69520/ 173500 | consumed samples: 17797120 | consumed tokens: 36448501760 | elapsed time per iteration (s): 0.44 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.976604E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 586.383 | TFLOPs: 30.77 | +7: iteration 69530/ 173500 | consumed samples: 17799680 | consumed tokens: 36453744640 | elapsed time per iteration (s): 0.42 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.980411E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.626 | TFLOPs: 31.78 | +7: iteration 69540/ 173500 | consumed samples: 17802240 | consumed tokens: 36458987520 | elapsed time per iteration (s): 0.43 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.955778E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.176 | TFLOPs: 31.54 | +7: iteration 69550/ 173500 | consumed samples: 17804800 | consumed tokens: 36464230400 | elapsed time per iteration (s): 0.42 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.979615E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.463 | TFLOPs: 31.72 | +7: iteration 69560/ 173500 | consumed samples: 17807360 | consumed tokens: 36469473280 | elapsed time per iteration (s): 0.43 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.962283E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.744 | TFLOPs: 31.47 | +7: iteration 69570/ 173500 | consumed samples: 17809920 | consumed tokens: 36474716160 | elapsed time per iteration (s): 0.42 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.975033E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.084 | TFLOPs: 31.96 | +7: iteration 69580/ 173500 | consumed samples: 17812480 | consumed tokens: 36479959040 | elapsed time per iteration (s): 0.43 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.991004E+00 | grad 
norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.426 | TFLOPs: 31.56 | +7: iteration 69590/ 173500 | consumed samples: 17815040 | consumed tokens: 36485201920 | elapsed time per iteration (s): 0.43 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.961046E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.027 | TFLOPs: 31.17 | +7: iteration 69600/ 173500 | consumed samples: 17817600 | consumed tokens: 36490444800 | elapsed time per iteration (s): 0.43 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.976811E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.062 | TFLOPs: 31.17 | +7: iteration 69610/ 173500 | consumed samples: 17820160 | consumed tokens: 36495687680 | elapsed time per iteration (s): 0.46 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.989823E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.373 | TFLOPs: 28.98 | +7: iteration 69620/ 173500 | consumed samples: 17822720 | consumed tokens: 36500930560 | elapsed time per iteration (s): 0.44 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.984288E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.204 | TFLOPs: 30.81 | +7: iteration 69630/ 173500 | consumed samples: 17825280 | consumed tokens: 36506173440 | elapsed time per iteration (s): 0.44 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.974111E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.924 | TFLOPs: 30.38 | +7: iteration 69640/ 173500 | consumed samples: 17827840 | consumed tokens: 36511416320 | elapsed time 
per iteration (s): 0.47 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.959683E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.615 | TFLOPs: 28.52 | +7: iteration 69650/ 173500 | consumed samples: 17830400 | consumed tokens: 36516659200 | elapsed time per iteration (s): 0.43 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.969702E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.175 | TFLOPs: 31.23 | +7: iteration 69660/ 173500 | consumed samples: 17832960 | consumed tokens: 36521902080 | elapsed time per iteration (s): 0.43 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.972345E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.778 | TFLOPs: 31.00 | +7: iteration 69670/ 173500 | consumed samples: 17835520 | consumed tokens: 36527144960 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.963079E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.047 | TFLOPs: 30.49 | +7: iteration 69680/ 173500 | consumed samples: 17838080 | consumed tokens: 36532387840 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.979540E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.471 | TFLOPs: 30.88 | +7: iteration 69690/ 173500 | consumed samples: 17840640 | consumed tokens: 36537630720 | elapsed time per iteration (s): 0.43 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.976332E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.191 | TFLOPs: 
31.28 | +7: iteration 69700/ 173500 | consumed samples: 17843200 | consumed tokens: 36542873600 | elapsed time per iteration (s): 0.42 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.964919E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.265 | TFLOPs: 31.70 | +7: iteration 69710/ 173500 | consumed samples: 17845760 | consumed tokens: 36548116480 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.957075E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.860 | TFLOPs: 30.58 | +7: iteration 69720/ 173500 | consumed samples: 17848320 | consumed tokens: 36553359360 | elapsed time per iteration (s): 0.45 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.977445E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.879 | TFLOPs: 30.06 | +7: iteration 69730/ 173500 | consumed samples: 17850880 | consumed tokens: 36558602240 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.969065E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.001 | TFLOPs: 31.43 | +7: iteration 69740/ 173500 | consumed samples: 17853440 | consumed tokens: 36563845120 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.980086E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.295 | TFLOPs: 31.34 | +7: iteration 69750/ 173500 | consumed samples: 17856000 | consumed tokens: 36569088000 | elapsed time per iteration (s): 0.44 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.959098E+00 | grad norm: 0.322 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.670 | TFLOPs: 30.57 | +7: iteration 69760/ 173500 | consumed samples: 17858560 | consumed tokens: 36574330880 | elapsed time per iteration (s): 0.47 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.962299E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.158 | TFLOPs: 28.60 | +7: iteration 69770/ 173500 | consumed samples: 17861120 | consumed tokens: 36579573760 | elapsed time per iteration (s): 0.45 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.977880E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.171 | TFLOPs: 29.76 | +7: iteration 69780/ 173500 | consumed samples: 17863680 | consumed tokens: 36584816640 | elapsed time per iteration (s): 0.44 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.977994E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.010 | TFLOPs: 30.59 | +7: iteration 69790/ 173500 | consumed samples: 17866240 | consumed tokens: 36590059520 | elapsed time per iteration (s): 0.46 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.972493E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.899 | TFLOPs: 29.11 | +7: iteration 69800/ 173500 | consumed samples: 17868800 | consumed tokens: 36595302400 | elapsed time per iteration (s): 0.45 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.987430E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.510 | TFLOPs: 30.14 | +7: iteration 69810/ 173500 | consumed samples: 17871360 | consumed tokens: 36600545280 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.983274E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.940 | TFLOPs: 30.69 | +7: iteration 69820/ 173500 | consumed samples: 17873920 | consumed tokens: 36605788160 | elapsed time per iteration (s): 0.42 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.971202E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.393 | TFLOPs: 31.82 | +7: iteration 69830/ 173500 | consumed samples: 17876480 | consumed tokens: 36611031040 | elapsed time per iteration (s): 0.42 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.958851E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.536 | TFLOPs: 32.09 | +7: iteration 69840/ 173500 | consumed samples: 17879040 | consumed tokens: 36616273920 | elapsed time per iteration (s): 0.42 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.984699E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.213 | TFLOPs: 31.75 | +7: iteration 69850/ 173500 | consumed samples: 17881600 | consumed tokens: 36621516800 | elapsed time per iteration (s): 0.42 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.971467E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.501 | TFLOPs: 31.82 | +7: iteration 69860/ 173500 | consumed samples: 17884160 | consumed tokens: 36626759680 | elapsed time per iteration (s): 0.43 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.982432E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.566 | TFLOPs: 31.25 | +7: iteration 69870/ 
173500 | consumed samples: 17886720 | consumed tokens: 36632002560 | elapsed time per iteration (s): 0.43 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.969156E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.319 | TFLOPs: 31.60 | +7: iteration 69880/ 173500 | consumed samples: 17889280 | consumed tokens: 36637245440 | elapsed time per iteration (s): 0.43 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.971500E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.885 | TFLOPs: 31.58 | +7: iteration 69890/ 173500 | consumed samples: 17891840 | consumed tokens: 36642488320 | elapsed time per iteration (s): 0.42 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.970918E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.658 | TFLOPs: 31.78 | +7: iteration 69900/ 173500 | consumed samples: 17894400 | consumed tokens: 36647731200 | elapsed time per iteration (s): 0.43 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.975171E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.080 | TFLOPs: 31.01 | +7: iteration 69910/ 173500 | consumed samples: 17896960 | consumed tokens: 36652974080 | elapsed time per iteration (s): 0.43 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.987096E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.435 | TFLOPs: 31.14 | +7: iteration 69920/ 173500 | consumed samples: 17899520 | consumed tokens: 36658216960 | elapsed time per iteration (s): 0.42 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.972460E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 610.615 | TFLOPs: 32.04 | +7: iteration 69930/ 173500 | consumed samples: 17902080 | consumed tokens: 36663459840 | elapsed time per iteration (s): 0.42 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.982481E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.668 | TFLOPs: 31.62 | +7: iteration 69940/ 173500 | consumed samples: 17904640 | consumed tokens: 36668702720 | elapsed time per iteration (s): 0.42 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.990858E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.582 | TFLOPs: 31.98 | +7: iteration 69950/ 173500 | consumed samples: 17907200 | consumed tokens: 36673945600 | elapsed time per iteration (s): 0.43 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.971460E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.959 | TFLOPs: 30.95 | +7: iteration 69960/ 173500 | consumed samples: 17909760 | consumed tokens: 36679188480 | elapsed time per iteration (s): 0.42 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.973617E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.745 | TFLOPs: 31.99 | +7: iteration 69970/ 173500 | consumed samples: 17912320 | consumed tokens: 36684431360 | elapsed time per iteration (s): 0.42 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.968492E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.487 | TFLOPs: 31.98 | +7: iteration 69980/ 173500 | consumed samples: 17914880 | consumed tokens: 36689674240 | elapsed time per iteration (s): 0.42 | learning rate: 
1.385E-04 | global batch size: 256 | lm loss: 2.964771E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.593 | TFLOPs: 31.83 | +7: iteration 69990/ 173500 | consumed samples: 17917440 | consumed tokens: 36694917120 | elapsed time per iteration (s): 0.42 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.970388E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.171 | TFLOPs: 31.65 | +0: [2023-03-17 07:29:40,065] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=0, lr=[0.0001385013705497804, 0.0001385013705497804, 0.0001385013705497804], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 70000/ 173500 | consumed samples: 17920000 | consumed tokens: 36700160000 | elapsed time per iteration (s): 0.42 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.960332E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.784 | TFLOPs: 31.84 | +0: steps: 70000 loss: 2.9146 iter time (s): 0.424 samples/sec: 603.173 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 70000 | lm loss value: 3.313357E+00 | lm loss PPL: 2.747722E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 70000 to checkpoints_221m91b400m +0: [2023-03-17 07:29:40,231] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step70000 is begin to save! +0: [2023-03-17 07:29:40,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 07:29:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_01-model_00-model_states.pt. +0: [2023-03-17 07:29:40,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_03-model_00-model_states.pt... +0: [2023-03-17 07:29:40,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_03-model_00-model_states.pt. +0: [2023-03-17 07:29:40,403] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_04-model_00-model_states.pt... +0: [2023-03-17 07:29:40,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_04-model_00-model_states.pt. +0: [2023-03-17 07:29:40,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_05-model_00-model_states.pt... +0: [2023-03-17 07:29:40,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_05-model_00-model_states.pt. +0: [2023-03-17 07:29:40,454] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_06-model_00-model_states.pt... +0: [2023-03-17 07:29:40,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_06-model_00-model_states.pt. +0: [2023-03-17 07:29:40,480] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_07-model_00-model_states.pt... +0: [2023-03-17 07:29:40,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_07-model_00-model_states.pt. +0: [2023-03-17 07:29:40,506] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 07:29:40,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_08-model_00-model_states.pt. +0: [2023-03-17 07:29:40,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_09-model_00-model_states.pt... +0: [2023-03-17 07:29:40,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_09-model_00-model_states.pt. +0: [2023-03-17 07:29:40,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_10-model_00-model_states.pt... +0: [2023-03-17 07:29:40,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_10-model_00-model_states.pt. +0: [2023-03-17 07:29:40,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_11-model_00-model_states.pt... +0: [2023-03-17 07:29:40,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_11-model_00-model_states.pt. +0: [2023-03-17 07:29:40,606] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_12-model_00-model_states.pt... +0: [2023-03-17 07:29:40,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_12-model_00-model_states.pt. +0: [2023-03-17 07:29:40,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_13-model_00-model_states.pt... +0: [2023-03-17 07:29:40,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_13-model_00-model_states.pt. +0: [2023-03-17 07:29:40,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 07:29:40,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_14-model_00-model_states.pt. +0: [2023-03-17 07:29:40,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_15-model_00-model_states.pt... +0: [2023-03-17 07:29:40,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_15-model_00-model_states.pt. +0: [2023-03-17 07:29:40,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_16-model_00-model_states.pt... +0: [2023-03-17 07:29:40,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_16-model_00-model_states.pt. +0: [2023-03-17 07:29:40,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_17-model_00-model_states.pt... +0: [2023-03-17 07:29:40,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_17-model_00-model_states.pt. +0: [2023-03-17 07:29:40,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_18-model_00-model_states.pt... +0: [2023-03-17 07:29:40,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_18-model_00-model_states.pt. +0: [2023-03-17 07:29:40,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_19-model_00-model_states.pt... +0: [2023-03-17 07:29:40,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_19-model_00-model_states.pt. +0: [2023-03-17 07:29:40,802] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_20-model_00-model_states.pt... 
+0: [2023-03-17 07:29:40,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_20-model_00-model_states.pt. +0: [2023-03-17 07:29:40,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/layer_22-model_00-model_states.pt... +0: [2023-03-17 07:29:40,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/layer_22-model_00-model_states.pt. +0: [2023-03-17 07:29:40,831] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step70000/mp_rank_00_model_states.pt +0: [2023-03-17 07:29:40,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/mp_rank_00_model_states.pt... +0: [2023-03-17 07:29:40,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/mp_rank_00_model_states.pt. +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +7: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +5: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +3: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +6: [2023-03-17 07:29:40,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +0: [2023-03-17 07:29:40,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,904] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,904] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 07:29:40,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 07:29:40,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 07:29:40,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 07:29:40,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 07:29:40,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +2: [2023-03-17 07:29:40,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 07:29:40,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+1: [2023-03-17 07:29:40,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +1: [2023-03-17 07:29:40,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 07:29:40,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 07:29:40,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
+6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +6: [2023-03-17 07:29:40,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 07:29:40,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 07:29:40,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
+3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +3: [2023-03-17 07:29:40,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +5: [2023-03-17 07:29:40,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 07:29:40,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 07:29:40,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+4: [2023-03-17 07:29:40,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +4: [2023-03-17 07:29:40,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +0: [2023-03-17 07:29:40,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 07:29:40,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 07:29:40,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 07:29:40,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
+0: successfully saved checkpoint at iteration 70000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 774.81 +7: iteration 70010/ 173500 | consumed samples: 17922560 | consumed tokens: 36705402880 | elapsed time per iteration (s): 0.51 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.976147E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.464 | TFLOPs: 26.21 | +7: iteration 70020/ 173500 | consumed samples: 17925120 | consumed tokens: 36710645760 | elapsed time per iteration (s): 0.43 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.975954E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.762 | TFLOPs: 31.52 | +7: iteration 70030/ 173500 | consumed samples: 17927680 | consumed tokens: 36715888640 | elapsed time per iteration (s): 0.43 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.970747E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.653 | TFLOPs: 31.57 | +7: iteration 70040/ 173500 | consumed samples: 17930240 | consumed tokens: 36721131520 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.977174E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.820 | TFLOPs: 31.47 | +7: iteration 70050/ 173500 | consumed samples: 17932800 | consumed tokens: 36726374400 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.986148E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.218 | TFLOPs: 31.49 | +7: iteration 70060/ 173500 | consumed samples: 17935360 | consumed tokens: 36731617280 | elapsed time per iteration (s): 
0.42 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.977090E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.747 | TFLOPs: 31.63 | +7: iteration 70070/ 173500 | consumed samples: 17937920 | consumed tokens: 36736860160 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.962696E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.741 | TFLOPs: 31.05 | +7: iteration 70080/ 173500 | consumed samples: 17940480 | consumed tokens: 36742103040 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.978546E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.643 | TFLOPs: 31.51 | +7: iteration 70090/ 173500 | consumed samples: 17943040 | consumed tokens: 36747345920 | elapsed time per iteration (s): 0.42 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.972441E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.199 | TFLOPs: 31.91 | +7: iteration 70100/ 173500 | consumed samples: 17945600 | consumed tokens: 36752588800 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.971511E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.244 | TFLOPs: 31.02 | +7: iteration 70110/ 173500 | consumed samples: 17948160 | consumed tokens: 36757831680 | elapsed time per iteration (s): 0.44 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.975948E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.967 | TFLOPs: 30.53 | +7: iteration 
70120/ 173500 | consumed samples: 17950720 | consumed tokens: 36763074560 | elapsed time per iteration (s): 0.42 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.967653E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.104 | TFLOPs: 31.70 | +7: iteration 70130/ 173500 | consumed samples: 17953280 | consumed tokens: 36768317440 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.985717E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.195 | TFLOPs: 31.28 | +7: iteration 70140/ 173500 | consumed samples: 17955840 | consumed tokens: 36773560320 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.975257E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.247 | TFLOPs: 31.49 | +7: iteration 70150/ 173500 | consumed samples: 17958400 | consumed tokens: 36778803200 | elapsed time per iteration (s): 0.42 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.956630E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.869 | TFLOPs: 31.84 | +7: iteration 70160/ 173500 | consumed samples: 17960960 | consumed tokens: 36784046080 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.976946E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.838 | TFLOPs: 31.42 | +7: iteration 70170/ 173500 | consumed samples: 17963520 | consumed tokens: 36789288960 | elapsed time per iteration (s): 0.42 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.967217E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.260 | TFLOPs: 31.65 | +7: iteration 70180/ 173500 | consumed samples: 17966080 | consumed tokens: 36794531840 | elapsed time per iteration (s): 0.42 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.969365E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.225 | TFLOPs: 31.81 | +7: iteration 70190/ 173500 | consumed samples: 17968640 | consumed tokens: 36799774720 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.968717E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.527 | TFLOPs: 31.56 | +7: iteration 70200/ 173500 | consumed samples: 17971200 | consumed tokens: 36805017600 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.979608E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.684 | TFLOPs: 31.25 | +7: iteration 70210/ 173500 | consumed samples: 17973760 | consumed tokens: 36810260480 | elapsed time per iteration (s): 0.42 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.962211E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.865 | TFLOPs: 31.84 | +7: iteration 70220/ 173500 | consumed samples: 17976320 | consumed tokens: 36815503360 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.987745E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.241 | TFLOPs: 31.44 | +7: iteration 70230/ 173500 | consumed samples: 17978880 | consumed tokens: 36820746240 | elapsed time per iteration (s): 0.43 | learning rate: 
1.381E-04 | global batch size: 256 | lm loss: 2.964292E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.194 | TFLOPs: 31.44 | +7: iteration 70240/ 173500 | consumed samples: 17981440 | consumed tokens: 36825989120 | elapsed time per iteration (s): 0.42 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.986760E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.327 | TFLOPs: 31.87 | +7: iteration 70250/ 173500 | consumed samples: 17984000 | consumed tokens: 36831232000 | elapsed time per iteration (s): 0.43 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.988520E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.426 | TFLOPs: 31.50 | +7: iteration 70260/ 173500 | consumed samples: 17986560 | consumed tokens: 36836474880 | elapsed time per iteration (s): 0.43 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.966644E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.247 | TFLOPs: 31.23 | +7: iteration 70270/ 173500 | consumed samples: 17989120 | consumed tokens: 36841717760 | elapsed time per iteration (s): 0.42 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.962950E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.088 | TFLOPs: 31.64 | +7: iteration 70280/ 173500 | consumed samples: 17991680 | consumed tokens: 36846960640 | elapsed time per iteration (s): 0.42 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.978200E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.270 | TFLOPs: 31.65 | +7: iteration 70290/ 173500 | 
consumed samples: 17994240 | consumed tokens: 36852203520 | elapsed time per iteration (s): 0.43 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.965704E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.114 | TFLOPs: 31.38 | +7: iteration 70300/ 173500 | consumed samples: 17996800 | consumed tokens: 36857446400 | elapsed time per iteration (s): 0.43 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.971425E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.708 | TFLOPs: 31.52 | +7: iteration 70310/ 173500 | consumed samples: 17999360 | consumed tokens: 36862689280 | elapsed time per iteration (s): 0.43 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.978412E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.924 | TFLOPs: 31.21 | +7: iteration 70320/ 173500 | consumed samples: 18001920 | consumed tokens: 36867932160 | elapsed time per iteration (s): 0.42 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.972633E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.056 | TFLOPs: 31.75 | +7: iteration 70330/ 173500 | consumed samples: 18004480 | consumed tokens: 36873175040 | elapsed time per iteration (s): 0.43 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.964322E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.232 | TFLOPs: 31.39 | +7: iteration 70340/ 173500 | consumed samples: 18007040 | consumed tokens: 36878417920 | elapsed time per iteration (s): 0.42 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.990925E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.517 | TFLOPs: 31.61 | +7: iteration 70350/ 173500 | consumed samples: 18009600 | consumed tokens: 36883660800 | elapsed time per iteration (s): 0.42 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.966109E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.134 | TFLOPs: 31.75 | +7: iteration 70360/ 173500 | consumed samples: 18012160 | consumed tokens: 36888903680 | elapsed time per iteration (s): 0.42 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.984907E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.396 | TFLOPs: 31.82 | +7: iteration 70370/ 173500 | consumed samples: 18014720 | consumed tokens: 36894146560 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.977648E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.096 | TFLOPs: 31.12 | +7: iteration 70380/ 173500 | consumed samples: 18017280 | consumed tokens: 36899389440 | elapsed time per iteration (s): 0.44 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.973421E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.242 | TFLOPs: 30.81 | +7: iteration 70390/ 173500 | consumed samples: 18019840 | consumed tokens: 36904632320 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.963497E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.487 | TFLOPs: 31.40 | +7: iteration 70400/ 173500 | consumed samples: 18022400 | consumed tokens: 36909875200 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch 
size: 256 | lm loss: 2.980366E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.663 | TFLOPs: 31.36 | +7: iteration 70410/ 173500 | consumed samples: 18024960 | consumed tokens: 36915118080 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.976023E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.656 | TFLOPs: 31.41 | +7: iteration 70420/ 173500 | consumed samples: 18027520 | consumed tokens: 36920360960 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.971975E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.126 | TFLOPs: 31.38 | +7: iteration 70430/ 173500 | consumed samples: 18030080 | consumed tokens: 36925603840 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.970583E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.002 | TFLOPs: 31.59 | +7: iteration 70440/ 173500 | consumed samples: 18032640 | consumed tokens: 36930846720 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.968942E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.788 | TFLOPs: 31.31 | +7: iteration 70450/ 173500 | consumed samples: 18035200 | consumed tokens: 36936089600 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.983719E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.299 | TFLOPs: 31.29 | +7: iteration 70460/ 173500 | consumed samples: 18037760 | 
consumed tokens: 36941332480 | elapsed time per iteration (s): 0.42 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.969654E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.620 | TFLOPs: 31.67 | +7: iteration 70470/ 173500 | consumed samples: 18040320 | consumed tokens: 36946575360 | elapsed time per iteration (s): 0.42 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.977934E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.577 | TFLOPs: 31.83 | +7: iteration 70480/ 173500 | consumed samples: 18042880 | consumed tokens: 36951818240 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.984176E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.415 | TFLOPs: 31.50 | +7: iteration 70490/ 173500 | consumed samples: 18045440 | consumed tokens: 36957061120 | elapsed time per iteration (s): 0.42 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.973105E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.300 | TFLOPs: 31.81 | +7: iteration 70500/ 173500 | consumed samples: 18048000 | consumed tokens: 36962304000 | elapsed time per iteration (s): 0.43 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.974269E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.441 | TFLOPs: 31.50 | +7: iteration 70510/ 173500 | consumed samples: 18050560 | consumed tokens: 36967546880 | elapsed time per iteration (s): 0.42 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.968095E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 608.978 | TFLOPs: 31.95 | +7: iteration 70520/ 173500 | consumed samples: 18053120 | consumed tokens: 36972789760 | elapsed time per iteration (s): 0.43 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.983078E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.171 | TFLOPs: 31.54 | +7: iteration 70530/ 173500 | consumed samples: 18055680 | consumed tokens: 36978032640 | elapsed time per iteration (s): 0.42 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.976692E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.623 | TFLOPs: 31.83 | +7: iteration 70540/ 173500 | consumed samples: 18058240 | consumed tokens: 36983275520 | elapsed time per iteration (s): 0.42 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.955380E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.325 | TFLOPs: 31.71 | +7: iteration 70550/ 173500 | consumed samples: 18060800 | consumed tokens: 36988518400 | elapsed time per iteration (s): 0.42 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.971708E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.886 | TFLOPs: 31.68 | +7: iteration 70560/ 173500 | consumed samples: 18063360 | consumed tokens: 36993761280 | elapsed time per iteration (s): 0.42 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.976014E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.866 | TFLOPs: 31.68 | +7: iteration 70570/ 173500 | consumed samples: 18065920 | consumed tokens: 36999004160 | elapsed time per iteration (s): 0.42 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 
2.980181E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.764 | TFLOPs: 31.89 | +7: iteration 70580/ 173500 | consumed samples: 18068480 | consumed tokens: 37004247040 | elapsed time per iteration (s): 0.42 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.963031E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.371 | TFLOPs: 31.76 | +7: iteration 70590/ 173500 | consumed samples: 18071040 | consumed tokens: 37009489920 | elapsed time per iteration (s): 0.43 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.970818E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.754 | TFLOPs: 31.36 | +7: iteration 70600/ 173500 | consumed samples: 18073600 | consumed tokens: 37014732800 | elapsed time per iteration (s): 0.43 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.963861E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.261 | TFLOPs: 31.28 | +7: iteration 70610/ 173500 | consumed samples: 18076160 | consumed tokens: 37019975680 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.981215E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.927 | TFLOPs: 31.27 | +7: iteration 70620/ 173500 | consumed samples: 18078720 | consumed tokens: 37025218560 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.966668E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.460 | TFLOPs: 31.30 | +7: iteration 70630/ 173500 | consumed samples: 18081280 | consumed tokens: 
37030461440 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.975724E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.314 | TFLOPs: 31.50 | +7: iteration 70640/ 173500 | consumed samples: 18083840 | consumed tokens: 37035704320 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.972116E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.931 | TFLOPs: 31.11 | +7: iteration 70650/ 173500 | consumed samples: 18086400 | consumed tokens: 37040947200 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.961068E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.869 | TFLOPs: 31.42 | +7: iteration 70660/ 173500 | consumed samples: 18088960 | consumed tokens: 37046190080 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.968715E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.412 | TFLOPs: 31.35 | +7: iteration 70670/ 173500 | consumed samples: 18091520 | consumed tokens: 37051432960 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.969158E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.436 | TFLOPs: 30.93 | +7: iteration 70680/ 173500 | consumed samples: 18094080 | consumed tokens: 37056675840 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.967400E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 594.520 | TFLOPs: 31.19 | +7: iteration 70690/ 173500 | consumed samples: 18096640 | consumed tokens: 37061918720 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.990014E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.683 | TFLOPs: 31.41 | +7: iteration 70700/ 173500 | consumed samples: 18099200 | consumed tokens: 37067161600 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.979977E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.092 | TFLOPs: 31.17 | +7: iteration 70710/ 173500 | consumed samples: 18101760 | consumed tokens: 37072404480 | elapsed time per iteration (s): 0.42 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.962345E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.220 | TFLOPs: 31.91 | +7: iteration 70720/ 173500 | consumed samples: 18104320 | consumed tokens: 37077647360 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.989575E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.990 | TFLOPs: 31.59 | +7: iteration 70730/ 173500 | consumed samples: 18106880 | consumed tokens: 37082890240 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.976556E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.895 | TFLOPs: 31.58 | +7: iteration 70740/ 173500 | consumed samples: 18109440 | consumed tokens: 37088133120 | elapsed time per iteration (s): 0.42 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.974161E+00 | grad 
norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.111 | TFLOPs: 31.91 | +7: iteration 70750/ 173500 | consumed samples: 18112000 | consumed tokens: 37093376000 | elapsed time per iteration (s): 0.44 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.988078E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.022 | TFLOPs: 30.80 | +7: iteration 70760/ 173500 | consumed samples: 18114560 | consumed tokens: 37098618880 | elapsed time per iteration (s): 0.42 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.966277E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.913 | TFLOPs: 31.63 | +7: iteration 70770/ 173500 | consumed samples: 18117120 | consumed tokens: 37103861760 | elapsed time per iteration (s): 0.42 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.968740E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.688 | TFLOPs: 31.73 | +7: iteration 70780/ 173500 | consumed samples: 18119680 | consumed tokens: 37109104640 | elapsed time per iteration (s): 0.43 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.978028E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.568 | TFLOPs: 31.30 | +7: iteration 70790/ 173500 | consumed samples: 18122240 | consumed tokens: 37114347520 | elapsed time per iteration (s): 0.42 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.986799E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.081 | TFLOPs: 31.64 | +7: iteration 70800/ 173500 | consumed samples: 18124800 | consumed tokens: 37119590400 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.974821E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.718 | TFLOPs: 31.62 | +7: iteration 70810/ 173500 | consumed samples: 18127360 | consumed tokens: 37124833280 | elapsed time per iteration (s): 0.43 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.951916E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.577 | TFLOPs: 31.51 | +7: iteration 70820/ 173500 | consumed samples: 18129920 | consumed tokens: 37130076160 | elapsed time per iteration (s): 0.42 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.959838E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.618 | TFLOPs: 31.67 | +7: iteration 70830/ 173500 | consumed samples: 18132480 | consumed tokens: 37135319040 | elapsed time per iteration (s): 0.43 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.953388E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.869 | TFLOPs: 31.47 | +7: iteration 70840/ 173500 | consumed samples: 18135040 | consumed tokens: 37140561920 | elapsed time per iteration (s): 0.42 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.959103E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.104 | TFLOPs: 31.75 | +7: iteration 70850/ 173500 | consumed samples: 18137600 | consumed tokens: 37145804800 | elapsed time per iteration (s): 0.42 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.974957E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.828 | TFLOPs: 
31.73 | +7: iteration 70860/ 173500 | consumed samples: 18140160 | consumed tokens: 37151047680 | elapsed time per iteration (s): 0.43 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.971667E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.362 | TFLOPs: 31.50 | +7: iteration 70870/ 173500 | consumed samples: 18142720 | consumed tokens: 37156290560 | elapsed time per iteration (s): 0.42 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.972317E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.004 | TFLOPs: 31.90 | +7: iteration 70880/ 173500 | consumed samples: 18145280 | consumed tokens: 37161533440 | elapsed time per iteration (s): 0.43 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.966709E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.101 | TFLOPs: 31.01 | +7: iteration 70890/ 173500 | consumed samples: 18147840 | consumed tokens: 37166776320 | elapsed time per iteration (s): 0.43 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.979468E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.173 | TFLOPs: 31.54 | +7: iteration 70900/ 173500 | consumed samples: 18150400 | consumed tokens: 37172019200 | elapsed time per iteration (s): 0.42 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.966875E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.306 | TFLOPs: 31.92 | +7: iteration 70910/ 173500 | consumed samples: 18152960 | consumed tokens: 37177262080 | elapsed time per iteration (s): 0.42 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.977101E+00 | grad norm: 0.312 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.625 | TFLOPs: 31.67 | +7: iteration 70920/ 173500 | consumed samples: 18155520 | consumed tokens: 37182504960 | elapsed time per iteration (s): 0.42 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.964980E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.384 | TFLOPs: 31.61 | +7: iteration 70930/ 173500 | consumed samples: 18158080 | consumed tokens: 37187747840 | elapsed time per iteration (s): 0.42 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.964543E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.724 | TFLOPs: 31.78 | +7: iteration 70940/ 173500 | consumed samples: 18160640 | consumed tokens: 37192990720 | elapsed time per iteration (s): 0.42 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.978603E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.743 | TFLOPs: 31.68 | +7: iteration 70950/ 173500 | consumed samples: 18163200 | consumed tokens: 37198233600 | elapsed time per iteration (s): 0.43 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.970917E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.606 | TFLOPs: 31.25 | +7: iteration 70960/ 173500 | consumed samples: 18165760 | consumed tokens: 37203476480 | elapsed time per iteration (s): 0.42 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.950508E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.439 | TFLOPs: 31.66 | +7: iteration 70970/ 173500 | consumed samples: 18168320 | consumed tokens: 37208719360 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.972071E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.734 | TFLOPs: 31.47 | +7: iteration 70980/ 173500 | consumed samples: 18170880 | consumed tokens: 37213962240 | elapsed time per iteration (s): 0.42 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.966958E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.455 | TFLOPs: 31.82 | +7: iteration 70990/ 173500 | consumed samples: 18173440 | consumed tokens: 37219205120 | elapsed time per iteration (s): 0.42 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.972956E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.018 | TFLOPs: 31.64 | +7: iteration 71000/ 173500 | consumed samples: 18176000 | consumed tokens: 37224448000 | elapsed time per iteration (s): 0.42 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.967013E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.625 | TFLOPs: 31.67 | +7: iteration 71010/ 173500 | consumed samples: 18178560 | consumed tokens: 37229690880 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.973841E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.984 | TFLOPs: 31.17 | +7: iteration 71020/ 173500 | consumed samples: 18181120 | consumed tokens: 37234933760 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.971238E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.778 | TFLOPs: 31.21 | +7: iteration 71030/ 
173500 | consumed samples: 18183680 | consumed tokens: 37240176640 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.979916E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.185 | TFLOPs: 31.33 | +7: iteration 71040/ 173500 | consumed samples: 18186240 | consumed tokens: 37245419520 | elapsed time per iteration (s): 0.42 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.976651E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.377 | TFLOPs: 31.92 | +7: iteration 71050/ 173500 | consumed samples: 18188800 | consumed tokens: 37250662400 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.988056E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.225 | TFLOPs: 31.23 | +7: iteration 71060/ 173500 | consumed samples: 18191360 | consumed tokens: 37255905280 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.977116E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.600 | TFLOPs: 31.77 | +7: iteration 71070/ 173500 | consumed samples: 18193920 | consumed tokens: 37261148160 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.976519E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.657 | TFLOPs: 31.78 | +7: iteration 71080/ 173500 | consumed samples: 18196480 | consumed tokens: 37266391040 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.981547E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.530 | TFLOPs: 31.72 | +7: iteration 71090/ 173500 | consumed samples: 18199040 | consumed tokens: 37271633920 | elapsed time per iteration (s): 0.43 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.956627E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.895 | TFLOPs: 31.42 | +7: iteration 71100/ 173500 | consumed samples: 18201600 | consumed tokens: 37276876800 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.961214E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.767 | TFLOPs: 31.89 | +7: iteration 71110/ 173500 | consumed samples: 18204160 | consumed tokens: 37282119680 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.977938E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.062 | TFLOPs: 31.90 | +7: iteration 71120/ 173500 | consumed samples: 18206720 | consumed tokens: 37287362560 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.976758E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.100 | TFLOPs: 31.33 | +7: iteration 71130/ 173500 | consumed samples: 18209280 | consumed tokens: 37292605440 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.990385E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.583 | TFLOPs: 31.46 | +7: iteration 71140/ 173500 | consumed samples: 18211840 | consumed tokens: 37297848320 | elapsed time per iteration (s): 0.42 | learning rate: 
1.367E-04 | global batch size: 256 | lm loss: 2.967302E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.424 | TFLOPs: 31.66 | +7: iteration 71150/ 173500 | consumed samples: 18214400 | consumed tokens: 37303091200 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.958183E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.589 | TFLOPs: 31.56 | +7: iteration 71160/ 173500 | consumed samples: 18216960 | consumed tokens: 37308334080 | elapsed time per iteration (s): 0.42 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.973005E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.006 | TFLOPs: 31.69 | +7: iteration 71170/ 173500 | consumed samples: 18219520 | consumed tokens: 37313576960 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.948050E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.895 | TFLOPs: 31.37 | +7: iteration 71180/ 173500 | consumed samples: 18222080 | consumed tokens: 37318819840 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.979794E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.944 | TFLOPs: 31.53 | +7: iteration 71190/ 173500 | consumed samples: 18224640 | consumed tokens: 37324062720 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.969152E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.381 | TFLOPs: 31.40 | +7: iteration 71200/ 173500 | 
consumed samples: 18227200 | consumed tokens: 37329305600 | elapsed time per iteration (s): 0.42 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.974061E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.181 | TFLOPs: 31.91 | +7: iteration 71210/ 173500 | consumed samples: 18229760 | consumed tokens: 37334548480 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.965996E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.112 | TFLOPs: 31.07 | +7: iteration 71220/ 173500 | consumed samples: 18232320 | consumed tokens: 37339791360 | elapsed time per iteration (s): 0.42 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.948200E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.467 | TFLOPs: 31.72 | +7: iteration 71230/ 173500 | consumed samples: 18234880 | consumed tokens: 37345034240 | elapsed time per iteration (s): 0.42 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.969125E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.577 | TFLOPs: 31.93 | +7: iteration 71240/ 173500 | consumed samples: 18237440 | consumed tokens: 37350277120 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.963905E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.141 | TFLOPs: 31.49 | +7: iteration 71250/ 173500 | consumed samples: 18240000 | consumed tokens: 37355520000 | elapsed time per iteration (s): 0.43 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.971536E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.661 | TFLOPs: 31.36 | +7: iteration 71260/ 173500 | consumed samples: 18242560 | consumed tokens: 37360762880 | elapsed time per iteration (s): 0.43 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.970079E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.777 | TFLOPs: 31.57 | +7: iteration 71270/ 173500 | consumed samples: 18245120 | consumed tokens: 37366005760 | elapsed time per iteration (s): 0.42 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.988920E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.696 | TFLOPs: 31.62 | +7: iteration 71280/ 173500 | consumed samples: 18247680 | consumed tokens: 37371248640 | elapsed time per iteration (s): 0.42 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.973204E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.985 | TFLOPs: 31.95 | +7: iteration 71290/ 173500 | consumed samples: 18250240 | consumed tokens: 37376491520 | elapsed time per iteration (s): 0.42 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.978462E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.907 | TFLOPs: 31.79 | +7: iteration 71300/ 173500 | consumed samples: 18252800 | consumed tokens: 37381734400 | elapsed time per iteration (s): 0.43 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.971871E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.564 | TFLOPs: 31.51 | +7: iteration 71310/ 173500 | consumed samples: 18255360 | consumed tokens: 37386977280 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch 
size: 256 | lm loss: 2.968294E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.046 | TFLOPs: 31.96 | +7: iteration 71320/ 173500 | consumed samples: 18257920 | consumed tokens: 37392220160 | elapsed time per iteration (s): 0.43 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.969064E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.107 | TFLOPs: 31.43 | +7: iteration 71330/ 173500 | consumed samples: 18260480 | consumed tokens: 37397463040 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.951678E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.862 | TFLOPs: 31.74 | +7: iteration 71340/ 173500 | consumed samples: 18263040 | consumed tokens: 37402705920 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.974155E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.451 | TFLOPs: 31.61 | +7: iteration 71350/ 173500 | consumed samples: 18265600 | consumed tokens: 37407948800 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.977884E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.554 | TFLOPs: 31.93 | +7: iteration 71360/ 173500 | consumed samples: 18268160 | consumed tokens: 37413191680 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.968537E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.705 | TFLOPs: 31.89 | +7: iteration 71370/ 173500 | consumed samples: 18270720 | 
consumed tokens: 37418434560 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.957347E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.701 | TFLOPs: 31.83 | +7: iteration 71380/ 173500 | consumed samples: 18273280 | consumed tokens: 37423677440 | elapsed time per iteration (s): 0.42 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.973518E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.666 | TFLOPs: 31.62 | +7: iteration 71390/ 173500 | consumed samples: 18275840 | consumed tokens: 37428920320 | elapsed time per iteration (s): 0.42 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.993598E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.371 | TFLOPs: 31.92 | +7: iteration 71400/ 173500 | consumed samples: 18278400 | consumed tokens: 37434163200 | elapsed time per iteration (s): 0.43 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.981778E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.176 | TFLOPs: 31.60 | +7: iteration 71410/ 173500 | consumed samples: 18280960 | consumed tokens: 37439406080 | elapsed time per iteration (s): 0.43 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.955339E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.679 | TFLOPs: 31.57 | +7: iteration 71420/ 173500 | consumed samples: 18283520 | consumed tokens: 37444648960 | elapsed time per iteration (s): 0.42 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.981203E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.458 | TFLOPs: 31.71 | +7: iteration 71430/ 173500 | consumed samples: 18286080 | consumed tokens: 37449891840 | elapsed time per iteration (s): 0.42 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.967199E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.571 | TFLOPs: 31.72 | +7: iteration 71440/ 173500 | consumed samples: 18288640 | consumed tokens: 37455134720 | elapsed time per iteration (s): 0.42 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.972437E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.530 | TFLOPs: 31.82 | +7: iteration 71450/ 173500 | consumed samples: 18291200 | consumed tokens: 37460377600 | elapsed time per iteration (s): 0.42 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.964369E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.723 | TFLOPs: 31.62 | +7: iteration 71460/ 173500 | consumed samples: 18293760 | consumed tokens: 37465620480 | elapsed time per iteration (s): 0.42 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.971763E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.275 | TFLOPs: 31.71 | +7: iteration 71470/ 173500 | consumed samples: 18296320 | consumed tokens: 37470863360 | elapsed time per iteration (s): 0.43 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.973276E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.136 | TFLOPs: 31.44 | +7: iteration 71480/ 173500 | consumed samples: 18298880 | consumed tokens: 37476106240 | elapsed time per iteration (s): 0.42 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 
2.969641E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.241 | TFLOPs: 31.76 | +7: iteration 71490/ 173500 | consumed samples: 18301440 | consumed tokens: 37481349120 | elapsed time per iteration (s): 0.43 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.971672E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.943 | TFLOPs: 31.32 | +7: iteration 71500/ 173500 | consumed samples: 18304000 | consumed tokens: 37486592000 | elapsed time per iteration (s): 0.42 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.961572E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 71510/ 173500 | consumed samples: 18306560 | consumed tokens: 37491834880 | elapsed time per iteration (s): 0.42 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.972372E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.209 | TFLOPs: 31.91 | +7: iteration 71520/ 173500 | consumed samples: 18309120 | consumed tokens: 37497077760 | elapsed time per iteration (s): 0.45 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.967429E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.199 | TFLOPs: 30.18 | +7: iteration 71530/ 173500 | consumed samples: 18311680 | consumed tokens: 37502320640 | elapsed time per iteration (s): 0.43 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.978981E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.415 | TFLOPs: 31.19 | +7: iteration 71540/ 173500 | consumed samples: 18314240 | consumed tokens: 
37507563520 | elapsed time per iteration (s): 0.43 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.964557E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.149 | TFLOPs: 31.17 | +7: iteration 71550/ 173500 | consumed samples: 18316800 | consumed tokens: 37512806400 | elapsed time per iteration (s): 0.43 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.953481E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.165 | TFLOPs: 31.54 | +7: iteration 71560/ 173500 | consumed samples: 18319360 | consumed tokens: 37518049280 | elapsed time per iteration (s): 0.42 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.971523E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.833 | TFLOPs: 31.84 | +7: iteration 71570/ 173500 | consumed samples: 18321920 | consumed tokens: 37523292160 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.974514E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.376 | TFLOPs: 31.66 | +7: iteration 71580/ 173500 | consumed samples: 18324480 | consumed tokens: 37528535040 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.969933E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.367 | TFLOPs: 31.61 | +7: iteration 71590/ 173500 | consumed samples: 18327040 | consumed tokens: 37533777920 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.965502E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.303 | TFLOPs: 31.97 | +7: iteration 71600/ 173500 | consumed samples: 18329600 | consumed tokens: 37539020800 | elapsed time per iteration (s): 0.43 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.975820E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.215 | TFLOPs: 31.60 | +7: iteration 71610/ 173500 | consumed samples: 18332160 | consumed tokens: 37544263680 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.959032E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.211 | TFLOPs: 31.65 | +7: iteration 71620/ 173500 | consumed samples: 18334720 | consumed tokens: 37549506560 | elapsed time per iteration (s): 0.42 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.974370E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.612 | TFLOPs: 31.67 | +7: iteration 71630/ 173500 | consumed samples: 18337280 | consumed tokens: 37554749440 | elapsed time per iteration (s): 0.42 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.976962E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.685 | TFLOPs: 31.83 | +7: iteration 71640/ 173500 | consumed samples: 18339840 | consumed tokens: 37559992320 | elapsed time per iteration (s): 0.43 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.966222E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.799 | TFLOPs: 31.21 | +7: iteration 71650/ 173500 | consumed samples: 18342400 | consumed tokens: 37565235200 | elapsed time per iteration (s): 0.42 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.972515E+00 | grad 
norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.653 | TFLOPs: 31.67 | +7: iteration 71660/ 173500 | consumed samples: 18344960 | consumed tokens: 37570478080 | elapsed time per iteration (s): 0.42 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.966677E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.209 | TFLOPs: 31.96 | +7: iteration 71670/ 173500 | consumed samples: 18347520 | consumed tokens: 37575720960 | elapsed time per iteration (s): 0.43 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.978241E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.979 | TFLOPs: 31.58 | +7: iteration 71680/ 173500 | consumed samples: 18350080 | consumed tokens: 37580963840 | elapsed time per iteration (s): 0.42 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.974693E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.740 | TFLOPs: 31.68 | +7: iteration 71690/ 173500 | consumed samples: 18352640 | consumed tokens: 37586206720 | elapsed time per iteration (s): 0.43 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.968384E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.705 | TFLOPs: 31.52 | +7: iteration 71700/ 173500 | consumed samples: 18355200 | consumed tokens: 37591449600 | elapsed time per iteration (s): 0.42 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.970132E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.490 | TFLOPs: 31.98 | +7: iteration 71710/ 173500 | consumed samples: 18357760 | consumed tokens: 37596692480 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.970098E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.056 | TFLOPs: 31.96 | +7: iteration 71720/ 173500 | consumed samples: 18360320 | consumed tokens: 37601935360 | elapsed time per iteration (s): 0.43 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.978883E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.825 | TFLOPs: 31.58 | +7: iteration 71730/ 173500 | consumed samples: 18362880 | consumed tokens: 37607178240 | elapsed time per iteration (s): 0.42 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.980453E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.714 | TFLOPs: 31.94 | +7: iteration 71740/ 173500 | consumed samples: 18365440 | consumed tokens: 37612421120 | elapsed time per iteration (s): 0.43 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.966251E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.117 | TFLOPs: 31.59 | +7: iteration 71750/ 173500 | consumed samples: 18368000 | consumed tokens: 37617664000 | elapsed time per iteration (s): 0.42 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.967687E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.529 | TFLOPs: 31.93 | +7: iteration 71760/ 173500 | consumed samples: 18370560 | consumed tokens: 37622906880 | elapsed time per iteration (s): 0.43 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.977016E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.681 | TFLOPs: 
31.41 | +7: iteration 71770/ 173500 | consumed samples: 18373120 | consumed tokens: 37628149760 | elapsed time per iteration (s): 0.42 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.966926E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.973 | TFLOPs: 31.74 | +7: iteration 71780/ 173500 | consumed samples: 18375680 | consumed tokens: 37633392640 | elapsed time per iteration (s): 0.42 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.971396E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.774 | TFLOPs: 31.78 | +7: iteration 71790/ 173500 | consumed samples: 18378240 | consumed tokens: 37638635520 | elapsed time per iteration (s): 0.43 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.986752E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.645 | TFLOPs: 31.25 | +7: iteration 71800/ 173500 | consumed samples: 18380800 | consumed tokens: 37643878400 | elapsed time per iteration (s): 0.42 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.971224E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.410 | TFLOPs: 31.82 | +7: iteration 71810/ 173500 | consumed samples: 18383360 | consumed tokens: 37649121280 | elapsed time per iteration (s): 0.43 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.962336E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.590 | TFLOPs: 31.51 | +7: iteration 71820/ 173500 | consumed samples: 18385920 | consumed tokens: 37654364160 | elapsed time per iteration (s): 0.42 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.964926E+00 | grad norm: 0.317 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.803 | TFLOPs: 31.68 | +7: iteration 71830/ 173500 | consumed samples: 18388480 | consumed tokens: 37659607040 | elapsed time per iteration (s): 0.42 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.970752E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.771 | TFLOPs: 31.78 | +7: iteration 71840/ 173500 | consumed samples: 18391040 | consumed tokens: 37664849920 | elapsed time per iteration (s): 0.43 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.974236E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.780 | TFLOPs: 31.57 | +7: iteration 71850/ 173500 | consumed samples: 18393600 | consumed tokens: 37670092800 | elapsed time per iteration (s): 0.42 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.963971E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.289 | TFLOPs: 31.71 | +7: iteration 71860/ 173500 | consumed samples: 18396160 | consumed tokens: 37675335680 | elapsed time per iteration (s): 0.42 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.962254E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.083 | TFLOPs: 31.91 | +7: iteration 71870/ 173500 | consumed samples: 18398720 | consumed tokens: 37680578560 | elapsed time per iteration (s): 0.43 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.966103E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.114 | TFLOPs: 31.22 | +7: iteration 71880/ 173500 | consumed samples: 18401280 | consumed tokens: 37685821440 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.960962E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.462 | TFLOPs: 31.66 | +7: iteration 71890/ 173500 | consumed samples: 18403840 | consumed tokens: 37691064320 | elapsed time per iteration (s): 0.43 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.963248E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.700 | TFLOPs: 31.52 | +7: iteration 71900/ 173500 | consumed samples: 18406400 | consumed tokens: 37696307200 | elapsed time per iteration (s): 0.42 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.976920E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.324 | TFLOPs: 31.76 | +7: iteration 71910/ 173500 | consumed samples: 18408960 | consumed tokens: 37701550080 | elapsed time per iteration (s): 0.42 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.978582E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.862 | TFLOPs: 31.68 | +7: iteration 71920/ 173500 | consumed samples: 18411520 | consumed tokens: 37706792960 | elapsed time per iteration (s): 0.42 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.965544E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.093 | TFLOPs: 31.91 | +7: iteration 71930/ 173500 | consumed samples: 18414080 | consumed tokens: 37712035840 | elapsed time per iteration (s): 0.42 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.974593E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.790 | TFLOPs: 31.89 | +7: iteration 71940/ 
173500 | consumed samples: 18416640 | consumed tokens: 37717278720 | elapsed time per iteration (s): 0.42 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.954041E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.070 | TFLOPs: 31.85 | +7: iteration 71950/ 173500 | consumed samples: 18419200 | consumed tokens: 37722521600 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.970448E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.765 | TFLOPs: 31.73 | +7: iteration 71960/ 173500 | consumed samples: 18421760 | consumed tokens: 37727764480 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.960681E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.167 | TFLOPs: 31.80 | +7: iteration 71970/ 173500 | consumed samples: 18424320 | consumed tokens: 37733007360 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.974187E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.657 | TFLOPs: 31.94 | +7: iteration 71980/ 173500 | consumed samples: 18426880 | consumed tokens: 37738250240 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.962383E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.725 | TFLOPs: 31.83 | +7: iteration 71990/ 173500 | consumed samples: 18429440 | consumed tokens: 37743493120 | elapsed time per iteration (s): 0.43 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.972157E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 593.950 | TFLOPs: 31.16 | +0: [2023-03-17 07:43:51,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=0, lr=[0.0001353602432066091, 0.0001353602432066091, 0.0001353602432066091], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 72000/ 173500 | consumed samples: 18432000 | consumed tokens: 37748736000 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.981449E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.276 | TFLOPs: 31.92 | +0: steps: 72000 loss: 2.9681 iter time (s): 0.424 samples/sec: 604.334 +7: iteration 72010/ 173500 | consumed samples: 18434560 | consumed tokens: 37753978880 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.969066E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | +7: iteration 72020/ 173500 | consumed samples: 18437120 | consumed tokens: 37759221760 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.968434E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.653 | TFLOPs: 31.46 | +7: iteration 72030/ 173500 | consumed samples: 18439680 | consumed tokens: 37764464640 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.971254E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.350 | TFLOPs: 31.18 | +7: iteration 72040/ 173500 | consumed samples: 18442240 | consumed tokens: 37769707520 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.970739E+00 | grad norm: 
0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.914 | TFLOPs: 31.58 | +7: iteration 72050/ 173500 | consumed samples: 18444800 | consumed tokens: 37774950400 | elapsed time per iteration (s): 0.42 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.981907E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.543 | TFLOPs: 31.67 | +7: iteration 72060/ 173500 | consumed samples: 18447360 | consumed tokens: 37780193280 | elapsed time per iteration (s): 0.42 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.973423E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.168 | TFLOPs: 31.80 | +7: iteration 72070/ 173500 | consumed samples: 18449920 | consumed tokens: 37785436160 | elapsed time per iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.966155E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.610 | TFLOPs: 31.46 | +7: iteration 72080/ 173500 | consumed samples: 18452480 | consumed tokens: 37790679040 | elapsed time per iteration (s): 0.42 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.974119E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.929 | TFLOPs: 31.90 | +7: iteration 72090/ 173500 | consumed samples: 18455040 | consumed tokens: 37795921920 | elapsed time per iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.972780E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.661 | TFLOPs: 31.57 | +7: iteration 72100/ 173500 | consumed samples: 18457600 | consumed tokens: 37801164800 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.974853E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.901 | TFLOPs: 31.48 | +7: iteration 72110/ 173500 | consumed samples: 18460160 | consumed tokens: 37806407680 | elapsed time per iteration (s): 0.42 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.973326E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.196 | TFLOPs: 31.91 | +7: iteration 72120/ 173500 | consumed samples: 18462720 | consumed tokens: 37811650560 | elapsed time per iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.967552E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.356 | TFLOPs: 31.50 | +7: iteration 72130/ 173500 | consumed samples: 18465280 | consumed tokens: 37816893440 | elapsed time per iteration (s): 0.42 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.977819E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.828 | TFLOPs: 31.89 | +7: iteration 72140/ 173500 | consumed samples: 18467840 | consumed tokens: 37822136320 | elapsed time per iteration (s): 0.42 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.959054E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.044 | TFLOPs: 31.80 | +7: iteration 72150/ 173500 | consumed samples: 18470400 | consumed tokens: 37827379200 | elapsed time per iteration (s): 0.42 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.986054E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.476 | TFLOPs: 31.82 | 
+7: iteration 72160/ 173500 | consumed samples: 18472960 | consumed tokens: 37832622080 | elapsed time per iteration (s): 0.43 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.981446E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.053 | TFLOPs: 31.48 | +7: iteration 72170/ 173500 | consumed samples: 18475520 | consumed tokens: 37837864960 | elapsed time per iteration (s): 0.43 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.975591E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.415 | TFLOPs: 31.50 | +7: iteration 72180/ 173500 | consumed samples: 18478080 | consumed tokens: 37843107840 | elapsed time per iteration (s): 0.42 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.995105E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.685 | TFLOPs: 31.62 | +7: iteration 72190/ 173500 | consumed samples: 18480640 | consumed tokens: 37848350720 | elapsed time per iteration (s): 0.42 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.966291E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.641 | TFLOPs: 31.88 | +7: iteration 72200/ 173500 | consumed samples: 18483200 | consumed tokens: 37853593600 | elapsed time per iteration (s): 0.42 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.961864E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.081 | TFLOPs: 31.85 | +7: iteration 72210/ 173500 | consumed samples: 18485760 | consumed tokens: 37858836480 | elapsed time per iteration (s): 0.43 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.966041E+00 | grad norm: 0.320 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.448 | TFLOPs: 31.14 | +7: iteration 72220/ 173500 | consumed samples: 18488320 | consumed tokens: 37864079360 | elapsed time per iteration (s): 0.42 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.965866E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.546 | TFLOPs: 31.77 | +7: iteration 72230/ 173500 | consumed samples: 18490880 | consumed tokens: 37869322240 | elapsed time per iteration (s): 0.42 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.975809E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.902 | TFLOPs: 31.84 | +7: iteration 72240/ 173500 | consumed samples: 18493440 | consumed tokens: 37874565120 | elapsed time per iteration (s): 0.42 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.985283E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.234 | TFLOPs: 31.65 | +7: iteration 72250/ 173500 | consumed samples: 18496000 | consumed tokens: 37879808000 | elapsed time per iteration (s): 0.43 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.973094E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.037 | TFLOPs: 31.43 | +7: iteration 72260/ 173500 | consumed samples: 18498560 | consumed tokens: 37885050880 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.967614E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.965 | TFLOPs: 31.27 | +7: iteration 72270/ 173500 | consumed samples: 18501120 | consumed tokens: 37890293760 | elapsed time per iteration (s): 0.46 | 
learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.958680E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.740 | TFLOPs: 29.32 | +7: iteration 72280/ 173500 | consumed samples: 18503680 | consumed tokens: 37895536640 | elapsed time per iteration (s): 0.44 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.985584E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.603 | TFLOPs: 30.78 | +7: iteration 72290/ 173500 | consumed samples: 18506240 | consumed tokens: 37900779520 | elapsed time per iteration (s): 0.44 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.968532E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.707 | TFLOPs: 30.84 | +7: iteration 72300/ 173500 | consumed samples: 18508800 | consumed tokens: 37906022400 | elapsed time per iteration (s): 0.42 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.980323E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.923 | TFLOPs: 32.05 | +7: iteration 72310/ 173500 | consumed samples: 18511360 | consumed tokens: 37911265280 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.953834E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.023 | TFLOPs: 31.48 | +7: iteration 72320/ 173500 | consumed samples: 18513920 | consumed tokens: 37916508160 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.976428E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.129 | TFLOPs: 31.33 | +7: iteration 72330/ 
173500 | consumed samples: 18516480 | consumed tokens: 37921751040 | elapsed time per iteration (s): 0.42 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.966617E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.306 | TFLOPs: 32.02 | +7: iteration 72340/ 173500 | consumed samples: 18519040 | consumed tokens: 37926993920 | elapsed time per iteration (s): 0.42 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.965096E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.962 | TFLOPs: 31.79 | +7: iteration 72350/ 173500 | consumed samples: 18521600 | consumed tokens: 37932236800 | elapsed time per iteration (s): 0.43 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.975937E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.364 | TFLOPs: 31.40 | +7: iteration 72360/ 173500 | consumed samples: 18524160 | consumed tokens: 37937479680 | elapsed time per iteration (s): 0.43 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.970827E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.272 | TFLOPs: 31.34 | +7: iteration 72370/ 173500 | consumed samples: 18526720 | consumed tokens: 37942722560 | elapsed time per iteration (s): 0.42 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.952296E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.884 | TFLOPs: 31.68 | +7: iteration 72380/ 173500 | consumed samples: 18529280 | consumed tokens: 37947965440 | elapsed time per iteration (s): 0.42 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.965098E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.816 | TFLOPs: 31.68 | +7: iteration 72390/ 173500 | consumed samples: 18531840 | consumed tokens: 37953208320 | elapsed time per iteration (s): 0.42 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.972284E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.501 | TFLOPs: 31.98 | +7: iteration 72400/ 173500 | consumed samples: 18534400 | consumed tokens: 37958451200 | elapsed time per iteration (s): 0.42 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.973395E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.600 | TFLOPs: 31.83 | +7: iteration 72410/ 173500 | consumed samples: 18536960 | consumed tokens: 37963694080 | elapsed time per iteration (s): 0.42 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.968947E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.237 | TFLOPs: 31.76 | +7: iteration 72420/ 173500 | consumed samples: 18539520 | consumed tokens: 37968936960 | elapsed time per iteration (s): 0.42 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.966293E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.754 | TFLOPs: 31.89 | +7: iteration 72430/ 173500 | consumed samples: 18542080 | consumed tokens: 37974179840 | elapsed time per iteration (s): 0.42 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.955625E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.888 | TFLOPs: 31.95 | +7: iteration 72440/ 173500 | consumed samples: 18544640 | consumed tokens: 37979422720 | elapsed time per iteration (s): 0.42 | learning rate: 
1.347E-04 | global batch size: 256 | lm loss: 2.956701E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.974 | TFLOPs: 31.79 | +7: iteration 72450/ 173500 | consumed samples: 18547200 | consumed tokens: 37984665600 | elapsed time per iteration (s): 0.43 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.968734E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.870 | TFLOPs: 31.37 | +7: iteration 72460/ 173500 | consumed samples: 18549760 | consumed tokens: 37989908480 | elapsed time per iteration (s): 0.42 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.979038E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.956 | TFLOPs: 31.95 | +7: iteration 72470/ 173500 | consumed samples: 18552320 | consumed tokens: 37995151360 | elapsed time per iteration (s): 0.42 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.969263E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.135 | TFLOPs: 31.80 | +7: iteration 72480/ 173500 | consumed samples: 18554880 | consumed tokens: 38000394240 | elapsed time per iteration (s): 0.42 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.973136E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.053 | TFLOPs: 31.80 | +7: iteration 72490/ 173500 | consumed samples: 18557440 | consumed tokens: 38005637120 | elapsed time per iteration (s): 0.42 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.963042E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.204 | TFLOPs: 31.65 | +7: iteration 72500/ 173500 | 
consumed samples: 18560000 | consumed tokens: 38010880000 | elapsed time per iteration (s): 0.42 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.978718E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.667 | TFLOPs: 31.62 | +7: iteration 72510/ 173500 | consumed samples: 18562560 | consumed tokens: 38016122880 | elapsed time per iteration (s): 0.43 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.950537E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.816 | TFLOPs: 31.47 | +7: iteration 72520/ 173500 | consumed samples: 18565120 | consumed tokens: 38021365760 | elapsed time per iteration (s): 0.42 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.968873E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.623 | TFLOPs: 31.62 | +7: iteration 72530/ 173500 | consumed samples: 18567680 | consumed tokens: 38026608640 | elapsed time per iteration (s): 0.42 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.966572E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.200 | TFLOPs: 31.96 | +7: iteration 72540/ 173500 | consumed samples: 18570240 | consumed tokens: 38031851520 | elapsed time per iteration (s): 0.42 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.959265E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.925 | TFLOPs: 31.79 | +7: iteration 72550/ 173500 | consumed samples: 18572800 | consumed tokens: 38037094400 | elapsed time per iteration (s): 0.42 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.972418E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.968 | TFLOPs: 31.69 | +7: iteration 72560/ 173500 | consumed samples: 18575360 | consumed tokens: 38042337280 | elapsed time per iteration (s): 0.42 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.960474E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.914 | TFLOPs: 31.79 | +7: iteration 72570/ 173500 | consumed samples: 18577920 | consumed tokens: 38047580160 | elapsed time per iteration (s): 0.43 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.963468E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.700 | TFLOPs: 31.47 | +7: iteration 72580/ 173500 | consumed samples: 18580480 | consumed tokens: 38052823040 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.966451E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.886 | TFLOPs: 31.95 | +7: iteration 72590/ 173500 | consumed samples: 18583040 | consumed tokens: 38058065920 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.980589E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.914 | TFLOPs: 31.74 | +7: iteration 72600/ 173500 | consumed samples: 18585600 | consumed tokens: 38063308800 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.966940E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.982 | TFLOPs: 31.95 | +7: iteration 72610/ 173500 | consumed samples: 18588160 | consumed tokens: 38068551680 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch 
size: 256 | lm loss: 2.956580E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.247 | TFLOPs: 31.86 | +7: iteration 72620/ 173500 | consumed samples: 18590720 | consumed tokens: 38073794560 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.972143E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.495 | TFLOPs: 31.93 | +7: iteration 72630/ 173500 | consumed samples: 18593280 | consumed tokens: 38079037440 | elapsed time per iteration (s): 0.42 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.963666E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.147 | TFLOPs: 31.80 | +7: iteration 72640/ 173500 | consumed samples: 18595840 | consumed tokens: 38084280320 | elapsed time per iteration (s): 0.42 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.966586E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.206 | TFLOPs: 31.75 | +7: iteration 72650/ 173500 | consumed samples: 18598400 | consumed tokens: 38089523200 | elapsed time per iteration (s): 0.43 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.986080E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.255 | TFLOPs: 31.55 | +7: iteration 72660/ 173500 | consumed samples: 18600960 | consumed tokens: 38094766080 | elapsed time per iteration (s): 0.43 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.965973E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.508 | TFLOPs: 31.56 | +7: iteration 72670/ 173500 | consumed samples: 18603520 | 
consumed tokens: 38100008960 | elapsed time per iteration (s): 0.42 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.978558E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.621 | TFLOPs: 31.72 | +7: iteration 72680/ 173500 | consumed samples: 18606080 | consumed tokens: 38105251840 | elapsed time per iteration (s): 0.42 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.964510E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.803 | TFLOPs: 31.73 | +7: iteration 72690/ 173500 | consumed samples: 18608640 | consumed tokens: 38110494720 | elapsed time per iteration (s): 0.42 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.973205E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.973 | TFLOPs: 31.74 | +7: iteration 72700/ 173500 | consumed samples: 18611200 | consumed tokens: 38115737600 | elapsed time per iteration (s): 0.44 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.969696E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.402 | TFLOPs: 30.87 | +7: iteration 72710/ 173500 | consumed samples: 18613760 | consumed tokens: 38120980480 | elapsed time per iteration (s): 0.42 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.968209E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.262 | TFLOPs: 31.97 | +7: iteration 72720/ 173500 | consumed samples: 18616320 | consumed tokens: 38126223360 | elapsed time per iteration (s): 0.43 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.952940E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 600.536 | TFLOPs: 31.51 | +7: iteration 72730/ 173500 | consumed samples: 18618880 | consumed tokens: 38131466240 | elapsed time per iteration (s): 0.44 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.966693E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.178 | TFLOPs: 30.86 | +7: iteration 72740/ 173500 | consumed samples: 18621440 | consumed tokens: 38136709120 | elapsed time per iteration (s): 0.43 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.964210E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.990 | TFLOPs: 31.17 | +7: iteration 72750/ 173500 | consumed samples: 18624000 | consumed tokens: 38141952000 | elapsed time per iteration (s): 0.42 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.955326E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.178 | TFLOPs: 32.02 | +7: iteration 72760/ 173500 | consumed samples: 18626560 | consumed tokens: 38147194880 | elapsed time per iteration (s): 0.42 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.972474E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.376 | TFLOPs: 31.97 | +7: iteration 72770/ 173500 | consumed samples: 18629120 | consumed tokens: 38152437760 | elapsed time per iteration (s): 0.42 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.982464E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.356 | TFLOPs: 31.97 | +7: iteration 72780/ 173500 | consumed samples: 18631680 | consumed tokens: 38157680640 | elapsed time per iteration (s): 0.43 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 
2.973088E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.694 | TFLOPs: 31.31 | +7: iteration 72790/ 173500 | consumed samples: 18634240 | consumed tokens: 38162923520 | elapsed time per iteration (s): 0.44 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.964315E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.020 | TFLOPs: 30.49 | +7: iteration 72800/ 173500 | consumed samples: 18636800 | consumed tokens: 38168166400 | elapsed time per iteration (s): 0.43 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.968679E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.949 | TFLOPs: 31.37 | +7: iteration 72810/ 173500 | consumed samples: 18639360 | consumed tokens: 38173409280 | elapsed time per iteration (s): 0.42 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.961191E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.015 | TFLOPs: 32.01 | +7: iteration 72820/ 173500 | consumed samples: 18641920 | consumed tokens: 38178652160 | elapsed time per iteration (s): 0.42 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.963756E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.028 | TFLOPs: 31.85 | +7: iteration 72830/ 173500 | consumed samples: 18644480 | consumed tokens: 38183895040 | elapsed time per iteration (s): 0.42 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.967857E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.445 | TFLOPs: 31.77 | +7: iteration 72840/ 173500 | consumed samples: 18647040 | consumed tokens: 
38189137920 | elapsed time per iteration (s): 0.42 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.964044E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.359 | TFLOPs: 31.76 | +7: iteration 72850/ 173500 | consumed samples: 18649600 | consumed tokens: 38194380800 | elapsed time per iteration (s): 0.42 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.963897E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.695 | TFLOPs: 31.67 | +7: iteration 72860/ 173500 | consumed samples: 18652160 | consumed tokens: 38199623680 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.962214E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.591 | TFLOPs: 31.51 | +7: iteration 72870/ 173500 | consumed samples: 18654720 | consumed tokens: 38204866560 | elapsed time per iteration (s): 0.42 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.957588E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.036 | TFLOPs: 31.75 | +7: iteration 72880/ 173500 | consumed samples: 18657280 | consumed tokens: 38210109440 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.979777E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.671 | TFLOPs: 31.41 | +7: iteration 72890/ 173500 | consumed samples: 18659840 | consumed tokens: 38215352320 | elapsed time per iteration (s): 0.42 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.972830E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.854 | TFLOPs: 31.84 | +7: iteration 72900/ 173500 | consumed samples: 18662400 | consumed tokens: 38220595200 | elapsed time per iteration (s): 0.43 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.962321E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | +7: iteration 72910/ 173500 | consumed samples: 18664960 | consumed tokens: 38225838080 | elapsed time per iteration (s): 0.43 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.980992E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.950 | TFLOPs: 31.06 | +7: iteration 72920/ 173500 | consumed samples: 18667520 | consumed tokens: 38231080960 | elapsed time per iteration (s): 0.46 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.975890E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.416 | TFLOPs: 29.30 | +7: iteration 72930/ 173500 | consumed samples: 18670080 | consumed tokens: 38236323840 | elapsed time per iteration (s): 0.46 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.959009E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.434 | TFLOPs: 29.25 | +7: iteration 72940/ 173500 | consumed samples: 18672640 | consumed tokens: 38241566720 | elapsed time per iteration (s): 0.44 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.986132E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.149 | TFLOPs: 30.44 | +7: iteration 72950/ 173500 | consumed samples: 18675200 | consumed tokens: 38246809600 | elapsed time per iteration (s): 0.45 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.965825E+00 | grad 
norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.433 | TFLOPs: 29.82 | +7: iteration 72960/ 173500 | consumed samples: 18677760 | consumed tokens: 38252052480 | elapsed time per iteration (s): 0.46 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.983862E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.856 | TFLOPs: 29.48 | +7: iteration 72970/ 173500 | consumed samples: 18680320 | consumed tokens: 38257295360 | elapsed time per iteration (s): 0.42 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.950113E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.249 | TFLOPs: 31.91 | +7: iteration 72980/ 173500 | consumed samples: 18682880 | consumed tokens: 38262538240 | elapsed time per iteration (s): 0.43 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.967103E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.200 | TFLOPs: 31.07 | +7: iteration 72990/ 173500 | consumed samples: 18685440 | consumed tokens: 38267781120 | elapsed time per iteration (s): 0.42 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.969882E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.943 | TFLOPs: 31.69 | +7: iteration 73000/ 173500 | consumed samples: 18688000 | consumed tokens: 38273024000 | elapsed time per iteration (s): 0.44 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.969516E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.026 | TFLOPs: 30.70 | +7: iteration 73010/ 173500 | consumed samples: 18690560 | consumed tokens: 38278266880 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.968410E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.977 | TFLOPs: 31.32 | +7: iteration 73020/ 173500 | consumed samples: 18693120 | consumed tokens: 38283509760 | elapsed time per iteration (s): 0.43 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.965377E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.191 | TFLOPs: 31.02 | +7: iteration 73030/ 173500 | consumed samples: 18695680 | consumed tokens: 38288752640 | elapsed time per iteration (s): 0.45 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.971791E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.002 | TFLOPs: 29.54 | +7: iteration 73040/ 173500 | consumed samples: 18698240 | consumed tokens: 38293995520 | elapsed time per iteration (s): 0.45 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.966830E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.532 | TFLOPs: 29.73 | +7: iteration 73050/ 173500 | consumed samples: 18700800 | consumed tokens: 38299238400 | elapsed time per iteration (s): 0.47 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.975078E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.015 | TFLOPs: 28.65 | +7: iteration 73060/ 173500 | consumed samples: 18703360 | consumed tokens: 38304481280 | elapsed time per iteration (s): 0.45 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.973481E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.349 | TFLOPs: 
29.77 | +7: iteration 73070/ 173500 | consumed samples: 18705920 | consumed tokens: 38309724160 | elapsed time per iteration (s): 0.46 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.977262E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.838 | TFLOPs: 29.37 | +7: iteration 73080/ 173500 | consumed samples: 18708480 | consumed tokens: 38314967040 | elapsed time per iteration (s): 0.46 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.962367E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.596 | TFLOPs: 29.36 | +7: iteration 73090/ 173500 | consumed samples: 18711040 | consumed tokens: 38320209920 | elapsed time per iteration (s): 0.46 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.973177E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.716 | TFLOPs: 29.47 | +7: iteration 73100/ 173500 | consumed samples: 18713600 | consumed tokens: 38325452800 | elapsed time per iteration (s): 0.44 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.956071E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.714 | TFLOPs: 30.21 | +7: iteration 73110/ 173500 | consumed samples: 18716160 | consumed tokens: 38330695680 | elapsed time per iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.979081E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.079 | TFLOPs: 31.38 | +7: iteration 73120/ 173500 | consumed samples: 18718720 | consumed tokens: 38335938560 | elapsed time per iteration (s): 0.45 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.972082E+00 | grad norm: 0.318 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.871 | TFLOPs: 30.01 | +7: iteration 73130/ 173500 | consumed samples: 18721280 | consumed tokens: 38341181440 | elapsed time per iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.966782E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.494 | TFLOPs: 30.93 | +7: iteration 73140/ 173500 | consumed samples: 18723840 | consumed tokens: 38346424320 | elapsed time per iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.971090E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.325 | TFLOPs: 31.18 | +7: iteration 73150/ 173500 | consumed samples: 18726400 | consumed tokens: 38351667200 | elapsed time per iteration (s): 0.43 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.958916E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.172 | TFLOPs: 31.49 | +7: iteration 73160/ 173500 | consumed samples: 18728960 | consumed tokens: 38356910080 | elapsed time per iteration (s): 0.44 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.971416E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.859 | TFLOPs: 30.42 | +7: iteration 73170/ 173500 | consumed samples: 18731520 | consumed tokens: 38362152960 | elapsed time per iteration (s): 0.44 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.971766E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.622 | TFLOPs: 30.36 | +7: iteration 73180/ 173500 | consumed samples: 18734080 | consumed tokens: 38367395840 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.981688E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.815 | TFLOPs: 31.05 | +7: iteration 73190/ 173500 | consumed samples: 18736640 | consumed tokens: 38372638720 | elapsed time per iteration (s): 0.44 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.975684E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.420 | TFLOPs: 30.51 | +7: iteration 73200/ 173500 | consumed samples: 18739200 | consumed tokens: 38377881600 | elapsed time per iteration (s): 0.44 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.971796E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.602 | TFLOPs: 30.25 | +7: iteration 73210/ 173500 | consumed samples: 18741760 | consumed tokens: 38383124480 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.962355E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.101 | TFLOPs: 31.43 | +7: iteration 73220/ 173500 | consumed samples: 18744320 | consumed tokens: 38388367360 | elapsed time per iteration (s): 0.44 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.970784E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.758 | TFLOPs: 30.37 | +7: iteration 73230/ 173500 | consumed samples: 18746880 | consumed tokens: 38393610240 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.972025E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.534 | TFLOPs: 31.56 | +7: iteration 73240/ 
173500 | consumed samples: 18749440 | consumed tokens: 38398853120 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.967940E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.604 | TFLOPs: 31.46 | +7: iteration 73250/ 173500 | consumed samples: 18752000 | consumed tokens: 38404096000 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.960639E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.223 | TFLOPs: 31.07 | +7: iteration 73260/ 173500 | consumed samples: 18754560 | consumed tokens: 38409338880 | elapsed time per iteration (s): 0.44 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.972834E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.423 | TFLOPs: 30.35 | +7: iteration 73270/ 173500 | consumed samples: 18757120 | consumed tokens: 38414581760 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.968925E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.781 | TFLOPs: 31.52 | +7: iteration 73280/ 173500 | consumed samples: 18759680 | consumed tokens: 38419824640 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.963901E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.194 | TFLOPs: 31.07 | +7: iteration 73290/ 173500 | consumed samples: 18762240 | consumed tokens: 38425067520 | elapsed time per iteration (s): 0.44 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.964524E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 587.894 | TFLOPs: 30.85 | +7: iteration 73300/ 173500 | consumed samples: 18764800 | consumed tokens: 38430310400 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.991624E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.282 | TFLOPs: 31.34 | +7: iteration 73310/ 173500 | consumed samples: 18767360 | consumed tokens: 38435553280 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.962744E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.794 | TFLOPs: 31.10 | +7: iteration 73320/ 173500 | consumed samples: 18769920 | consumed tokens: 38440796160 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.980967E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.056 | TFLOPs: 31.27 | +7: iteration 73330/ 173500 | consumed samples: 18772480 | consumed tokens: 38446039040 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.966827E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.732 | TFLOPs: 31.05 | +7: iteration 73340/ 173500 | consumed samples: 18775040 | consumed tokens: 38451281920 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.963292E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.466 | TFLOPs: 31.35 | +7: iteration 73350/ 173500 | consumed samples: 18777600 | consumed tokens: 38456524800 | elapsed time per iteration (s): 0.43 | learning rate: 
1.332E-04 | global batch size: 256 | lm loss: 2.971458E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.601 | TFLOPs: 30.99 | +7: iteration 73360/ 173500 | consumed samples: 18780160 | consumed tokens: 38461767680 | elapsed time per iteration (s): 0.44 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.954082E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.498 | TFLOPs: 30.62 | +7: iteration 73370/ 173500 | consumed samples: 18782720 | consumed tokens: 38467010560 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.971341E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.763 | TFLOPs: 31.00 | +7: iteration 73380/ 173500 | consumed samples: 18785280 | consumed tokens: 38472253440 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.973211E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.583 | TFLOPs: 31.04 | +7: iteration 73390/ 173500 | consumed samples: 18787840 | consumed tokens: 38477496320 | elapsed time per iteration (s): 0.44 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.967138E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.219 | TFLOPs: 30.29 | +7: iteration 73400/ 173500 | consumed samples: 18790400 | consumed tokens: 38482739200 | elapsed time per iteration (s): 0.42 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.970175E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.557 | TFLOPs: 31.67 | +7: iteration 73410/ 173500 | 
consumed samples: 18792960 | consumed tokens: 38487982080 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.960404E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.659 | TFLOPs: 31.57 | +7: iteration 73420/ 173500 | consumed samples: 18795520 | consumed tokens: 38493224960 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.960548E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.992 | TFLOPs: 30.96 | +7: iteration 73430/ 173500 | consumed samples: 18798080 | consumed tokens: 38498467840 | elapsed time per iteration (s): 0.44 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.959307E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.505 | TFLOPs: 30.51 | +7: iteration 73440/ 173500 | consumed samples: 18800640 | consumed tokens: 38503710720 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.953365E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.066 | TFLOPs: 31.33 | +7: iteration 73450/ 173500 | consumed samples: 18803200 | consumed tokens: 38508953600 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.963255E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.356 | TFLOPs: 31.50 | +7: iteration 73460/ 173500 | consumed samples: 18805760 | consumed tokens: 38514196480 | elapsed time per iteration (s): 0.44 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.968779E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 579.266 | TFLOPs: 30.39 | +7: iteration 73470/ 173500 | consumed samples: 18808320 | consumed tokens: 38519439360 | elapsed time per iteration (s): 0.44 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.970185E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.094 | TFLOPs: 30.33 | +7: iteration 73480/ 173500 | consumed samples: 18810880 | consumed tokens: 38524682240 | elapsed time per iteration (s): 0.44 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.957481E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.734 | TFLOPs: 30.58 | +7: iteration 73490/ 173500 | consumed samples: 18813440 | consumed tokens: 38529925120 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.968150E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.748 | TFLOPs: 31.42 | +7: iteration 73500/ 173500 | consumed samples: 18816000 | consumed tokens: 38535168000 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.967208E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.993 | TFLOPs: 31.17 | +7: iteration 73510/ 173500 | consumed samples: 18818560 | consumed tokens: 38540410880 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.977009E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.444 | TFLOPs: 30.98 | +7: iteration 73520/ 173500 | consumed samples: 18821120 | consumed tokens: 38545653760 | elapsed time per iteration (s): 0.44 | learning rate: 1.330E-04 | global batch 
size: 256 | lm loss: 2.962888E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.322 | TFLOPs: 30.71 | +7: iteration 73530/ 173500 | consumed samples: 18823680 | consumed tokens: 38550896640 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.984064E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.864 | TFLOPs: 31.11 | +7: iteration 73540/ 173500 | consumed samples: 18826240 | consumed tokens: 38556139520 | elapsed time per iteration (s): 0.44 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.966779E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.306 | TFLOPs: 30.66 | +7: iteration 73550/ 173500 | consumed samples: 18828800 | consumed tokens: 38561382400 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.976579E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.730 | TFLOPs: 30.89 | +7: iteration 73560/ 173500 | consumed samples: 18831360 | consumed tokens: 38566625280 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.955535E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.338 | TFLOPs: 31.18 | +7: iteration 73570/ 173500 | consumed samples: 18833920 | consumed tokens: 38571868160 | elapsed time per iteration (s): 0.42 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.966932E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.441 | TFLOPs: 31.71 | +7: iteration 73580/ 173500 | consumed samples: 18836480 | 
consumed tokens: 38577111040 | elapsed time per iteration (s): 0.44 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.956746E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.483 | TFLOPs: 30.25 | +7: iteration 73590/ 173500 | consumed samples: 18839040 | consumed tokens: 38582353920 | elapsed time per iteration (s): 0.43 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.966438E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.585 | TFLOPs: 30.93 | +7: iteration 73600/ 173500 | consumed samples: 18841600 | consumed tokens: 38587596800 | elapsed time per iteration (s): 0.43 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.966982E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.102 | TFLOPs: 31.59 | +7: iteration 73610/ 173500 | consumed samples: 18844160 | consumed tokens: 38592839680 | elapsed time per iteration (s): 0.43 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.963043E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.242 | TFLOPs: 31.23 | +7: iteration 73620/ 173500 | consumed samples: 18846720 | consumed tokens: 38598082560 | elapsed time per iteration (s): 0.44 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.987234E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.368 | TFLOPs: 30.71 | +7: iteration 73630/ 173500 | consumed samples: 18849280 | consumed tokens: 38603325440 | elapsed time per iteration (s): 0.43 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.975805E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 590.826 | TFLOPs: 31.00 | +7: iteration 73640/ 173500 | consumed samples: 18851840 | consumed tokens: 38608568320 | elapsed time per iteration (s): 0.44 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.978510E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.891 | TFLOPs: 30.58 | +7: iteration 73650/ 173500 | consumed samples: 18854400 | consumed tokens: 38613811200 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.967014E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.288 | TFLOPs: 31.18 | +7: iteration 73660/ 173500 | consumed samples: 18856960 | consumed tokens: 38619054080 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.972266E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.118 | TFLOPs: 31.07 | +7: iteration 73670/ 173500 | consumed samples: 18859520 | consumed tokens: 38624296960 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.969030E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.327 | TFLOPs: 31.50 | +7: iteration 73680/ 173500 | consumed samples: 18862080 | consumed tokens: 38629539840 | elapsed time per iteration (s): 0.45 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.968185E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.112 | TFLOPs: 30.02 | +7: iteration 73690/ 173500 | consumed samples: 18864640 | consumed tokens: 38634782720 | elapsed time per iteration (s): 0.44 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 
2.971707E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.748 | TFLOPs: 30.58 | +7: iteration 73700/ 173500 | consumed samples: 18867200 | consumed tokens: 38640025600 | elapsed time per iteration (s): 0.45 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.957005E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.312 | TFLOPs: 29.61 | +7: iteration 73710/ 173500 | consumed samples: 18869760 | consumed tokens: 38645268480 | elapsed time per iteration (s): 0.44 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.977110E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.454 | TFLOPs: 30.46 | +7: iteration 73720/ 173500 | consumed samples: 18872320 | consumed tokens: 38650511360 | elapsed time per iteration (s): 0.44 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.973890E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.277 | TFLOPs: 30.87 | +7: iteration 73730/ 173500 | consumed samples: 18874880 | consumed tokens: 38655754240 | elapsed time per iteration (s): 0.44 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.969565E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.382 | TFLOPs: 30.61 | +7: iteration 73740/ 173500 | consumed samples: 18877440 | consumed tokens: 38660997120 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.968467E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.958 | TFLOPs: 30.95 | +7: iteration 73750/ 173500 | consumed samples: 18880000 | consumed tokens: 
38666240000 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.977532E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.532 | TFLOPs: 31.25 | +7: iteration 73760/ 173500 | consumed samples: 18882560 | consumed tokens: 38671482880 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.969159E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.038 | TFLOPs: 31.59 | +7: iteration 73770/ 173500 | consumed samples: 18885120 | consumed tokens: 38676725760 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.958274E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.292 | TFLOPs: 31.34 | +7: iteration 73780/ 173500 | consumed samples: 18887680 | consumed tokens: 38681968640 | elapsed time per iteration (s): 0.44 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.976011E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.213 | TFLOPs: 30.76 | +7: iteration 73790/ 173500 | consumed samples: 18890240 | consumed tokens: 38687211520 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.962613E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.360 | TFLOPs: 31.08 | +7: iteration 73800/ 173500 | consumed samples: 18892800 | consumed tokens: 38692454400 | elapsed time per iteration (s): 0.44 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.960140E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 580.103 | TFLOPs: 30.44 | +7: iteration 73810/ 173500 | consumed samples: 18895360 | consumed tokens: 38697697280 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.961523E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.267 | TFLOPs: 31.18 | +7: iteration 73820/ 173500 | consumed samples: 18897920 | consumed tokens: 38702940160 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.970835E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.722 | TFLOPs: 31.20 | +7: iteration 73830/ 173500 | consumed samples: 18900480 | consumed tokens: 38708183040 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.981796E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.856 | TFLOPs: 31.58 | +7: iteration 73840/ 173500 | consumed samples: 18903040 | consumed tokens: 38713425920 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.977145E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.449 | TFLOPs: 30.98 | +7: iteration 73850/ 173500 | consumed samples: 18905600 | consumed tokens: 38718668800 | elapsed time per iteration (s): 0.42 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.972403E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.888 | TFLOPs: 31.89 | +7: iteration 73860/ 173500 | consumed samples: 18908160 | consumed tokens: 38723911680 | elapsed time per iteration (s): 0.46 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.963833E+00 | grad 
norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.261 | TFLOPs: 29.03 | +7: iteration 73870/ 173500 | consumed samples: 18910720 | consumed tokens: 38729154560 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.963361E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.312 | TFLOPs: 30.97 | +7: iteration 73880/ 173500 | consumed samples: 18913280 | consumed tokens: 38734397440 | elapsed time per iteration (s): 0.45 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.974134E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.720 | TFLOPs: 29.89 | +7: iteration 73890/ 173500 | consumed samples: 18915840 | consumed tokens: 38739640320 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.983009E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.627 | TFLOPs: 31.51 | +7: iteration 73900/ 173500 | consumed samples: 18918400 | consumed tokens: 38744883200 | elapsed time per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.963201E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.286 | TFLOPs: 31.29 | +7: iteration 73910/ 173500 | consumed samples: 18920960 | consumed tokens: 38750126080 | elapsed time per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.960680E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.356 | TFLOPs: 31.13 | +7: iteration 73920/ 173500 | consumed samples: 18923520 | consumed tokens: 38755368960 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.978123E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.322 | TFLOPs: 31.45 | +7: iteration 73930/ 173500 | consumed samples: 18926080 | consumed tokens: 38760611840 | elapsed time per iteration (s): 0.44 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.959849E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.245 | TFLOPs: 30.76 | +7: iteration 73940/ 173500 | consumed samples: 18928640 | consumed tokens: 38765854720 | elapsed time per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.972942E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.122 | TFLOPs: 31.28 | +7: iteration 73950/ 173500 | consumed samples: 18931200 | consumed tokens: 38771097600 | elapsed time per iteration (s): 0.42 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.952620E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.907 | TFLOPs: 31.63 | +7: iteration 73960/ 173500 | consumed samples: 18933760 | consumed tokens: 38776340480 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.970177E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.371 | TFLOPs: 31.29 | +7: iteration 73970/ 173500 | consumed samples: 18936320 | consumed tokens: 38781583360 | elapsed time per iteration (s): 0.42 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.969925E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.946 | TFLOPs: 
32.00 | +7: iteration 73980/ 173500 | consumed samples: 18938880 | consumed tokens: 38786826240 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.959192E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.745 | TFLOPs: 31.26 | +7: iteration 73990/ 173500 | consumed samples: 18941440 | consumed tokens: 38792069120 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.978917E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.692 | TFLOPs: 31.31 | +0: [2023-03-17 07:58:13,633] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=0, lr=[0.0001321851851828754, 0.0001321851851828754, 0.0001321851851828754], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 74000/ 173500 | consumed samples: 18944000 | consumed tokens: 38797312000 | elapsed time per iteration (s): 0.42 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.970102E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.730 | TFLOPs: 31.83 | +0: steps: 74000 loss: 2.9669 iter time (s): 0.429 samples/sec: 596.905 +7: iteration 74010/ 173500 | consumed samples: 18946560 | consumed tokens: 38802554880 | elapsed time per iteration (s): 0.44 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.960166E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.988 | TFLOPs: 30.22 | +7: iteration 74020/ 173500 | consumed samples: 18949120 | consumed tokens: 38807797760 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.952180E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 593.167 | TFLOPs: 31.12 | +7: iteration 74030/ 173500 | consumed samples: 18951680 | consumed tokens: 38813040640 | elapsed time per iteration (s): 0.43 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.970459E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.031 | TFLOPs: 31.27 | +7: iteration 74040/ 173500 | consumed samples: 18954240 | consumed tokens: 38818283520 | elapsed time per iteration (s): 0.42 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.965188E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.972 | TFLOPs: 31.79 | +7: iteration 74050/ 173500 | consumed samples: 18956800 | consumed tokens: 38823526400 | elapsed time per iteration (s): 0.44 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.969445E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.759 | TFLOPs: 30.79 | +7: iteration 74060/ 173500 | consumed samples: 18959360 | consumed tokens: 38828769280 | elapsed time per iteration (s): 0.44 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.971500E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.895 | TFLOPs: 30.58 | +7: iteration 74070/ 173500 | consumed samples: 18961920 | consumed tokens: 38834012160 | elapsed time per iteration (s): 0.42 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.965964E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.708 | TFLOPs: 31.68 | +7: iteration 74080/ 173500 | consumed samples: 18964480 | consumed tokens: 38839255040 | elapsed time per iteration (s): 0.44 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 
2.964481E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.425 | TFLOPs: 30.87 | +7: iteration 74090/ 173500 | consumed samples: 18967040 | consumed tokens: 38844497920 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.964645E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.206 | TFLOPs: 31.44 | +7: iteration 74100/ 173500 | consumed samples: 18969600 | consumed tokens: 38849740800 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.971151E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.632 | TFLOPs: 30.99 | +7: iteration 74110/ 173500 | consumed samples: 18972160 | consumed tokens: 38854983680 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.963445E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.990 | TFLOPs: 31.06 | +7: iteration 74120/ 173500 | consumed samples: 18974720 | consumed tokens: 38860226560 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.957308E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.874 | TFLOPs: 31.42 | +7: iteration 74130/ 173500 | consumed samples: 18977280 | consumed tokens: 38865469440 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.955770E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.656 | TFLOPs: 31.57 | +7: iteration 74140/ 173500 | consumed samples: 18979840 | consumed tokens: 
38870712320 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.978922E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.277 | TFLOPs: 31.29 | +7: iteration 74150/ 173500 | consumed samples: 18982400 | consumed tokens: 38875955200 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.953286E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.183 | TFLOPs: 31.54 | +7: iteration 74160/ 173500 | consumed samples: 18984960 | consumed tokens: 38881198080 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.962717E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.540 | TFLOPs: 31.35 | +7: iteration 74170/ 173500 | consumed samples: 18987520 | consumed tokens: 38886440960 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.952920E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.312 | TFLOPs: 31.55 | +7: iteration 74180/ 173500 | consumed samples: 18990080 | consumed tokens: 38891683840 | elapsed time per iteration (s): 0.42 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.965530E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.847 | TFLOPs: 31.79 | +7: iteration 74190/ 173500 | consumed samples: 18992640 | consumed tokens: 38896926720 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.963183E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.744 | TFLOPs: 31.52 | +7: iteration 74200/ 173500 | consumed samples: 18995200 | consumed tokens: 38902169600 | elapsed time per iteration (s): 0.42 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.961592E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.005 | TFLOPs: 31.74 | +7: iteration 74210/ 173500 | consumed samples: 18997760 | consumed tokens: 38907412480 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.965506E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.758 | TFLOPs: 31.31 | +7: iteration 74220/ 173500 | consumed samples: 19000320 | consumed tokens: 38912655360 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.965930E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.408 | TFLOPs: 31.92 | +7: iteration 74230/ 173500 | consumed samples: 19002880 | consumed tokens: 38917898240 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.969199E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.596 | TFLOPs: 31.72 | +7: iteration 74240/ 173500 | consumed samples: 19005440 | consumed tokens: 38923141120 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.970716E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.469 | TFLOPs: 31.66 | +7: iteration 74250/ 173500 | consumed samples: 19008000 | consumed tokens: 38928384000 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.955524E+00 | grad 
norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.945 | TFLOPs: 31.85 | +7: iteration 74260/ 173500 | consumed samples: 19010560 | consumed tokens: 38933626880 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.977272E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.783 | TFLOPs: 31.68 | +7: iteration 74270/ 173500 | consumed samples: 19013120 | consumed tokens: 38938869760 | elapsed time per iteration (s): 0.42 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.971144E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.717 | TFLOPs: 31.68 | +7: iteration 74280/ 173500 | consumed samples: 19015680 | consumed tokens: 38944112640 | elapsed time per iteration (s): 0.42 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.956701E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.176 | TFLOPs: 31.75 | +7: iteration 74290/ 173500 | consumed samples: 19018240 | consumed tokens: 38949355520 | elapsed time per iteration (s): 0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.959213E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.870 | TFLOPs: 31.32 | +7: iteration 74300/ 173500 | consumed samples: 19020800 | consumed tokens: 38954598400 | elapsed time per iteration (s): 0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.965599E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.851 | TFLOPs: 31.53 | +7: iteration 74310/ 173500 | consumed samples: 19023360 | consumed tokens: 38959841280 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.954788E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.493 | TFLOPs: 31.93 | +7: iteration 74320/ 173500 | consumed samples: 19025920 | consumed tokens: 38965084160 | elapsed time per iteration (s): 0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.978394E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.045 | TFLOPs: 31.54 | +7: iteration 74330/ 173500 | consumed samples: 19028480 | consumed tokens: 38970327040 | elapsed time per iteration (s): 0.42 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.960118E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.713 | TFLOPs: 31.73 | +7: iteration 74340/ 173500 | consumed samples: 19031040 | consumed tokens: 38975569920 | elapsed time per iteration (s): 0.42 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.957955E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.068 | TFLOPs: 31.64 | +7: iteration 74350/ 173500 | consumed samples: 19033600 | consumed tokens: 38980812800 | elapsed time per iteration (s): 0.43 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.960384E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.616 | TFLOPs: 31.51 | +7: iteration 74360/ 173500 | consumed samples: 19036160 | consumed tokens: 38986055680 | elapsed time per iteration (s): 0.42 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.978368E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.175 | TFLOPs: 
31.75 | +7: iteration 74370/ 173500 | consumed samples: 19038720 | consumed tokens: 38991298560 | elapsed time per iteration (s): 0.43 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.953794E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.813 | TFLOPs: 31.58 | +7: iteration 74380/ 173500 | consumed samples: 19041280 | consumed tokens: 38996541440 | elapsed time per iteration (s): 0.43 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.969028E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.752 | TFLOPs: 31.36 | +7: iteration 74390/ 173500 | consumed samples: 19043840 | consumed tokens: 39001784320 | elapsed time per iteration (s): 0.42 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.934390E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.612 | TFLOPs: 31.62 | +7: iteration 74400/ 173500 | consumed samples: 19046400 | consumed tokens: 39007027200 | elapsed time per iteration (s): 0.43 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.971561E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.009 | TFLOPs: 31.32 | +7: iteration 74410/ 173500 | consumed samples: 19048960 | consumed tokens: 39012270080 | elapsed time per iteration (s): 0.43 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.967676E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.428 | TFLOPs: 31.56 | +7: iteration 74420/ 173500 | consumed samples: 19051520 | consumed tokens: 39017512960 | elapsed time per iteration (s): 0.42 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.965708E+00 | grad norm: 0.338 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.469 | TFLOPs: 31.93 | +7: iteration 74430/ 173500 | consumed samples: 19054080 | consumed tokens: 39022755840 | elapsed time per iteration (s): 0.42 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.958446E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.701 | TFLOPs: 31.78 | +7: iteration 74440/ 173500 | consumed samples: 19056640 | consumed tokens: 39027998720 | elapsed time per iteration (s): 0.42 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.968085E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.706 | TFLOPs: 31.78 | +7: iteration 74450/ 173500 | consumed samples: 19059200 | consumed tokens: 39033241600 | elapsed time per iteration (s): 0.42 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.970026E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.449 | TFLOPs: 31.71 | +7: iteration 74460/ 173500 | consumed samples: 19061760 | consumed tokens: 39038484480 | elapsed time per iteration (s): 0.42 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.959318E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.664 | TFLOPs: 31.78 | +7: iteration 74470/ 173500 | consumed samples: 19064320 | consumed tokens: 39043727360 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.968627E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.721 | TFLOPs: 31.26 | +7: iteration 74480/ 173500 | consumed samples: 19066880 | consumed tokens: 39048970240 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.966714E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.869 | TFLOPs: 31.89 | +7: iteration 74490/ 173500 | consumed samples: 19069440 | consumed tokens: 39054213120 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.963307E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.581 | TFLOPs: 31.51 | +7: iteration 74500/ 173500 | consumed samples: 19072000 | consumed tokens: 39059456000 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.960966E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.925 | TFLOPs: 31.32 | +7: iteration 74510/ 173500 | consumed samples: 19074560 | consumed tokens: 39064698880 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.971907E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.565 | TFLOPs: 31.51 | +7: iteration 74520/ 173500 | consumed samples: 19077120 | consumed tokens: 39069941760 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.957616E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.573 | TFLOPs: 31.04 | +7: iteration 74530/ 173500 | consumed samples: 19079680 | consumed tokens: 39075184640 | elapsed time per iteration (s): 0.42 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.966998E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.262 | TFLOPs: 31.65 | +7: iteration 74540/ 
173500 | consumed samples: 19082240 | consumed tokens: 39080427520 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.969301E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.177 | TFLOPs: 31.60 | +7: iteration 74550/ 173500 | consumed samples: 19084800 | consumed tokens: 39085670400 | elapsed time per iteration (s): 0.42 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.946581E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.707 | TFLOPs: 31.78 | +7: iteration 74560/ 173500 | consumed samples: 19087360 | consumed tokens: 39090913280 | elapsed time per iteration (s): 0.42 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.962324E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.850 | TFLOPs: 31.68 | +7: iteration 74570/ 173500 | consumed samples: 19089920 | consumed tokens: 39096156160 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.970613E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.004 | TFLOPs: 31.22 | +7: iteration 74580/ 173500 | consumed samples: 19092480 | consumed tokens: 39101399040 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.967197E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.494 | TFLOPs: 31.24 | +7: iteration 74590/ 173500 | consumed samples: 19095040 | consumed tokens: 39106641920 | elapsed time per iteration (s): 0.42 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.968423E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.362 | TFLOPs: 31.66 | +7: iteration 74600/ 173500 | consumed samples: 19097600 | consumed tokens: 39111884800 | elapsed time per iteration (s): 0.42 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.950745E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.620 | TFLOPs: 31.83 | +7: iteration 74610/ 173500 | consumed samples: 19100160 | consumed tokens: 39117127680 | elapsed time per iteration (s): 0.42 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.967516E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.924 | TFLOPs: 31.63 | +7: iteration 74620/ 173500 | consumed samples: 19102720 | consumed tokens: 39122370560 | elapsed time per iteration (s): 0.42 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.950138E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.425 | TFLOPs: 31.66 | +7: iteration 74630/ 173500 | consumed samples: 19105280 | consumed tokens: 39127613440 | elapsed time per iteration (s): 0.46 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.970194E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.459 | TFLOPs: 29.20 | +7: iteration 74640/ 173500 | consumed samples: 19107840 | consumed tokens: 39132856320 | elapsed time per iteration (s): 0.43 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.965596E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.556 | TFLOPs: 31.46 | +7: iteration 74650/ 173500 | consumed samples: 19110400 | consumed tokens: 39138099200 | elapsed time per iteration (s): 0.43 | learning rate: 
1.311E-04 | global batch size: 256 | lm loss: 2.973663E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.179 | TFLOPs: 31.02 | +7: iteration 74660/ 173500 | consumed samples: 19112960 | consumed tokens: 39143342080 | elapsed time per iteration (s): 0.42 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.961268E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.844 | TFLOPs: 31.95 | +7: iteration 74670/ 173500 | consumed samples: 19115520 | consumed tokens: 39148584960 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.957285E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.409 | TFLOPs: 31.50 | +7: iteration 74680/ 173500 | consumed samples: 19118080 | consumed tokens: 39153827840 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.968415E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.097 | TFLOPs: 31.38 | +7: iteration 74690/ 173500 | consumed samples: 19120640 | consumed tokens: 39159070720 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.966991E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.631 | TFLOPs: 31.46 | +7: iteration 74700/ 173500 | consumed samples: 19123200 | consumed tokens: 39164313600 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.973059E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.359 | TFLOPs: 31.50 | +7: iteration 74710/ 173500 | 
consumed samples: 19125760 | consumed tokens: 39169556480 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.986091E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.981 | TFLOPs: 31.43 | +7: iteration 74720/ 173500 | consumed samples: 19128320 | consumed tokens: 39174799360 | elapsed time per iteration (s): 0.42 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.957802E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.794 | TFLOPs: 31.73 | +7: iteration 74730/ 173500 | consumed samples: 19130880 | consumed tokens: 39180042240 | elapsed time per iteration (s): 0.42 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.972906E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.617 | TFLOPs: 31.93 | +7: iteration 74740/ 173500 | consumed samples: 19133440 | consumed tokens: 39185285120 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.972323E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.771 | TFLOPs: 31.52 | +7: iteration 74750/ 173500 | consumed samples: 19136000 | consumed tokens: 39190528000 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.976484E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.576 | TFLOPs: 31.30 | +7: iteration 74760/ 173500 | consumed samples: 19138560 | consumed tokens: 39195770880 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.958746E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.539 | TFLOPs: 31.35 | +7: iteration 74770/ 173500 | consumed samples: 19141120 | consumed tokens: 39201013760 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.961450E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.530 | TFLOPs: 31.30 | +7: iteration 74780/ 173500 | consumed samples: 19143680 | consumed tokens: 39206256640 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.954195E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.405 | TFLOPs: 31.55 | +7: iteration 74790/ 173500 | consumed samples: 19146240 | consumed tokens: 39211499520 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.969426E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.297 | TFLOPs: 31.50 | +7: iteration 74800/ 173500 | consumed samples: 19148800 | consumed tokens: 39216742400 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.984281E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.881 | TFLOPs: 31.11 | +7: iteration 74810/ 173500 | consumed samples: 19151360 | consumed tokens: 39221985280 | elapsed time per iteration (s): 0.44 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.969181E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.734 | TFLOPs: 30.68 | +7: iteration 74820/ 173500 | consumed samples: 19153920 | consumed tokens: 39227228160 | elapsed time per iteration (s): 0.44 | learning rate: 1.309E-04 | global batch 
size: 256 | lm loss: 2.974661E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.663 | TFLOPs: 30.78 | +7: iteration 74830/ 173500 | consumed samples: 19156480 | consumed tokens: 39232471040 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.971228E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.217 | TFLOPs: 31.07 | +7: iteration 74840/ 173500 | consumed samples: 19159040 | consumed tokens: 39237713920 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.959556E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.371 | TFLOPs: 31.45 | +7: iteration 74850/ 173500 | consumed samples: 19161600 | consumed tokens: 39242956800 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.954430E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.124 | TFLOPs: 31.49 | +7: iteration 74860/ 173500 | consumed samples: 19164160 | consumed tokens: 39248199680 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.963821E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.260 | TFLOPs: 31.18 | +7: iteration 74870/ 173500 | consumed samples: 19166720 | consumed tokens: 39253442560 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.975594E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.978 | TFLOPs: 31.32 | +7: iteration 74880/ 173500 | consumed samples: 19169280 | 
consumed tokens: 39258685440 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.959543E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.962 | TFLOPs: 31.27 | +7: iteration 74890/ 173500 | consumed samples: 19171840 | consumed tokens: 39263928320 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.957582E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.694 | TFLOPs: 31.41 | +7: iteration 74900/ 173500 | consumed samples: 19174400 | consumed tokens: 39269171200 | elapsed time per iteration (s): 0.43 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.966036E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.315 | TFLOPs: 31.50 | +7: iteration 74910/ 173500 | consumed samples: 19176960 | consumed tokens: 39274414080 | elapsed time per iteration (s): 0.42 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.974339E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.395 | TFLOPs: 31.82 | +7: iteration 74920/ 173500 | consumed samples: 19179520 | consumed tokens: 39279656960 | elapsed time per iteration (s): 0.42 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.961966E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.069 | TFLOPs: 31.85 | +7: iteration 74930/ 173500 | consumed samples: 19182080 | consumed tokens: 39284899840 | elapsed time per iteration (s): 0.42 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.943429E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 606.080 | TFLOPs: 31.80 | +7: iteration 74940/ 173500 | consumed samples: 19184640 | consumed tokens: 39290142720 | elapsed time per iteration (s): 0.43 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.973777E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.590 | TFLOPs: 31.35 | +7: iteration 74950/ 173500 | consumed samples: 19187200 | consumed tokens: 39295385600 | elapsed time per iteration (s): 0.43 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.967712E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.295 | TFLOPs: 31.55 | +7: iteration 74960/ 173500 | consumed samples: 19189760 | consumed tokens: 39300628480 | elapsed time per iteration (s): 0.43 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.966585E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.161 | TFLOPs: 31.44 | +7: iteration 74970/ 173500 | consumed samples: 19192320 | consumed tokens: 39305871360 | elapsed time per iteration (s): 0.42 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.959800E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.377 | TFLOPs: 31.71 | +7: iteration 74980/ 173500 | consumed samples: 19194880 | consumed tokens: 39311114240 | elapsed time per iteration (s): 0.44 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.969782E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.633 | TFLOPs: 30.78 | +7: iteration 74990/ 173500 | consumed samples: 19197440 | consumed tokens: 39316357120 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 
2.965388E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.987 | TFLOPs: 31.27 | +7: iteration 75000/ 173500 | consumed samples: 19200000 | consumed tokens: 39321600000 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.968470E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.438 | TFLOPs: 31.03 | +7: iteration 75010/ 173500 | consumed samples: 19202560 | consumed tokens: 39326842880 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.952481E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.111 | TFLOPs: 31.38 | +7: iteration 75020/ 173500 | consumed samples: 19205120 | consumed tokens: 39332085760 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.968173E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.664 | TFLOPs: 31.36 | +7: iteration 75030/ 173500 | consumed samples: 19207680 | consumed tokens: 39337328640 | elapsed time per iteration (s): 0.43 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.958335E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.251 | TFLOPs: 31.49 | +7: iteration 75040/ 173500 | consumed samples: 19210240 | consumed tokens: 39342571520 | elapsed time per iteration (s): 0.42 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.955278E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.851 | TFLOPs: 31.63 | +7: iteration 75050/ 173500 | consumed samples: 19212800 | consumed tokens: 
39347814400 | elapsed time per iteration (s): 0.44 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.973485E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.402 | TFLOPs: 30.77 | +7: iteration 75060/ 173500 | consumed samples: 19215360 | consumed tokens: 39353057280 | elapsed time per iteration (s): 0.42 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.964401E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.353 | TFLOPs: 31.71 | +7: iteration 75070/ 173500 | consumed samples: 19217920 | consumed tokens: 39358300160 | elapsed time per iteration (s): 0.43 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.967618E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.892 | TFLOPs: 31.48 | +7: iteration 75080/ 173500 | consumed samples: 19220480 | consumed tokens: 39363543040 | elapsed time per iteration (s): 0.43 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.963004E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.981 | TFLOPs: 31.32 | +7: iteration 75090/ 173500 | consumed samples: 19223040 | consumed tokens: 39368785920 | elapsed time per iteration (s): 0.42 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.977251E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.300 | TFLOPs: 31.76 | +7: iteration 75100/ 173500 | consumed samples: 19225600 | consumed tokens: 39374028800 | elapsed time per iteration (s): 0.43 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.968702E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 598.289 | TFLOPs: 31.39 | +7: iteration 75110/ 173500 | consumed samples: 19228160 | consumed tokens: 39379271680 | elapsed time per iteration (s): 0.42 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.971141E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.249 | TFLOPs: 31.70 | +7: iteration 75120/ 173500 | consumed samples: 19230720 | consumed tokens: 39384514560 | elapsed time per iteration (s): 0.44 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.968870E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.164 | TFLOPs: 30.70 | +7: iteration 75130/ 173500 | consumed samples: 19233280 | consumed tokens: 39389757440 | elapsed time per iteration (s): 0.43 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.956528E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.448 | TFLOPs: 31.35 | +7: iteration 75140/ 173500 | consumed samples: 19235840 | consumed tokens: 39395000320 | elapsed time per iteration (s): 0.43 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.973896E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.854 | TFLOPs: 31.47 | +7: iteration 75150/ 173500 | consumed samples: 19238400 | consumed tokens: 39400243200 | elapsed time per iteration (s): 0.42 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.978810E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.279 | TFLOPs: 31.65 | +7: iteration 75160/ 173500 | consumed samples: 19240960 | consumed tokens: 39405486080 | elapsed time per iteration (s): 0.42 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.969945E+00 | grad 
norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.928 | TFLOPs: 31.79 | +7: iteration 75170/ 173500 | consumed samples: 19243520 | consumed tokens: 39410728960 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.964746E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.474 | TFLOPs: 31.35 | +7: iteration 75180/ 173500 | consumed samples: 19246080 | consumed tokens: 39415971840 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.956981E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.471 | TFLOPs: 30.93 | +7: iteration 75190/ 173500 | consumed samples: 19248640 | consumed tokens: 39421214720 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.972599E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.333 | TFLOPs: 31.08 | +7: iteration 75200/ 173500 | consumed samples: 19251200 | consumed tokens: 39426457600 | elapsed time per iteration (s): 0.45 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.979703E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.940 | TFLOPs: 30.17 | +7: iteration 75210/ 173500 | consumed samples: 19253760 | consumed tokens: 39431700480 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.964372E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.314 | TFLOPs: 31.13 | +7: iteration 75220/ 173500 | consumed samples: 19256320 | consumed tokens: 39436943360 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.949390E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.782 | TFLOPs: 31.57 | +7: iteration 75230/ 173500 | consumed samples: 19258880 | consumed tokens: 39442186240 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.953293E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.863 | TFLOPs: 31.58 | +7: iteration 75240/ 173500 | consumed samples: 19261440 | consumed tokens: 39447429120 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.962973E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.451 | TFLOPs: 31.29 | +7: iteration 75250/ 173500 | consumed samples: 19264000 | consumed tokens: 39452672000 | elapsed time per iteration (s): 0.42 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.957213E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.185 | TFLOPs: 31.81 | +7: iteration 75260/ 173500 | consumed samples: 19266560 | consumed tokens: 39457914880 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.963107E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.286 | TFLOPs: 31.18 | +7: iteration 75270/ 173500 | consumed samples: 19269120 | consumed tokens: 39463157760 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.977507E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.466 | TFLOPs: 
31.03 | +7: iteration 75280/ 173500 | consumed samples: 19271680 | consumed tokens: 39468400640 | elapsed time per iteration (s): 0.43 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.946246E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.689 | TFLOPs: 31.15 | +7: iteration 75290/ 173500 | consumed samples: 19274240 | consumed tokens: 39473643520 | elapsed time per iteration (s): 0.42 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.945323E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.267 | TFLOPs: 31.65 | +7: iteration 75300/ 173500 | consumed samples: 19276800 | consumed tokens: 39478886400 | elapsed time per iteration (s): 0.42 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.973201E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.296 | TFLOPs: 31.81 | +7: iteration 75310/ 173500 | consumed samples: 19279360 | consumed tokens: 39484129280 | elapsed time per iteration (s): 0.43 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.957179E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.278 | TFLOPs: 31.44 | +7: iteration 75320/ 173500 | consumed samples: 19281920 | consumed tokens: 39489372160 | elapsed time per iteration (s): 0.42 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.965149E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.537 | TFLOPs: 31.61 | +7: iteration 75330/ 173500 | consumed samples: 19284480 | consumed tokens: 39494615040 | elapsed time per iteration (s): 0.44 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.975742E+00 | grad norm: 0.307 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.419 | TFLOPs: 30.72 | +7: iteration 75340/ 173500 | consumed samples: 19287040 | consumed tokens: 39499857920 | elapsed time per iteration (s): 0.42 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.962696E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.784 | TFLOPs: 31.63 | +7: iteration 75350/ 173500 | consumed samples: 19289600 | consumed tokens: 39505100800 | elapsed time per iteration (s): 0.43 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.971431E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.644 | TFLOPs: 31.15 | +7: iteration 75360/ 173500 | consumed samples: 19292160 | consumed tokens: 39510343680 | elapsed time per iteration (s): 0.42 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.968960E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.225 | TFLOPs: 31.81 | +7: iteration 75370/ 173500 | consumed samples: 19294720 | consumed tokens: 39515586560 | elapsed time per iteration (s): 0.42 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.964143E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.511 | TFLOPs: 31.61 | +7: iteration 75380/ 173500 | consumed samples: 19297280 | consumed tokens: 39520829440 | elapsed time per iteration (s): 0.42 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.962900E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.580 | TFLOPs: 31.72 | +7: iteration 75390/ 173500 | consumed samples: 19299840 | consumed tokens: 39526072320 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.935863E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.433 | TFLOPs: 31.03 | +7: iteration 75400/ 173500 | consumed samples: 19302400 | consumed tokens: 39531315200 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.957395E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.273 | TFLOPs: 31.29 | +7: iteration 75410/ 173500 | consumed samples: 19304960 | consumed tokens: 39536558080 | elapsed time per iteration (s): 0.42 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.972540E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.530 | TFLOPs: 31.67 | +7: iteration 75420/ 173500 | consumed samples: 19307520 | consumed tokens: 39541800960 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.964724E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.852 | TFLOPs: 31.21 | +7: iteration 75430/ 173500 | consumed samples: 19310080 | consumed tokens: 39547043840 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.965832E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.786 | TFLOPs: 31.31 | +7: iteration 75440/ 173500 | consumed samples: 19312640 | consumed tokens: 39552286720 | elapsed time per iteration (s): 0.42 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.972062E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.727 | TFLOPs: 31.68 | +7: iteration 75450/ 
173500 | consumed samples: 19315200 | consumed tokens: 39557529600 | elapsed time per iteration (s): 0.44 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.965604E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.504 | TFLOPs: 30.83 | +7: iteration 75460/ 173500 | consumed samples: 19317760 | consumed tokens: 39562772480 | elapsed time per iteration (s): 0.42 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.952299E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.955 | TFLOPs: 31.69 | +7: iteration 75470/ 173500 | consumed samples: 19320320 | consumed tokens: 39568015360 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.961773E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.547 | TFLOPs: 31.51 | +7: iteration 75480/ 173500 | consumed samples: 19322880 | consumed tokens: 39573258240 | elapsed time per iteration (s): 0.44 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.955623E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.054 | TFLOPs: 30.54 | +7: iteration 75490/ 173500 | consumed samples: 19325440 | consumed tokens: 39578501120 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.962012E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.063 | TFLOPs: 31.59 | +7: iteration 75500/ 173500 | consumed samples: 19328000 | consumed tokens: 39583744000 | elapsed time per iteration (s): 0.42 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.961324E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.482 | TFLOPs: 31.77 | +7: iteration 75510/ 173500 | consumed samples: 19330560 | consumed tokens: 39588986880 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.960720E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.664 | TFLOPs: 31.20 | +7: iteration 75520/ 173500 | consumed samples: 19333120 | consumed tokens: 39594229760 | elapsed time per iteration (s): 0.44 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.953028E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.598 | TFLOPs: 30.46 | +7: iteration 75530/ 173500 | consumed samples: 19335680 | consumed tokens: 39599472640 | elapsed time per iteration (s): 0.43 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.958869E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.766 | TFLOPs: 31.36 | +7: iteration 75540/ 173500 | consumed samples: 19338240 | consumed tokens: 39604715520 | elapsed time per iteration (s): 0.42 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.959480E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.080 | TFLOPs: 31.75 | +7: iteration 75550/ 173500 | consumed samples: 19340800 | consumed tokens: 39609958400 | elapsed time per iteration (s): 0.42 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.970283E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.910 | TFLOPs: 31.69 | +7: iteration 75560/ 173500 | consumed samples: 19343360 | consumed tokens: 39615201280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.297E-04 | global batch size: 256 | lm loss: 2.972205E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.646 | TFLOPs: 31.67 | +7: iteration 75570/ 173500 | consumed samples: 19345920 | consumed tokens: 39620444160 | elapsed time per iteration (s): 0.43 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.951190E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.599 | TFLOPs: 31.36 | +7: iteration 75580/ 173500 | consumed samples: 19348480 | consumed tokens: 39625687040 | elapsed time per iteration (s): 0.42 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.959376E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.311 | TFLOPs: 31.76 | +7: iteration 75590/ 173500 | consumed samples: 19351040 | consumed tokens: 39630929920 | elapsed time per iteration (s): 0.43 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.961677E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.920 | TFLOPs: 31.27 | +7: iteration 75600/ 173500 | consumed samples: 19353600 | consumed tokens: 39636172800 | elapsed time per iteration (s): 0.42 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.977473E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.164 | TFLOPs: 31.80 | +7: iteration 75610/ 173500 | consumed samples: 19356160 | consumed tokens: 39641415680 | elapsed time per iteration (s): 0.43 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.962230E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.062 | TFLOPs: 31.06 | +7: iteration 75620/ 173500 | 
consumed samples: 19358720 | consumed tokens: 39646658560 | elapsed time per iteration (s): 0.42 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.972565E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.427 | TFLOPs: 31.92 | +7: iteration 75630/ 173500 | consumed samples: 19361280 | consumed tokens: 39651901440 | elapsed time per iteration (s): 0.44 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.966526E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.541 | TFLOPs: 30.83 | +7: iteration 75640/ 173500 | consumed samples: 19363840 | consumed tokens: 39657144320 | elapsed time per iteration (s): 0.43 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.962865E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.907 | TFLOPs: 31.11 | +7: iteration 75650/ 173500 | consumed samples: 19366400 | consumed tokens: 39662387200 | elapsed time per iteration (s): 0.42 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.966202E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.350 | TFLOPs: 31.97 | +7: iteration 75660/ 173500 | consumed samples: 19368960 | consumed tokens: 39667630080 | elapsed time per iteration (s): 0.44 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.972395E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.887 | TFLOPs: 30.74 | +7: iteration 75670/ 173500 | consumed samples: 19371520 | consumed tokens: 39672872960 | elapsed time per iteration (s): 0.42 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.973517E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.701 | TFLOPs: 31.78 | +7: iteration 75680/ 173500 | consumed samples: 19374080 | consumed tokens: 39678115840 | elapsed time per iteration (s): 0.43 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.962951E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.649 | TFLOPs: 31.04 | +7: iteration 75690/ 173500 | consumed samples: 19376640 | consumed tokens: 39683358720 | elapsed time per iteration (s): 0.43 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.966132E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.188 | TFLOPs: 31.39 | +7: iteration 75700/ 173500 | consumed samples: 19379200 | consumed tokens: 39688601600 | elapsed time per iteration (s): 0.42 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.971875E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.153 | TFLOPs: 31.75 | +7: iteration 75710/ 173500 | consumed samples: 19381760 | consumed tokens: 39693844480 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.960498E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.532 | TFLOPs: 31.19 | +7: iteration 75720/ 173500 | consumed samples: 19384320 | consumed tokens: 39699087360 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.963751E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.216 | TFLOPs: 31.44 | +7: iteration 75730/ 173500 | consumed samples: 19386880 | consumed tokens: 39704330240 | elapsed time per iteration (s): 0.42 | learning rate: 1.294E-04 | global batch 
size: 256 | lm loss: 2.975956E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.756 | TFLOPs: 31.94 | +7: iteration 75740/ 173500 | consumed samples: 19389440 | consumed tokens: 39709573120 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.967704E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.630 | TFLOPs: 31.30 | +7: iteration 75750/ 173500 | consumed samples: 19392000 | consumed tokens: 39714816000 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.951902E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.506 | TFLOPs: 31.46 | +7: iteration 75760/ 173500 | consumed samples: 19394560 | consumed tokens: 39720058880 | elapsed time per iteration (s): 0.42 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.961452E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.251 | TFLOPs: 31.97 | +7: iteration 75770/ 173500 | consumed samples: 19397120 | consumed tokens: 39725301760 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.973959E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.853 | TFLOPs: 31.32 | +7: iteration 75780/ 173500 | consumed samples: 19399680 | consumed tokens: 39730544640 | elapsed time per iteration (s): 0.42 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.977416E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.015 | TFLOPs: 31.64 | +7: iteration 75790/ 173500 | consumed samples: 19402240 | 
consumed tokens: 39735787520 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.966124E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.496 | TFLOPs: 31.14 | +7: iteration 75800/ 173500 | consumed samples: 19404800 | consumed tokens: 39741030400 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.972851E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.674 | TFLOPs: 31.46 | +7: iteration 75810/ 173500 | consumed samples: 19407360 | consumed tokens: 39746273280 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.976945E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.166 | TFLOPs: 31.44 | +7: iteration 75820/ 173500 | consumed samples: 19409920 | consumed tokens: 39751516160 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.950873E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.926 | TFLOPs: 31.32 | +7: iteration 75830/ 173500 | consumed samples: 19412480 | consumed tokens: 39756759040 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.966527E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.912 | TFLOPs: 31.48 | +7: iteration 75840/ 173500 | consumed samples: 19415040 | consumed tokens: 39762001920 | elapsed time per iteration (s): 0.42 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.961771E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.549 | TFLOPs: 31.67 | +7: iteration 75850/ 173500 | consumed samples: 19417600 | consumed tokens: 39767244800 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.939675E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.795 | TFLOPs: 31.58 | +7: iteration 75860/ 173500 | consumed samples: 19420160 | consumed tokens: 39772487680 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.962862E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.846 | TFLOPs: 31.47 | +7: iteration 75870/ 173500 | consumed samples: 19422720 | consumed tokens: 39777730560 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.959658E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.863 | TFLOPs: 31.42 | +7: iteration 75880/ 173500 | consumed samples: 19425280 | consumed tokens: 39782973440 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.962644E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.928 | TFLOPs: 31.21 | +7: iteration 75890/ 173500 | consumed samples: 19427840 | consumed tokens: 39788216320 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.960464E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.482 | TFLOPs: 31.19 | +7: iteration 75900/ 173500 | consumed samples: 19430400 | consumed tokens: 39793459200 | elapsed time per iteration (s): 0.42 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 
2.963325E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.992 | TFLOPs: 31.80 | +7: iteration 75910/ 173500 | consumed samples: 19432960 | consumed tokens: 39798702080 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.960115E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.436 | TFLOPs: 31.40 | +7: iteration 75920/ 173500 | consumed samples: 19435520 | consumed tokens: 39803944960 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.952663E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.029 | TFLOPs: 31.06 | +7: iteration 75930/ 173500 | consumed samples: 19438080 | consumed tokens: 39809187840 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.975204E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.107 | TFLOPs: 31.49 | +7: iteration 75940/ 173500 | consumed samples: 19440640 | consumed tokens: 39814430720 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.954828E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.858 | TFLOPs: 31.53 | +7: iteration 75950/ 173500 | consumed samples: 19443200 | consumed tokens: 39819673600 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.967313E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.782 | TFLOPs: 31.36 | +7: iteration 75960/ 173500 | consumed samples: 19445760 | consumed tokens: 
39824916480 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.961188E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.683 | TFLOPs: 31.10 | +7: iteration 75970/ 173500 | consumed samples: 19448320 | consumed tokens: 39830159360 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.965705E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.643 | TFLOPs: 31.36 | +7: iteration 75980/ 173500 | consumed samples: 19450880 | consumed tokens: 39835402240 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.963112E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.902 | TFLOPs: 31.27 | +7: iteration 75990/ 173500 | consumed samples: 19453440 | consumed tokens: 39840645120 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.965889E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.758 | TFLOPs: 31.21 | +0: [2023-03-17 08:12:28,875] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=0, lr=[0.0001289804445403464, 0.0001289804445403464, 0.0001289804445403464], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 76000/ 173500 | consumed samples: 19456000 | consumed tokens: 39845888000 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.951787E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.469 | TFLOPs: 31.51 | +0: steps: 76000 loss: 2.9526 iter time (s): 0.425 samples/sec: 601.950 +7: iteration 76010/ 173500 | 
consumed samples: 19458560 | consumed tokens: 39851130880 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.973415E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.562 | TFLOPs: 31.14 | +7: iteration 76020/ 173500 | consumed samples: 19461120 | consumed tokens: 39856373760 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.955687E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.382 | TFLOPs: 31.55 | +7: iteration 76030/ 173500 | consumed samples: 19463680 | consumed tokens: 39861616640 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.966046E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.113 | TFLOPs: 31.28 | +7: iteration 76040/ 173500 | consumed samples: 19466240 | consumed tokens: 39866859520 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.964127E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.671 | TFLOPs: 31.20 | +7: iteration 76050/ 173500 | consumed samples: 19468800 | consumed tokens: 39872102400 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.953850E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.380 | TFLOPs: 31.50 | +7: iteration 76060/ 173500 | consumed samples: 19471360 | consumed tokens: 39877345280 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.957881E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.758 | TFLOPs: 31.52 | +7: iteration 76070/ 173500 | consumed samples: 19473920 | consumed tokens: 39882588160 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.949345E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.407 | TFLOPs: 31.55 | +7: iteration 76080/ 173500 | consumed samples: 19476480 | consumed tokens: 39887831040 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.949469E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.716 | TFLOPs: 30.89 | +7: iteration 76090/ 173500 | consumed samples: 19479040 | consumed tokens: 39893073920 | elapsed time per iteration (s): 0.42 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.949836E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.140 | TFLOPs: 31.70 | +7: iteration 76100/ 173500 | consumed samples: 19481600 | consumed tokens: 39898316800 | elapsed time per iteration (s): 0.42 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.958110E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.079 | TFLOPs: 31.70 | +7: iteration 76110/ 173500 | consumed samples: 19484160 | consumed tokens: 39903559680 | elapsed time per iteration (s): 0.43 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.972520E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.666 | TFLOPs: 31.46 | +7: iteration 76120/ 173500 | consumed samples: 19486720 | consumed tokens: 39908802560 | elapsed time per iteration (s): 0.42 | learning rate: 1.288E-04 | global batch 
size: 256 | lm loss: 2.964650E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.581 | TFLOPs: 31.67 | +7: iteration 76130/ 173500 | consumed samples: 19489280 | consumed tokens: 39914045440 | elapsed time per iteration (s): 0.44 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.965096E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.015 | TFLOPs: 30.75 | +7: iteration 76140/ 173500 | consumed samples: 19491840 | consumed tokens: 39919288320 | elapsed time per iteration (s): 0.43 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.965129E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.100 | TFLOPs: 31.43 | +7: iteration 76150/ 173500 | consumed samples: 19494400 | consumed tokens: 39924531200 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.968722E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.282 | TFLOPs: 31.60 | +7: iteration 76160/ 173500 | consumed samples: 19496960 | consumed tokens: 39929774080 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.955401E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.303 | TFLOPs: 30.97 | +7: iteration 76170/ 173500 | consumed samples: 19499520 | consumed tokens: 39935016960 | elapsed time per iteration (s): 0.42 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.967281E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.180 | TFLOPs: 31.86 | +7: iteration 76180/ 173500 | consumed samples: 19502080 | 
consumed tokens: 39940259840 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.961822E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.012 | TFLOPs: 31.53 | +7: iteration 76190/ 173500 | consumed samples: 19504640 | consumed tokens: 39945502720 | elapsed time per iteration (s): 0.42 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.950888E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.781 | TFLOPs: 31.68 | +7: iteration 76200/ 173500 | consumed samples: 19507200 | consumed tokens: 39950745600 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.958570E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.765 | TFLOPs: 31.52 | +7: iteration 76210/ 173500 | consumed samples: 19509760 | consumed tokens: 39955988480 | elapsed time per iteration (s): 0.42 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.967219E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.923 | TFLOPs: 31.79 | +7: iteration 76220/ 173500 | consumed samples: 19512320 | consumed tokens: 39961231360 | elapsed time per iteration (s): 0.43 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.964262E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.538 | TFLOPs: 31.40 | +7: iteration 76230/ 173500 | consumed samples: 19514880 | consumed tokens: 39966474240 | elapsed time per iteration (s): 0.43 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.949473E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 599.946 | TFLOPs: 31.48 | +7: iteration 76240/ 173500 | consumed samples: 19517440 | consumed tokens: 39971717120 | elapsed time per iteration (s): 0.42 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.955636E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.956 | TFLOPs: 31.69 | +7: iteration 76250/ 173500 | consumed samples: 19520000 | consumed tokens: 39976960000 | elapsed time per iteration (s): 0.43 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.958978E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.037 | TFLOPs: 31.22 | +7: iteration 76260/ 173500 | consumed samples: 19522560 | consumed tokens: 39982202880 | elapsed time per iteration (s): 0.42 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.973266E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.640 | TFLOPs: 31.93 | +7: iteration 76270/ 173500 | consumed samples: 19525120 | consumed tokens: 39987445760 | elapsed time per iteration (s): 0.45 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.971900E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.991 | TFLOPs: 29.91 | +7: iteration 76280/ 173500 | consumed samples: 19527680 | consumed tokens: 39992688640 | elapsed time per iteration (s): 0.45 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.963677E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.697 | TFLOPs: 30.10 | +7: iteration 76290/ 173500 | consumed samples: 19530240 | consumed tokens: 39997931520 | elapsed time per iteration (s): 0.44 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 
2.968494E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.538 | TFLOPs: 30.25 | +7: iteration 76300/ 173500 | consumed samples: 19532800 | consumed tokens: 40003174400 | elapsed time per iteration (s): 0.46 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.967740E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.975 | TFLOPs: 29.43 | +7: iteration 76310/ 173500 | consumed samples: 19535360 | consumed tokens: 40008417280 | elapsed time per iteration (s): 0.42 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.951464E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.498 | TFLOPs: 32.08 | +7: iteration 76320/ 173500 | consumed samples: 19537920 | consumed tokens: 40013660160 | elapsed time per iteration (s): 0.44 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.965373E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.806 | TFLOPs: 30.63 | +7: iteration 76330/ 173500 | consumed samples: 19540480 | consumed tokens: 40018903040 | elapsed time per iteration (s): 0.43 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.954710E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.533 | TFLOPs: 31.14 | +7: iteration 76340/ 173500 | consumed samples: 19543040 | consumed tokens: 40024145920 | elapsed time per iteration (s): 0.45 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.960625E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.855 | TFLOPs: 29.85 | +7: iteration 76350/ 173500 | consumed samples: 19545600 | consumed tokens: 
40029388800 | elapsed time per iteration (s): 0.42 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.956028E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.363 | TFLOPs: 31.81 | +7: iteration 76360/ 173500 | consumed samples: 19548160 | consumed tokens: 40034631680 | elapsed time per iteration (s): 0.45 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.963540E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.543 | TFLOPs: 30.09 | +7: iteration 76370/ 173500 | consumed samples: 19550720 | consumed tokens: 40039874560 | elapsed time per iteration (s): 0.44 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.958451E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.927 | TFLOPs: 30.43 | +7: iteration 76380/ 173500 | consumed samples: 19553280 | consumed tokens: 40045117440 | elapsed time per iteration (s): 0.45 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.972092E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.013 | TFLOPs: 30.17 | +7: iteration 76390/ 173500 | consumed samples: 19555840 | consumed tokens: 40050360320 | elapsed time per iteration (s): 0.46 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.952174E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.094 | TFLOPs: 29.07 | +7: iteration 76400/ 173500 | consumed samples: 19558400 | consumed tokens: 40055603200 | elapsed time per iteration (s): 0.45 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.954729E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 568.273 | TFLOPs: 29.82 | +7: iteration 76410/ 173500 | consumed samples: 19560960 | consumed tokens: 40060846080 | elapsed time per iteration (s): 0.44 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.958500E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.470 | TFLOPs: 30.46 | +7: iteration 76420/ 173500 | consumed samples: 19563520 | consumed tokens: 40066088960 | elapsed time per iteration (s): 0.45 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.958427E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.778 | TFLOPs: 29.95 | +7: iteration 76430/ 173500 | consumed samples: 19566080 | consumed tokens: 40071331840 | elapsed time per iteration (s): 0.45 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.952494E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.791 | TFLOPs: 29.63 | +7: iteration 76440/ 173500 | consumed samples: 19568640 | consumed tokens: 40076574720 | elapsed time per iteration (s): 0.49 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.958527E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 525.588 | TFLOPs: 27.58 | +7: iteration 76450/ 173500 | consumed samples: 19571200 | consumed tokens: 40081817600 | elapsed time per iteration (s): 0.45 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.966632E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.662 | TFLOPs: 29.63 | +7: iteration 76460/ 173500 | consumed samples: 19573760 | consumed tokens: 40087060480 | elapsed time per iteration (s): 0.45 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.973483E+00 | grad 
norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.572 | TFLOPs: 29.99 | +7: iteration 76470/ 173500 | consumed samples: 19576320 | consumed tokens: 40092303360 | elapsed time per iteration (s): 0.43 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.966122E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.670 | TFLOPs: 31.36 | +7: iteration 76480/ 173500 | consumed samples: 19578880 | consumed tokens: 40097546240 | elapsed time per iteration (s): 0.42 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.951205E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.470 | TFLOPs: 32.08 | +7: iteration 76490/ 173500 | consumed samples: 19581440 | consumed tokens: 40102789120 | elapsed time per iteration (s): 0.42 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.979263E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.034 | TFLOPs: 31.75 | +7: iteration 76500/ 173500 | consumed samples: 19584000 | consumed tokens: 40108032000 | elapsed time per iteration (s): 0.43 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.967438E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.903 | TFLOPs: 31.42 | +7: iteration 76510/ 173500 | consumed samples: 19586560 | consumed tokens: 40113274880 | elapsed time per iteration (s): 0.43 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.960552E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.306 | TFLOPs: 31.50 | +7: iteration 76520/ 173500 | consumed samples: 19589120 | consumed tokens: 40118517760 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.953868E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.582 | TFLOPs: 31.46 | +7: iteration 76530/ 173500 | consumed samples: 19591680 | consumed tokens: 40123760640 | elapsed time per iteration (s): 0.42 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.960336E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.050 | TFLOPs: 31.96 | +7: iteration 76540/ 173500 | consumed samples: 19594240 | consumed tokens: 40129003520 | elapsed time per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.961099E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.776 | TFLOPs: 31.42 | +7: iteration 76550/ 173500 | consumed samples: 19596800 | consumed tokens: 40134246400 | elapsed time per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.954449E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.591 | TFLOPs: 31.35 | +7: iteration 76560/ 173500 | consumed samples: 19599360 | consumed tokens: 40139489280 | elapsed time per iteration (s): 0.42 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.960243E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.460 | TFLOPs: 31.98 | +7: iteration 76570/ 173500 | consumed samples: 19601920 | consumed tokens: 40144732160 | elapsed time per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.956355E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.752 | TFLOPs: 
31.05 | +7: iteration 76580/ 173500 | consumed samples: 19604480 | consumed tokens: 40149975040 | elapsed time per iteration (s): 0.42 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.962887E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.579 | TFLOPs: 31.77 | +7: iteration 76590/ 173500 | consumed samples: 19607040 | consumed tokens: 40155217920 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.956512E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.554 | TFLOPs: 31.20 | +7: iteration 76600/ 173500 | consumed samples: 19609600 | consumed tokens: 40160460800 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.964437E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.873 | TFLOPs: 31.21 | +7: iteration 76610/ 173500 | consumed samples: 19612160 | consumed tokens: 40165703680 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.969933E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.719 | TFLOPs: 31.57 | +7: iteration 76620/ 173500 | consumed samples: 19614720 | consumed tokens: 40170946560 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.954798E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.257 | TFLOPs: 31.28 | +7: iteration 76630/ 173500 | consumed samples: 19617280 | consumed tokens: 40176189440 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.977120E+00 | grad norm: 0.304 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.468 | TFLOPs: 31.51 | +7: iteration 76640/ 173500 | consumed samples: 19619840 | consumed tokens: 40181432320 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.957217E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.336 | TFLOPs: 31.76 | +7: iteration 76650/ 173500 | consumed samples: 19622400 | consumed tokens: 40186675200 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.950507E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.367 | TFLOPs: 31.76 | +7: iteration 76660/ 173500 | consumed samples: 19624960 | consumed tokens: 40191918080 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.970860E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.515 | TFLOPs: 31.72 | +7: iteration 76670/ 173500 | consumed samples: 19627520 | consumed tokens: 40197160960 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.966004E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.055 | TFLOPs: 31.96 | +7: iteration 76680/ 173500 | consumed samples: 19630080 | consumed tokens: 40202403840 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.971326E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.212 | TFLOPs: 31.65 | +7: iteration 76690/ 173500 | consumed samples: 19632640 | consumed tokens: 40207646720 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.957930E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.526 | TFLOPs: 31.14 | +7: iteration 76700/ 173500 | consumed samples: 19635200 | consumed tokens: 40212889600 | elapsed time per iteration (s): 0.43 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.959630E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.372 | TFLOPs: 31.40 | +7: iteration 76710/ 173500 | consumed samples: 19637760 | consumed tokens: 40218132480 | elapsed time per iteration (s): 0.42 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.956223E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.961 | TFLOPs: 31.74 | +7: iteration 76720/ 173500 | consumed samples: 19640320 | consumed tokens: 40223375360 | elapsed time per iteration (s): 0.42 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.957774E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.261 | TFLOPs: 31.65 | +7: iteration 76730/ 173500 | consumed samples: 19642880 | consumed tokens: 40228618240 | elapsed time per iteration (s): 0.43 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.964353E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.352 | TFLOPs: 31.45 | +7: iteration 76740/ 173500 | consumed samples: 19645440 | consumed tokens: 40233861120 | elapsed time per iteration (s): 0.44 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.963462E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.384 | TFLOPs: 30.82 | +7: iteration 76750/ 
173500 | consumed samples: 19648000 | consumed tokens: 40239104000 | elapsed time per iteration (s): 0.42 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.971154E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.190 | TFLOPs: 31.91 | +7: iteration 76760/ 173500 | consumed samples: 19650560 | consumed tokens: 40244346880 | elapsed time per iteration (s): 0.42 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.966858E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.361 | TFLOPs: 31.60 | +7: iteration 76770/ 173500 | consumed samples: 19653120 | consumed tokens: 40249589760 | elapsed time per iteration (s): 0.43 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.973958E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.212 | TFLOPs: 31.39 | +7: iteration 76780/ 173500 | consumed samples: 19655680 | consumed tokens: 40254832640 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.961064E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.117 | TFLOPs: 31.80 | +7: iteration 76790/ 173500 | consumed samples: 19658240 | consumed tokens: 40260075520 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.975812E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.830 | TFLOPs: 31.68 | +7: iteration 76800/ 173500 | consumed samples: 19660800 | consumed tokens: 40265318400 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.948511E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.373 | TFLOPs: 31.71 | +7: iteration 76810/ 173500 | consumed samples: 19663360 | consumed tokens: 40270561280 | elapsed time per iteration (s): 0.43 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.964861E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.292 | TFLOPs: 31.60 | +7: iteration 76820/ 173500 | consumed samples: 19665920 | consumed tokens: 40275804160 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.952159E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.667 | TFLOPs: 31.73 | +7: iteration 76830/ 173500 | consumed samples: 19668480 | consumed tokens: 40281047040 | elapsed time per iteration (s): 0.43 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.951427E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.848 | TFLOPs: 31.37 | +7: iteration 76840/ 173500 | consumed samples: 19671040 | consumed tokens: 40286289920 | elapsed time per iteration (s): 0.42 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.969923E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.579 | TFLOPs: 31.93 | +7: iteration 76850/ 173500 | consumed samples: 19673600 | consumed tokens: 40291532800 | elapsed time per iteration (s): 0.43 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.957784E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.206 | TFLOPs: 31.33 | +7: iteration 76860/ 173500 | consumed samples: 19676160 | consumed tokens: 40296775680 | elapsed time per iteration (s): 0.42 | learning rate: 
1.276E-04 | global batch size: 256 | lm loss: 2.958467E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.284 | TFLOPs: 31.97 | +7: iteration 76870/ 173500 | consumed samples: 19678720 | consumed tokens: 40302018560 | elapsed time per iteration (s): 0.43 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.970763E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.673 | TFLOPs: 31.46 | +7: iteration 76880/ 173500 | consumed samples: 19681280 | consumed tokens: 40307261440 | elapsed time per iteration (s): 0.42 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.953721E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.428 | TFLOPs: 31.61 | +7: iteration 76890/ 173500 | consumed samples: 19683840 | consumed tokens: 40312504320 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.950645E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.140 | TFLOPs: 31.49 | +7: iteration 76900/ 173500 | consumed samples: 19686400 | consumed tokens: 40317747200 | elapsed time per iteration (s): 0.42 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.965897E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.867 | TFLOPs: 31.95 | +7: iteration 76910/ 173500 | consumed samples: 19688960 | consumed tokens: 40322990080 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.948233E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.236 | TFLOPs: 31.55 | +7: iteration 76920/ 173500 | 
consumed samples: 19691520 | consumed tokens: 40328232960 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.966988E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.970 | TFLOPs: 31.53 | +7: iteration 76930/ 173500 | consumed samples: 19694080 | consumed tokens: 40333475840 | elapsed time per iteration (s): 0.42 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.961756E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.184 | TFLOPs: 31.75 | +7: iteration 76940/ 173500 | consumed samples: 19696640 | consumed tokens: 40338718720 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.965190E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.907 | TFLOPs: 31.06 | +7: iteration 76950/ 173500 | consumed samples: 19699200 | consumed tokens: 40343961600 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.974832E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.685 | TFLOPs: 31.73 | +7: iteration 76960/ 173500 | consumed samples: 19701760 | consumed tokens: 40349204480 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.949967E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.424 | TFLOPs: 31.77 | +7: iteration 76970/ 173500 | consumed samples: 19704320 | consumed tokens: 40354447360 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.965379E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.216 | TFLOPs: 31.70 | +7: iteration 76980/ 173500 | consumed samples: 19706880 | consumed tokens: 40359690240 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.958494E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.121 | TFLOPs: 31.80 | +7: iteration 76990/ 173500 | consumed samples: 19709440 | consumed tokens: 40364933120 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.962576E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.674 | TFLOPs: 31.78 | +7: iteration 77000/ 173500 | consumed samples: 19712000 | consumed tokens: 40370176000 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.959067E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.123 | TFLOPs: 31.70 | +7: iteration 77010/ 173500 | consumed samples: 19714560 | consumed tokens: 40375418880 | elapsed time per iteration (s): 0.42 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.946615E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.685 | TFLOPs: 31.94 | +7: iteration 77020/ 173500 | consumed samples: 19717120 | consumed tokens: 40380661760 | elapsed time per iteration (s): 0.43 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.962943E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.676 | TFLOPs: 31.46 | +7: iteration 77030/ 173500 | consumed samples: 19719680 | consumed tokens: 40385904640 | elapsed time per iteration (s): 0.43 | learning rate: 1.273E-04 | global batch 
size: 256 | lm loss: 2.953702E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.376 | TFLOPs: 31.45 | +7: iteration 77040/ 173500 | consumed samples: 19722240 | consumed tokens: 40391147520 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.969438E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.894 | TFLOPs: 31.74 | +7: iteration 77050/ 173500 | consumed samples: 19724800 | consumed tokens: 40396390400 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.957059E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.928 | TFLOPs: 31.63 | +7: iteration 77060/ 173500 | consumed samples: 19727360 | consumed tokens: 40401633280 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.962820E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.419 | TFLOPs: 31.61 | +7: iteration 77070/ 173500 | consumed samples: 19729920 | consumed tokens: 40406876160 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.960556E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.689 | TFLOPs: 31.94 | +7: iteration 77080/ 173500 | consumed samples: 19732480 | consumed tokens: 40412119040 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.971602E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.612 | TFLOPs: 31.93 | +7: iteration 77090/ 173500 | consumed samples: 19735040 | 
consumed tokens: 40417361920 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.968325E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.388 | TFLOPs: 31.92 | +7: iteration 77100/ 173500 | consumed samples: 19737600 | consumed tokens: 40422604800 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.963567E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.551 | TFLOPs: 31.61 | +7: iteration 77110/ 173500 | consumed samples: 19740160 | consumed tokens: 40427847680 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.968665E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.370 | TFLOPs: 31.92 | +7: iteration 77120/ 173500 | consumed samples: 19742720 | consumed tokens: 40433090560 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.968279E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.463 | TFLOPs: 31.87 | +7: iteration 77130/ 173500 | consumed samples: 19745280 | consumed tokens: 40438333440 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.963434E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.581 | TFLOPs: 31.93 | +7: iteration 77140/ 173500 | consumed samples: 19747840 | consumed tokens: 40443576320 | elapsed time per iteration (s): 0.43 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.958641E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 600.995 | TFLOPs: 31.53 | +7: iteration 77150/ 173500 | consumed samples: 19750400 | consumed tokens: 40448819200 | elapsed time per iteration (s): 0.42 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.961613E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.035 | TFLOPs: 31.69 | +7: iteration 77160/ 173500 | consumed samples: 19752960 | consumed tokens: 40454062080 | elapsed time per iteration (s): 0.42 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.943797E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.717 | TFLOPs: 31.73 | +7: iteration 77170/ 173500 | consumed samples: 19755520 | consumed tokens: 40459304960 | elapsed time per iteration (s): 0.43 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.958809E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.153 | TFLOPs: 31.54 | +7: iteration 77180/ 173500 | consumed samples: 19758080 | consumed tokens: 40464547840 | elapsed time per iteration (s): 0.43 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.970144E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.618 | TFLOPs: 31.20 | +7: iteration 77190/ 173500 | consumed samples: 19760640 | consumed tokens: 40469790720 | elapsed time per iteration (s): 0.42 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.954386E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.705 | TFLOPs: 31.94 | +7: iteration 77200/ 173500 | consumed samples: 19763200 | consumed tokens: 40475033600 | elapsed time per iteration (s): 0.42 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 
2.955479E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.558 | TFLOPs: 31.93 | +7: iteration 77210/ 173500 | consumed samples: 19765760 | consumed tokens: 40480276480 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.955425E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.705 | TFLOPs: 31.36 | +7: iteration 77220/ 173500 | consumed samples: 19768320 | consumed tokens: 40485519360 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.951684E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.531 | TFLOPs: 31.25 | +7: iteration 77230/ 173500 | consumed samples: 19770880 | consumed tokens: 40490762240 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.968849E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.054 | TFLOPs: 31.12 | +7: iteration 77240/ 173500 | consumed samples: 19773440 | consumed tokens: 40496005120 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.967724E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.523 | TFLOPs: 31.51 | +7: iteration 77250/ 173500 | consumed samples: 19776000 | consumed tokens: 40501248000 | elapsed time per iteration (s): 0.42 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.953926E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.290 | TFLOPs: 31.76 | +7: iteration 77260/ 173500 | consumed samples: 19778560 | consumed tokens: 
40506490880 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.968171E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.340 | TFLOPs: 31.18 | +7: iteration 77270/ 173500 | consumed samples: 19781120 | consumed tokens: 40511733760 | elapsed time per iteration (s): 0.42 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.971078E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.369 | TFLOPs: 31.76 | +7: iteration 77280/ 173500 | consumed samples: 19783680 | consumed tokens: 40516976640 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.961131E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.678 | TFLOPs: 31.36 | +7: iteration 77290/ 173500 | consumed samples: 19786240 | consumed tokens: 40522219520 | elapsed time per iteration (s): 0.42 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.957120E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.872 | TFLOPs: 31.74 | +7: iteration 77300/ 173500 | consumed samples: 19788800 | consumed tokens: 40527462400 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.971699E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.110 | TFLOPs: 31.54 | +7: iteration 77310/ 173500 | consumed samples: 19791360 | consumed tokens: 40532705280 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.970908E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 601.896 | TFLOPs: 31.58 | +7: iteration 77320/ 173500 | consumed samples: 19793920 | consumed tokens: 40537948160 | elapsed time per iteration (s): 0.42 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.966190E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.147 | TFLOPs: 31.75 | +7: iteration 77330/ 173500 | consumed samples: 19796480 | consumed tokens: 40543191040 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.959156E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.811 | TFLOPs: 31.52 | +7: iteration 77340/ 173500 | consumed samples: 19799040 | consumed tokens: 40548433920 | elapsed time per iteration (s): 0.42 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.955275E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.812 | TFLOPs: 31.94 | +7: iteration 77350/ 173500 | consumed samples: 19801600 | consumed tokens: 40553676800 | elapsed time per iteration (s): 0.42 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.953936E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.871 | TFLOPs: 31.95 | +7: iteration 77360/ 173500 | consumed samples: 19804160 | consumed tokens: 40558919680 | elapsed time per iteration (s): 0.42 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.977745E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.704 | TFLOPs: 31.94 | +7: iteration 77370/ 173500 | consumed samples: 19806720 | consumed tokens: 40564162560 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.955239E+00 | grad 
norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.223 | TFLOPs: 31.49 | +7: iteration 77380/ 173500 | consumed samples: 19809280 | consumed tokens: 40569405440 | elapsed time per iteration (s): 0.42 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.954938E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.261 | TFLOPs: 31.65 | +7: iteration 77390/ 173500 | consumed samples: 19811840 | consumed tokens: 40574648320 | elapsed time per iteration (s): 0.42 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.964991E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.216 | TFLOPs: 31.91 | +7: iteration 77400/ 173500 | consumed samples: 19814400 | consumed tokens: 40579891200 | elapsed time per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.957580E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.932 | TFLOPs: 30.95 | +7: iteration 77410/ 173500 | consumed samples: 19816960 | consumed tokens: 40585134080 | elapsed time per iteration (s): 0.42 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.965902E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.773 | TFLOPs: 31.94 | +7: iteration 77420/ 173500 | consumed samples: 19819520 | consumed tokens: 40590376960 | elapsed time per iteration (s): 0.42 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.954104E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.474 | TFLOPs: 31.72 | +7: iteration 77430/ 173500 | consumed samples: 19822080 | consumed tokens: 40595619840 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.954811E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.380 | TFLOPs: 31.55 | +7: iteration 77440/ 173500 | consumed samples: 19824640 | consumed tokens: 40600862720 | elapsed time per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.967370E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.265 | TFLOPs: 31.60 | +7: iteration 77450/ 173500 | consumed samples: 19827200 | consumed tokens: 40606105600 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.961779E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.253 | TFLOPs: 31.65 | +7: iteration 77460/ 173500 | consumed samples: 19829760 | consumed tokens: 40611348480 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.957211E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.449 | TFLOPs: 31.92 | +7: iteration 77470/ 173500 | consumed samples: 19832320 | consumed tokens: 40616591360 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.958576E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.409 | TFLOPs: 31.61 | +7: iteration 77480/ 173500 | consumed samples: 19834880 | consumed tokens: 40621834240 | elapsed time per iteration (s): 0.43 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.970439E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.213 | TFLOPs: 
31.54 | +7: iteration 77490/ 173500 | consumed samples: 19837440 | consumed tokens: 40627077120 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.965269E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.037 | TFLOPs: 31.96 | +7: iteration 77500/ 173500 | consumed samples: 19840000 | consumed tokens: 40632320000 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.959732E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.431 | TFLOPs: 31.92 | +7: iteration 77510/ 173500 | consumed samples: 19842560 | consumed tokens: 40637562880 | elapsed time per iteration (s): 0.42 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.963887E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.395 | TFLOPs: 31.76 | +7: iteration 77520/ 173500 | consumed samples: 19845120 | consumed tokens: 40642805760 | elapsed time per iteration (s): 0.43 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.964034E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.964 | TFLOPs: 31.32 | +7: iteration 77530/ 173500 | consumed samples: 19847680 | consumed tokens: 40648048640 | elapsed time per iteration (s): 0.43 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.964400E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.454 | TFLOPs: 31.14 | +7: iteration 77540/ 173500 | consumed samples: 19850240 | consumed tokens: 40653291520 | elapsed time per iteration (s): 0.42 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.967682E+00 | grad norm: 0.316 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.403 | TFLOPs: 31.87 | +7: iteration 77550/ 173500 | consumed samples: 19852800 | consumed tokens: 40658534400 | elapsed time per iteration (s): 0.42 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.945083E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.916 | TFLOPs: 31.90 | +7: iteration 77560/ 173500 | consumed samples: 19855360 | consumed tokens: 40663777280 | elapsed time per iteration (s): 0.42 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.960205E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.185 | TFLOPs: 31.75 | +7: iteration 77570/ 173500 | consumed samples: 19857920 | consumed tokens: 40669020160 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.957038E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.741 | TFLOPs: 31.89 | +7: iteration 77580/ 173500 | consumed samples: 19860480 | consumed tokens: 40674263040 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.969559E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.565 | TFLOPs: 31.67 | +7: iteration 77590/ 173500 | consumed samples: 19863040 | consumed tokens: 40679505920 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.963059E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.253 | TFLOPs: 31.86 | +7: iteration 77600/ 173500 | consumed samples: 19865600 | consumed tokens: 40684748800 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.955794E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.930 | TFLOPs: 31.48 | +7: iteration 77610/ 173500 | consumed samples: 19868160 | consumed tokens: 40689991680 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.952086E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.283 | TFLOPs: 31.71 | +7: iteration 77620/ 173500 | consumed samples: 19870720 | consumed tokens: 40695234560 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.964785E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.849 | TFLOPs: 31.68 | +7: iteration 77630/ 173500 | consumed samples: 19873280 | consumed tokens: 40700477440 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.955775E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.855 | TFLOPs: 31.68 | +7: iteration 77640/ 173500 | consumed samples: 19875840 | consumed tokens: 40705720320 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.966348E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.588 | TFLOPs: 31.88 | +7: iteration 77650/ 173500 | consumed samples: 19878400 | consumed tokens: 40710963200 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.961178E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.639 | TFLOPs: 31.88 | +7: iteration 77660/ 
173500 | consumed samples: 19880960 | consumed tokens: 40716206080 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.948331E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.694 | TFLOPs: 31.88 | +7: iteration 77670/ 173500 | consumed samples: 19883520 | consumed tokens: 40721448960 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.937015E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.576 | TFLOPs: 31.83 | +7: iteration 77680/ 173500 | consumed samples: 19886080 | consumed tokens: 40726691840 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.963714E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.870 | TFLOPs: 31.63 | +7: iteration 77690/ 173500 | consumed samples: 19888640 | consumed tokens: 40731934720 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.946793E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.619 | TFLOPs: 31.88 | +7: iteration 77700/ 173500 | consumed samples: 19891200 | consumed tokens: 40737177600 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.950267E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.609 | TFLOPs: 31.78 | +7: iteration 77710/ 173500 | consumed samples: 19893760 | consumed tokens: 40742420480 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.950462E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.160 | TFLOPs: 31.65 | +7: iteration 77720/ 173500 | consumed samples: 19896320 | consumed tokens: 40747663360 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.965867E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.846 | TFLOPs: 31.68 | +7: iteration 77730/ 173500 | consumed samples: 19898880 | consumed tokens: 40752906240 | elapsed time per iteration (s): 0.43 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.964969E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.537 | TFLOPs: 31.56 | +7: iteration 77740/ 173500 | consumed samples: 19901440 | consumed tokens: 40758149120 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.969029E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.975 | TFLOPs: 31.90 | +7: iteration 77750/ 173500 | consumed samples: 19904000 | consumed tokens: 40763392000 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.968841E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.269 | TFLOPs: 31.65 | +7: iteration 77760/ 173500 | consumed samples: 19906560 | consumed tokens: 40768634880 | elapsed time per iteration (s): 0.43 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.965078E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.788 | TFLOPs: 31.57 | +7: iteration 77770/ 173500 | consumed samples: 19909120 | consumed tokens: 40773877760 | elapsed time per iteration (s): 0.42 | learning rate: 
1.261E-04 | global batch size: 256 | lm loss: 2.934987E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.032 | TFLOPs: 31.90 | +7: iteration 77780/ 173500 | consumed samples: 19911680 | consumed tokens: 40779120640 | elapsed time per iteration (s): 0.42 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.955669E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.436 | TFLOPs: 31.87 | +7: iteration 77790/ 173500 | consumed samples: 19914240 | consumed tokens: 40784363520 | elapsed time per iteration (s): 0.42 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.951415E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.464 | TFLOPs: 31.87 | +7: iteration 77800/ 173500 | consumed samples: 19916800 | consumed tokens: 40789606400 | elapsed time per iteration (s): 0.42 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.943666E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.135 | TFLOPs: 31.86 | +7: iteration 77810/ 173500 | consumed samples: 19919360 | consumed tokens: 40794849280 | elapsed time per iteration (s): 0.42 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.952990E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.523 | TFLOPs: 31.88 | +7: iteration 77820/ 173500 | consumed samples: 19921920 | consumed tokens: 40800092160 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.960925E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.311 | TFLOPs: 31.71 | +7: iteration 77830/ 173500 | 
consumed samples: 19924480 | consumed tokens: 40805335040 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.958111E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.344 | TFLOPs: 31.66 | +7: iteration 77840/ 173500 | consumed samples: 19927040 | consumed tokens: 40810577920 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.969739E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.245 | TFLOPs: 31.86 | +7: iteration 77850/ 173500 | consumed samples: 19929600 | consumed tokens: 40815820800 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.962161E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.357 | TFLOPs: 31.76 | +7: iteration 77860/ 173500 | consumed samples: 19932160 | consumed tokens: 40821063680 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.949466E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.894 | TFLOPs: 31.74 | +7: iteration 77870/ 173500 | consumed samples: 19934720 | consumed tokens: 40826306560 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.961907E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.528 | TFLOPs: 31.67 | +7: iteration 77880/ 173500 | consumed samples: 19937280 | consumed tokens: 40831549440 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.956458E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 | +7: iteration 77890/ 173500 | consumed samples: 19939840 | consumed tokens: 40836792320 | elapsed time per iteration (s): 0.43 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.964629E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.515 | TFLOPs: 31.40 | +7: iteration 77900/ 173500 | consumed samples: 19942400 | consumed tokens: 40842035200 | elapsed time per iteration (s): 0.43 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.952099E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.148 | TFLOPs: 31.38 | +7: iteration 77910/ 173500 | consumed samples: 19944960 | consumed tokens: 40847278080 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.968398E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.266 | TFLOPs: 31.91 | +7: iteration 77920/ 173500 | consumed samples: 19947520 | consumed tokens: 40852520960 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.951098E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.700 | TFLOPs: 31.89 | +7: iteration 77930/ 173500 | consumed samples: 19950080 | consumed tokens: 40857763840 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.948676E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.400 | TFLOPs: 31.87 | +7: iteration 77940/ 173500 | consumed samples: 19952640 | consumed tokens: 40863006720 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch 
size: 256 | lm loss: 2.953959E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.852 | TFLOPs: 31.89 | +7: iteration 77950/ 173500 | consumed samples: 19955200 | consumed tokens: 40868249600 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.951314E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.814 | TFLOPs: 31.89 | +7: iteration 77960/ 173500 | consumed samples: 19957760 | consumed tokens: 40873492480 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.958647E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.205 | TFLOPs: 31.75 | +7: iteration 77970/ 173500 | consumed samples: 19960320 | consumed tokens: 40878735360 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.956823E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.948 | TFLOPs: 31.90 | +7: iteration 77980/ 173500 | consumed samples: 19962880 | consumed tokens: 40883978240 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.958521E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.897 | TFLOPs: 31.90 | +7: iteration 77990/ 173500 | consumed samples: 19965440 | consumed tokens: 40889221120 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.952492E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.934 | TFLOPs: 31.90 | +0: [2023-03-17 08:26:42,464] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=78000, skipped=0, lr=[0.00012575030905458257, 0.00012575030905458257, 0.00012575030905458257], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 78000/ 173500 | consumed samples: 19968000 | consumed tokens: 40894464000 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.959447E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.752 | TFLOPs: 31.89 | +0: steps: 78000 loss: 2.9684 iter time (s): 0.424 samples/sec: 603.099 +7: iteration 78010/ 173500 | consumed samples: 19970560 | consumed tokens: 40899706880 | elapsed time per iteration (s): 0.42 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.951557E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.242 | TFLOPs: 31.76 | +7: iteration 78020/ 173500 | consumed samples: 19973120 | consumed tokens: 40904949760 | elapsed time per iteration (s): 0.44 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.954237E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.628 | TFLOPs: 30.62 | +7: iteration 78030/ 173500 | consumed samples: 19975680 | consumed tokens: 40910192640 | elapsed time per iteration (s): 0.42 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.967734E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.614 | TFLOPs: 31.93 | +7: iteration 78040/ 173500 | consumed samples: 19978240 | consumed tokens: 40915435520 | elapsed time per iteration (s): 0.42 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.956785E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.375 | TFLOPs: 31.92 | +7: iteration 
78050/ 173500 | consumed samples: 19980800 | consumed tokens: 40920678400 | elapsed time per iteration (s): 0.42 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.972202E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.418 | TFLOPs: 31.87 | +7: iteration 78060/ 173500 | consumed samples: 19983360 | consumed tokens: 40925921280 | elapsed time per iteration (s): 0.42 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.965217E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.931 | TFLOPs: 31.90 | +7: iteration 78070/ 173500 | consumed samples: 19985920 | consumed tokens: 40931164160 | elapsed time per iteration (s): 0.43 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.942788E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.427 | TFLOPs: 31.08 | +7: iteration 78080/ 173500 | consumed samples: 19988480 | consumed tokens: 40936407040 | elapsed time per iteration (s): 0.42 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.959538E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.338 | TFLOPs: 31.92 | +7: iteration 78090/ 173500 | consumed samples: 19991040 | consumed tokens: 40941649920 | elapsed time per iteration (s): 0.42 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.961466E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.854 | TFLOPs: 31.89 | +7: iteration 78100/ 173500 | consumed samples: 19993600 | consumed tokens: 40946892800 | elapsed time per iteration (s): 0.42 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.958786E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.818 | TFLOPs: 31.73 | +7: iteration 78110/ 173500 | consumed samples: 19996160 | consumed tokens: 40952135680 | elapsed time per iteration (s): 0.42 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.952303E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.939 | TFLOPs: 31.90 | +7: iteration 78120/ 173500 | consumed samples: 19998720 | consumed tokens: 40957378560 | elapsed time per iteration (s): 0.42 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.966468E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.178 | TFLOPs: 31.91 | +7: iteration 78130/ 173500 | consumed samples: 20001280 | consumed tokens: 40962621440 | elapsed time per iteration (s): 0.42 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.960192E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.985 | TFLOPs: 31.90 | +7: iteration 78140/ 173500 | consumed samples: 20003840 | consumed tokens: 40967864320 | elapsed time per iteration (s): 0.42 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.954953E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.194 | TFLOPs: 31.91 | +7: iteration 78150/ 173500 | consumed samples: 20006400 | consumed tokens: 40973107200 | elapsed time per iteration (s): 0.42 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.957079E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.403 | TFLOPs: 31.87 | +7: iteration 78160/ 173500 | consumed samples: 20008960 | consumed tokens: 40978350080 | elapsed time per iteration (s): 0.42 | learning rate: 
1.255E-04 | global batch size: 256 | lm loss: 2.955467E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.993 | TFLOPs: 31.90 | +7: iteration 78170/ 173500 | consumed samples: 20011520 | consumed tokens: 40983592960 | elapsed time per iteration (s): 0.42 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.958827E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.758 | TFLOPs: 31.89 | +7: iteration 78180/ 173500 | consumed samples: 20014080 | consumed tokens: 40988835840 | elapsed time per iteration (s): 0.43 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.952588E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.755 | TFLOPs: 31.57 | +7: iteration 78190/ 173500 | consumed samples: 20016640 | consumed tokens: 40994078720 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.950089E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.169 | TFLOPs: 31.91 | +7: iteration 78200/ 173500 | consumed samples: 20019200 | consumed tokens: 40999321600 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.976093E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.624 | TFLOPs: 31.72 | +7: iteration 78210/ 173500 | consumed samples: 20021760 | consumed tokens: 41004564480 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.957941E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.613 | TFLOPs: 31.72 | +7: iteration 78220/ 173500 | 
consumed samples: 20024320 | consumed tokens: 41009807360 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.969028E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.172 | TFLOPs: 31.86 | +7: iteration 78230/ 173500 | consumed samples: 20026880 | consumed tokens: 41015050240 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.963815E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.098 | TFLOPs: 31.85 | +7: iteration 78240/ 173500 | consumed samples: 20029440 | consumed tokens: 41020293120 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.967014E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.852 | TFLOPs: 31.84 | +7: iteration 78250/ 173500 | consumed samples: 20032000 | consumed tokens: 41025536000 | elapsed time per iteration (s): 0.44 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.953004E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.010 | TFLOPs: 30.64 | +7: iteration 78260/ 173500 | consumed samples: 20034560 | consumed tokens: 41030778880 | elapsed time per iteration (s): 0.45 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.958128E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.854 | TFLOPs: 29.74 | +7: iteration 78270/ 173500 | consumed samples: 20037120 | consumed tokens: 41036021760 | elapsed time per iteration (s): 0.44 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.959389E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 581.130 | TFLOPs: 30.49 | +7: iteration 78280/ 173500 | consumed samples: 20039680 | consumed tokens: 41041264640 | elapsed time per iteration (s): 0.46 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.957414E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.930 | TFLOPs: 29.38 | +7: iteration 78290/ 173500 | consumed samples: 20042240 | consumed tokens: 41046507520 | elapsed time per iteration (s): 0.42 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.967192E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.722 | TFLOPs: 31.62 | +7: iteration 78300/ 173500 | consumed samples: 20044800 | consumed tokens: 41051750400 | elapsed time per iteration (s): 0.43 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.956808E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.387 | TFLOPs: 31.55 | +7: iteration 78310/ 173500 | consumed samples: 20047360 | consumed tokens: 41056993280 | elapsed time per iteration (s): 0.43 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.956889E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.904 | TFLOPs: 31.53 | +7: iteration 78320/ 173500 | consumed samples: 20049920 | consumed tokens: 41062236160 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.977101E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.109 | TFLOPs: 31.91 | +7: iteration 78330/ 173500 | consumed samples: 20052480 | consumed tokens: 41067479040 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch 
size: 256 | lm loss: 2.958447E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.218 | TFLOPs: 31.70 | +7: iteration 78340/ 173500 | consumed samples: 20055040 | consumed tokens: 41072721920 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.960643E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.675 | TFLOPs: 31.94 | +7: iteration 78350/ 173500 | consumed samples: 20057600 | consumed tokens: 41077964800 | elapsed time per iteration (s): 0.43 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.971350E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.845 | TFLOPs: 31.26 | +7: iteration 78360/ 173500 | consumed samples: 20060160 | consumed tokens: 41083207680 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.958108E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.493 | TFLOPs: 31.93 | +7: iteration 78370/ 173500 | consumed samples: 20062720 | consumed tokens: 41088450560 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.958887E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.100 | TFLOPs: 31.85 | +7: iteration 78380/ 173500 | consumed samples: 20065280 | consumed tokens: 41093693440 | elapsed time per iteration (s): 0.43 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.956385E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.005 | TFLOPs: 31.59 | +7: iteration 78390/ 173500 | consumed samples: 20067840 | 
consumed tokens: 41098936320 | elapsed time per iteration (s): 0.42 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.955293E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.312 | TFLOPs: 31.92 | +7: iteration 78400/ 173500 | consumed samples: 20070400 | consumed tokens: 41104179200 | elapsed time per iteration (s): 0.42 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.970342E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.828 | TFLOPs: 31.89 | +7: iteration 78410/ 173500 | consumed samples: 20072960 | consumed tokens: 41109422080 | elapsed time per iteration (s): 0.42 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.951572E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.708 | TFLOPs: 31.89 | +7: iteration 78420/ 173500 | consumed samples: 20075520 | consumed tokens: 41114664960 | elapsed time per iteration (s): 0.42 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.966526E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.661 | TFLOPs: 31.88 | +7: iteration 78430/ 173500 | consumed samples: 20078080 | consumed tokens: 41119907840 | elapsed time per iteration (s): 0.42 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.958180E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.036 | TFLOPs: 31.90 | +7: iteration 78440/ 173500 | consumed samples: 20080640 | consumed tokens: 41125150720 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.943058E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 607.848 | TFLOPs: 31.89 | +7: iteration 78450/ 173500 | consumed samples: 20083200 | consumed tokens: 41130393600 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.953351E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.708 | TFLOPs: 31.89 | +7: iteration 78460/ 173500 | consumed samples: 20085760 | consumed tokens: 41135636480 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.957066E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.585 | TFLOPs: 31.88 | +7: iteration 78470/ 173500 | consumed samples: 20088320 | consumed tokens: 41140879360 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.958326E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.137 | TFLOPs: 31.86 | +7: iteration 78480/ 173500 | consumed samples: 20090880 | consumed tokens: 41146122240 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.953843E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.717 | TFLOPs: 31.89 | +7: iteration 78490/ 173500 | consumed samples: 20093440 | consumed tokens: 41151365120 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.961593E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.077 | TFLOPs: 31.85 | +7: iteration 78500/ 173500 | consumed samples: 20096000 | consumed tokens: 41156608000 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 
2.962518E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.255 | TFLOPs: 31.86 | +7: iteration 78510/ 173500 | consumed samples: 20098560 | consumed tokens: 41161850880 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.946150E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.177 | TFLOPs: 31.86 | +7: iteration 78520/ 173500 | consumed samples: 20101120 | consumed tokens: 41167093760 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.953134E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.539 | TFLOPs: 31.82 | +7: iteration 78530/ 173500 | consumed samples: 20103680 | consumed tokens: 41172336640 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.956179E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.146 | TFLOPs: 31.86 | +7: iteration 78540/ 173500 | consumed samples: 20106240 | consumed tokens: 41177579520 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.966378E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.120 | TFLOPs: 31.85 | +7: iteration 78550/ 173500 | consumed samples: 20108800 | consumed tokens: 41182822400 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.956252E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.167 | TFLOPs: 31.86 | +7: iteration 78560/ 173500 | consumed samples: 20111360 | consumed tokens: 
41188065280 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.972256E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.863 | TFLOPs: 31.84 | +7: iteration 78570/ 173500 | consumed samples: 20113920 | consumed tokens: 41193308160 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.962473E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.268 | TFLOPs: 31.86 | +7: iteration 78580/ 173500 | consumed samples: 20116480 | consumed tokens: 41198551040 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.963642E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.358 | TFLOPs: 31.81 | +7: iteration 78590/ 173500 | consumed samples: 20119040 | consumed tokens: 41203793920 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.959382E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.315 | TFLOPs: 31.86 | +7: iteration 78600/ 173500 | consumed samples: 20121600 | consumed tokens: 41209036800 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.965073E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.438 | TFLOPs: 31.87 | +7: iteration 78610/ 173500 | consumed samples: 20124160 | consumed tokens: 41214279680 | elapsed time per iteration (s): 0.43 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.968853E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.004 | TFLOPs: 31.11 | +7: iteration 78620/ 173500 | consumed samples: 20126720 | consumed tokens: 41219522560 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.959766E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | +7: iteration 78630/ 173500 | consumed samples: 20129280 | consumed tokens: 41224765440 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.957269E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.176 | TFLOPs: 31.86 | +7: iteration 78640/ 173500 | consumed samples: 20131840 | consumed tokens: 41230008320 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.970210E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.829 | TFLOPs: 31.79 | +7: iteration 78650/ 173500 | consumed samples: 20134400 | consumed tokens: 41235251200 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.960148E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.183 | TFLOPs: 31.81 | +7: iteration 78660/ 173500 | consumed samples: 20136960 | consumed tokens: 41240494080 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.951795E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.473 | TFLOPs: 31.82 | +7: iteration 78670/ 173500 | consumed samples: 20139520 | consumed tokens: 41245736960 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.958659E+00 | grad 
norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.068 | TFLOPs: 31.85 | +7: iteration 78680/ 173500 | consumed samples: 20142080 | consumed tokens: 41250979840 | elapsed time per iteration (s): 0.42 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.950384E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.390 | TFLOPs: 31.87 | +7: iteration 78690/ 173500 | consumed samples: 20144640 | consumed tokens: 41256222720 | elapsed time per iteration (s): 0.42 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.948821E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.249 | TFLOPs: 31.86 | +7: iteration 78700/ 173500 | consumed samples: 20147200 | consumed tokens: 41261465600 | elapsed time per iteration (s): 0.42 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.971630E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.823 | TFLOPs: 31.84 | +7: iteration 78710/ 173500 | consumed samples: 20149760 | consumed tokens: 41266708480 | elapsed time per iteration (s): 0.42 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.959974E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.099 | TFLOPs: 31.85 | +7: iteration 78720/ 173500 | consumed samples: 20152320 | consumed tokens: 41271951360 | elapsed time per iteration (s): 0.42 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.956971E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.035 | TFLOPs: 31.85 | +7: iteration 78730/ 173500 | consumed samples: 20154880 | consumed tokens: 41277194240 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.954167E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.197 | TFLOPs: 31.33 | +7: iteration 78740/ 173500 | consumed samples: 20157440 | consumed tokens: 41282437120 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.949210E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.070 | TFLOPs: 31.85 | +7: iteration 78750/ 173500 | consumed samples: 20160000 | consumed tokens: 41287680000 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.955845E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.669 | TFLOPs: 31.88 | +7: iteration 78760/ 173500 | consumed samples: 20162560 | consumed tokens: 41292922880 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.956506E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.954 | TFLOPs: 31.85 | +7: iteration 78770/ 173500 | consumed samples: 20165120 | consumed tokens: 41298165760 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.960948E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.249 | TFLOPs: 31.86 | +7: iteration 78780/ 173500 | consumed samples: 20167680 | consumed tokens: 41303408640 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.963541E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.558 | TFLOPs: 
31.83 | +7: iteration 78790/ 173500 | consumed samples: 20170240 | consumed tokens: 41308651520 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.951932E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.019 | TFLOPs: 31.85 | +7: iteration 78800/ 173500 | consumed samples: 20172800 | consumed tokens: 41313894400 | elapsed time per iteration (s): 0.42 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.959759E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.037 | TFLOPs: 31.85 | +7: iteration 78810/ 173500 | consumed samples: 20175360 | consumed tokens: 41319137280 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.953637E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.895 | TFLOPs: 31.84 | +7: iteration 78820/ 173500 | consumed samples: 20177920 | consumed tokens: 41324380160 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.952442E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.214 | TFLOPs: 31.70 | +7: iteration 78830/ 173500 | consumed samples: 20180480 | consumed tokens: 41329623040 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.958464E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.115 | TFLOPs: 31.85 | +7: iteration 78840/ 173500 | consumed samples: 20183040 | consumed tokens: 41334865920 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.958588E+00 | grad norm: 0.305 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.301 | TFLOPs: 31.86 | +7: iteration 78850/ 173500 | consumed samples: 20185600 | consumed tokens: 41340108800 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.956605E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.433 | TFLOPs: 31.87 | +7: iteration 78860/ 173500 | consumed samples: 20188160 | consumed tokens: 41345351680 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.962651E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.307 | TFLOPs: 31.86 | +7: iteration 78870/ 173500 | consumed samples: 20190720 | consumed tokens: 41350594560 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.964988E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.366 | TFLOPs: 31.87 | +7: iteration 78880/ 173500 | consumed samples: 20193280 | consumed tokens: 41355837440 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.957242E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.591 | TFLOPs: 31.88 | +7: iteration 78890/ 173500 | consumed samples: 20195840 | consumed tokens: 41361080320 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.954408E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.395 | TFLOPs: 31.87 | +7: iteration 78900/ 173500 | consumed samples: 20198400 | consumed tokens: 41366323200 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.955728E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 | +7: iteration 78910/ 173500 | consumed samples: 20200960 | consumed tokens: 41371566080 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.954733E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.529 | TFLOPs: 31.88 | +7: iteration 78920/ 173500 | consumed samples: 20203520 | consumed tokens: 41376808960 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.964445E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.538 | TFLOPs: 31.88 | +7: iteration 78930/ 173500 | consumed samples: 20206080 | consumed tokens: 41382051840 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.964205E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.384 | TFLOPs: 31.87 | +7: iteration 78940/ 173500 | consumed samples: 20208640 | consumed tokens: 41387294720 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.946680E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.119 | TFLOPs: 31.85 | +7: iteration 78950/ 173500 | consumed samples: 20211200 | consumed tokens: 41392537600 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.965901E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.403 | TFLOPs: 31.87 | +7: iteration 78960/ 
173500 | consumed samples: 20213760 | consumed tokens: 41397780480 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.957935E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.831 | TFLOPs: 31.84 | +7: iteration 78970/ 173500 | consumed samples: 20216320 | consumed tokens: 41403023360 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.969069E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.353 | TFLOPs: 31.87 | +7: iteration 78980/ 173500 | consumed samples: 20218880 | consumed tokens: 41408266240 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.945754E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.385 | TFLOPs: 31.87 | +7: iteration 78990/ 173500 | consumed samples: 20221440 | consumed tokens: 41413509120 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.943500E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.094 | TFLOPs: 31.85 | +7: iteration 79000/ 173500 | consumed samples: 20224000 | consumed tokens: 41418752000 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.949953E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.030 | TFLOPs: 31.85 | +7: iteration 79010/ 173500 | consumed samples: 20226560 | consumed tokens: 41423994880 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.967516E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.291 | TFLOPs: 31.86 | +7: iteration 79020/ 173500 | consumed samples: 20229120 | consumed tokens: 41429237760 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.966293E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.468 | TFLOPs: 31.87 | +7: iteration 79030/ 173500 | consumed samples: 20231680 | consumed tokens: 41434480640 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.959956E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.248 | TFLOPs: 31.86 | +7: iteration 79040/ 173500 | consumed samples: 20234240 | consumed tokens: 41439723520 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.964987E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.425 | TFLOPs: 31.87 | +7: iteration 79050/ 173500 | consumed samples: 20236800 | consumed tokens: 41444966400 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.965778E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.114 | TFLOPs: 31.85 | +7: iteration 79060/ 173500 | consumed samples: 20239360 | consumed tokens: 41450209280 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.964388E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.360 | TFLOPs: 31.87 | +7: iteration 79070/ 173500 | consumed samples: 20241920 | consumed tokens: 41455452160 | elapsed time per iteration (s): 0.42 | learning rate: 
1.240E-04 | global batch size: 256 | lm loss: 2.962506E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.137 | TFLOPs: 31.86 | +7: iteration 79080/ 173500 | consumed samples: 20244480 | consumed tokens: 41460695040 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.971583E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.711 | TFLOPs: 31.83 | +7: iteration 79090/ 173500 | consumed samples: 20247040 | consumed tokens: 41465937920 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.963000E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.341 | TFLOPs: 31.87 | +7: iteration 79100/ 173500 | consumed samples: 20249600 | consumed tokens: 41471180800 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.963297E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.279 | TFLOPs: 31.86 | +7: iteration 79110/ 173500 | consumed samples: 20252160 | consumed tokens: 41476423680 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.953588E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.689 | TFLOPs: 31.88 | +7: iteration 79120/ 173500 | consumed samples: 20254720 | consumed tokens: 41481666560 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.964964E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.381 | TFLOPs: 31.87 | +7: iteration 79130/ 173500 | 
consumed samples: 20257280 | consumed tokens: 41486909440 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.963219E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.477 | TFLOPs: 31.87 | +7: iteration 79140/ 173500 | consumed samples: 20259840 | consumed tokens: 41492152320 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.956698E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.184 | TFLOPs: 31.86 | +7: iteration 79150/ 173500 | consumed samples: 20262400 | consumed tokens: 41497395200 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.962474E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | +7: iteration 79160/ 173500 | consumed samples: 20264960 | consumed tokens: 41502638080 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.946159E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.365 | TFLOPs: 31.87 | +7: iteration 79170/ 173500 | consumed samples: 20267520 | consumed tokens: 41507880960 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.949671E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.641 | TFLOPs: 31.88 | +7: iteration 79180/ 173500 | consumed samples: 20270080 | consumed tokens: 41513123840 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.971204E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.307 | TFLOPs: 31.86 | +7: iteration 79190/ 173500 | consumed samples: 20272640 | consumed tokens: 41518366720 | elapsed time per iteration (s): 0.43 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.956889E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.816 | TFLOPs: 31.47 | +7: iteration 79200/ 173500 | consumed samples: 20275200 | consumed tokens: 41523609600 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.957484E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.489 | TFLOPs: 31.87 | +7: iteration 79210/ 173500 | consumed samples: 20277760 | consumed tokens: 41528852480 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.954594E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.529 | TFLOPs: 31.88 | +7: iteration 79220/ 173500 | consumed samples: 20280320 | consumed tokens: 41534095360 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.972186E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.387 | TFLOPs: 31.87 | +7: iteration 79230/ 173500 | consumed samples: 20282880 | consumed tokens: 41539338240 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.953406E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.417 | TFLOPs: 31.71 | +7: iteration 79240/ 173500 | consumed samples: 20285440 | consumed tokens: 41544581120 | elapsed time per iteration (s): 0.43 | learning rate: 1.237E-04 | global batch 
size: 256 | lm loss: 2.971879E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.818 | TFLOPs: 31.58 | +7: iteration 79250/ 173500 | consumed samples: 20288000 | consumed tokens: 41549824000 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.969813E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.505 | TFLOPs: 31.87 | +7: iteration 79260/ 173500 | consumed samples: 20290560 | consumed tokens: 41555066880 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.960705E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.255 | TFLOPs: 31.86 | +7: iteration 79270/ 173500 | consumed samples: 20293120 | consumed tokens: 41560309760 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.959071E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.703 | TFLOPs: 31.89 | +7: iteration 79280/ 173500 | consumed samples: 20295680 | consumed tokens: 41565552640 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.974950E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.419 | TFLOPs: 31.87 | +7: iteration 79290/ 173500 | consumed samples: 20298240 | consumed tokens: 41570795520 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.955484E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.451 | TFLOPs: 31.87 | +7: iteration 79300/ 173500 | consumed samples: 20300800 | 
consumed tokens: 41576038400 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.945766E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.345 | TFLOPs: 31.87 | +7: iteration 79310/ 173500 | consumed samples: 20303360 | consumed tokens: 41581281280 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.968509E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.324 | TFLOPs: 31.87 | +7: iteration 79320/ 173500 | consumed samples: 20305920 | consumed tokens: 41586524160 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.950095E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.374 | TFLOPs: 31.87 | +7: iteration 79330/ 173500 | consumed samples: 20308480 | consumed tokens: 41591767040 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.963314E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 79340/ 173500 | consumed samples: 20311040 | consumed tokens: 41597009920 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.973578E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.225 | TFLOPs: 31.86 | +7: iteration 79350/ 173500 | consumed samples: 20313600 | consumed tokens: 41602252800 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.965511E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 607.219 | TFLOPs: 31.86 | +7: iteration 79360/ 173500 | consumed samples: 20316160 | consumed tokens: 41607495680 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.956249E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.331 | TFLOPs: 31.87 | +7: iteration 79370/ 173500 | consumed samples: 20318720 | consumed tokens: 41612738560 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.968295E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.936 | TFLOPs: 31.84 | +7: iteration 79380/ 173500 | consumed samples: 20321280 | consumed tokens: 41617981440 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.947896E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.035 | TFLOPs: 31.85 | +7: iteration 79390/ 173500 | consumed samples: 20323840 | consumed tokens: 41623224320 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.961180E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.785 | TFLOPs: 31.84 | +7: iteration 79400/ 173500 | consumed samples: 20326400 | consumed tokens: 41628467200 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.953294E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.266 | TFLOPs: 31.86 | +7: iteration 79410/ 173500 | consumed samples: 20328960 | consumed tokens: 41633710080 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 
2.952147E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.780 | TFLOPs: 31.84 | +7: iteration 79420/ 173500 | consumed samples: 20331520 | consumed tokens: 41638952960 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.959372E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.463 | TFLOPs: 31.87 | +7: iteration 79430/ 173500 | consumed samples: 20334080 | consumed tokens: 41644195840 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.949488E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.476 | TFLOPs: 31.87 | +7: iteration 79440/ 173500 | consumed samples: 20336640 | consumed tokens: 41649438720 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.951553E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.446 | TFLOPs: 31.87 | +7: iteration 79450/ 173500 | consumed samples: 20339200 | consumed tokens: 41654681600 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.944117E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.872 | TFLOPs: 31.84 | +7: iteration 79460/ 173500 | consumed samples: 20341760 | consumed tokens: 41659924480 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.956392E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.918 | TFLOPs: 31.84 | +7: iteration 79470/ 173500 | consumed samples: 20344320 | consumed tokens: 
41665167360 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.959216E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.097 | TFLOPs: 31.85 | +7: iteration 79480/ 173500 | consumed samples: 20346880 | consumed tokens: 41670410240 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.957641E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.125 | TFLOPs: 31.85 | +7: iteration 79490/ 173500 | consumed samples: 20349440 | consumed tokens: 41675653120 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.947790E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.834 | TFLOPs: 31.84 | +7: iteration 79500/ 173500 | consumed samples: 20352000 | consumed tokens: 41680896000 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.945229E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.462 | TFLOPs: 31.87 | +7: iteration 79510/ 173500 | consumed samples: 20354560 | consumed tokens: 41686138880 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.957160E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.079 | TFLOPs: 31.85 | +7: iteration 79520/ 173500 | consumed samples: 20357120 | consumed tokens: 41691381760 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.961916E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.314 | TFLOPs: 31.86 | +7: iteration 79530/ 173500 | consumed samples: 20359680 | consumed tokens: 41696624640 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.958529E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 79540/ 173500 | consumed samples: 20362240 | consumed tokens: 41701867520 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.952827E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.999 | TFLOPs: 31.85 | +7: iteration 79550/ 173500 | consumed samples: 20364800 | consumed tokens: 41707110400 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.953353E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.000 | TFLOPs: 31.85 | +7: iteration 79560/ 173500 | consumed samples: 20367360 | consumed tokens: 41712353280 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.969931E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.370 | TFLOPs: 31.87 | +7: iteration 79570/ 173500 | consumed samples: 20369920 | consumed tokens: 41717596160 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.944270E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.171 | TFLOPs: 31.86 | +7: iteration 79580/ 173500 | consumed samples: 20372480 | consumed tokens: 41722839040 | elapsed time per iteration (s): 0.44 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.954436E+00 | grad 
norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.762 | TFLOPs: 30.79 | +7: iteration 79590/ 173500 | consumed samples: 20375040 | consumed tokens: 41728081920 | elapsed time per iteration (s): 0.43 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.967365E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.646 | TFLOPs: 31.10 | +7: iteration 79600/ 173500 | consumed samples: 20377600 | consumed tokens: 41733324800 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.954417E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.201 | TFLOPs: 31.91 | +7: iteration 79610/ 173500 | consumed samples: 20380160 | consumed tokens: 41738567680 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.963353E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.591 | TFLOPs: 31.88 | +7: iteration 79620/ 173500 | consumed samples: 20382720 | consumed tokens: 41743810560 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.970932E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.839 | TFLOPs: 31.89 | +7: iteration 79630/ 173500 | consumed samples: 20385280 | consumed tokens: 41749053440 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.951468E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.604 | TFLOPs: 31.88 | +7: iteration 79640/ 173500 | consumed samples: 20387840 | consumed tokens: 41754296320 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.957553E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.356 | TFLOPs: 31.87 | +7: iteration 79650/ 173500 | consumed samples: 20390400 | consumed tokens: 41759539200 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.963124E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.194 | TFLOPs: 31.86 | +7: iteration 79660/ 173500 | consumed samples: 20392960 | consumed tokens: 41764782080 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.967717E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.437 | TFLOPs: 31.87 | +7: iteration 79670/ 173500 | consumed samples: 20395520 | consumed tokens: 41770024960 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.964005E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.780 | TFLOPs: 31.84 | +7: iteration 79680/ 173500 | consumed samples: 20398080 | consumed tokens: 41775267840 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.953320E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.286 | TFLOPs: 31.86 | +7: iteration 79690/ 173500 | consumed samples: 20400640 | consumed tokens: 41780510720 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.955562E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.254 | TFLOPs: 
31.86 | +7: iteration 79700/ 173500 | consumed samples: 20403200 | consumed tokens: 41785753600 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.958216E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.532 | TFLOPs: 31.88 | +7: iteration 79710/ 173500 | consumed samples: 20405760 | consumed tokens: 41790996480 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.960410E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.540 | TFLOPs: 31.88 | +7: iteration 79720/ 173500 | consumed samples: 20408320 | consumed tokens: 41796239360 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.948335E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.379 | TFLOPs: 31.87 | +7: iteration 79730/ 173500 | consumed samples: 20410880 | consumed tokens: 41801482240 | elapsed time per iteration (s): 0.47 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.955385E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.880 | TFLOPs: 28.85 | +7: iteration 79740/ 173500 | consumed samples: 20413440 | consumed tokens: 41806725120 | elapsed time per iteration (s): 0.44 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.942855E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.830 | TFLOPs: 30.48 | +7: iteration 79750/ 173500 | consumed samples: 20416000 | consumed tokens: 41811968000 | elapsed time per iteration (s): 0.45 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.953125E+00 | grad norm: 0.360 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.585 | TFLOPs: 30.10 | +7: iteration 79760/ 173500 | consumed samples: 20418560 | consumed tokens: 41817210880 | elapsed time per iteration (s): 0.45 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.961472E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.192 | TFLOPs: 29.76 | +7: iteration 79770/ 173500 | consumed samples: 20421120 | consumed tokens: 41822453760 | elapsed time per iteration (s): 0.44 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.954967E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.756 | TFLOPs: 30.42 | +7: iteration 79780/ 173500 | consumed samples: 20423680 | consumed tokens: 41827696640 | elapsed time per iteration (s): 0.42 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.951140E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.072 | TFLOPs: 32.06 | +7: iteration 79790/ 173500 | consumed samples: 20426240 | consumed tokens: 41832939520 | elapsed time per iteration (s): 0.43 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.932539E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.551 | TFLOPs: 31.25 | +7: iteration 79800/ 173500 | consumed samples: 20428800 | consumed tokens: 41838182400 | elapsed time per iteration (s): 0.43 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.961077E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.955 | TFLOPs: 31.27 | +7: iteration 79810/ 173500 | consumed samples: 20431360 | consumed tokens: 41843425280 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.940367E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.264 | TFLOPs: 31.18 | +7: iteration 79820/ 173500 | consumed samples: 20433920 | consumed tokens: 41848668160 | elapsed time per iteration (s): 0.43 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.960049E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.888 | TFLOPs: 31.27 | +7: iteration 79830/ 173500 | consumed samples: 20436480 | consumed tokens: 41853911040 | elapsed time per iteration (s): 0.43 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.952104E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.794 | TFLOPs: 31.26 | +7: iteration 79840/ 173500 | consumed samples: 20439040 | consumed tokens: 41859153920 | elapsed time per iteration (s): 0.46 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.960462E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.444 | TFLOPs: 29.25 | +7: iteration 79850/ 173500 | consumed samples: 20441600 | consumed tokens: 41864396800 | elapsed time per iteration (s): 0.46 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.960814E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.459 | TFLOPs: 29.20 | +7: iteration 79860/ 173500 | consumed samples: 20444160 | consumed tokens: 41869639680 | elapsed time per iteration (s): 0.46 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.954233E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.830 | TFLOPs: 29.37 | +7: iteration 79870/ 
173500 | consumed samples: 20446720 | consumed tokens: 41874882560 | elapsed time per iteration (s): 0.47 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.954023E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.831 | TFLOPs: 28.53 | +7: iteration 79880/ 173500 | consumed samples: 20449280 | consumed tokens: 41880125440 | elapsed time per iteration (s): 0.45 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.953432E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.929 | TFLOPs: 29.90 | +7: iteration 79890/ 173500 | consumed samples: 20451840 | consumed tokens: 41885368320 | elapsed time per iteration (s): 0.44 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.949927E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.127 | TFLOPs: 30.60 | +7: iteration 79900/ 173500 | consumed samples: 20454400 | consumed tokens: 41890611200 | elapsed time per iteration (s): 0.43 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.952831E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.598 | TFLOPs: 31.35 | +7: iteration 79910/ 173500 | consumed samples: 20456960 | consumed tokens: 41895854080 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.955097E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.449 | TFLOPs: 32.08 | +7: iteration 79920/ 173500 | consumed samples: 20459520 | consumed tokens: 41901096960 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.946044E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.500 | TFLOPs: 31.98 | +7: iteration 79930/ 173500 | consumed samples: 20462080 | consumed tokens: 41906339840 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.941350E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.022 | TFLOPs: 31.95 | +7: iteration 79940/ 173500 | consumed samples: 20464640 | consumed tokens: 41911582720 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.952999E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.470 | TFLOPs: 31.93 | +7: iteration 79950/ 173500 | consumed samples: 20467200 | consumed tokens: 41916825600 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.957132E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.476 | TFLOPs: 31.93 | +7: iteration 79960/ 173500 | consumed samples: 20469760 | consumed tokens: 41922068480 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.962495E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.597 | TFLOPs: 31.93 | +7: iteration 79970/ 173500 | consumed samples: 20472320 | consumed tokens: 41927311360 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.958348E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.461 | TFLOPs: 31.92 | +7: iteration 79980/ 173500 | consumed samples: 20474880 | consumed tokens: 41932554240 | elapsed time per iteration (s): 0.42 | learning rate: 
1.225E-04 | global batch size: 256 | lm loss: 2.955869E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.081 | TFLOPs: 31.96 | +7: iteration 79990/ 173500 | consumed samples: 20477440 | consumed tokens: 41937797120 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.954388E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.387 | TFLOPs: 31.92 | +0: [2023-03-17 08:40:51,582] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=0, lr=[0.00012249910047811783, 0.00012249910047811783, 0.00012249910047811783], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 80000/ 173500 | consumed samples: 20480000 | consumed tokens: 41943040000 | elapsed time per iteration (s): 0.43 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.965105E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.941 | TFLOPs: 31.22 | +0: steps: 80000 loss: 2.9668 iter time (s): 0.422 samples/sec: 606.037 +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 80000 | lm loss value: 3.300453E+00 | lm loss PPL: 2.712492E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 80000 to checkpoints_221m91b400m +0: [2023-03-17 08:40:51,745] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step80000 is begin to save! +0: [2023-03-17 08:40:51,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 08:40:51,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_01-model_00-model_states.pt. +0: [2023-03-17 08:40:51,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_03-model_00-model_states.pt... +0: [2023-03-17 08:40:51,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_03-model_00-model_states.pt. +0: [2023-03-17 08:40:51,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_04-model_00-model_states.pt... +0: [2023-03-17 08:40:51,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_04-model_00-model_states.pt. +0: [2023-03-17 08:40:51,922] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_05-model_00-model_states.pt... +0: [2023-03-17 08:40:51,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_05-model_00-model_states.pt. +0: [2023-03-17 08:40:51,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_06-model_00-model_states.pt... +0: [2023-03-17 08:40:51,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_06-model_00-model_states.pt. +0: [2023-03-17 08:40:51,972] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_07-model_00-model_states.pt... +0: [2023-03-17 08:40:51,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_07-model_00-model_states.pt. +0: [2023-03-17 08:40:51,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 08:40:52,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_08-model_00-model_states.pt. +0: [2023-03-17 08:40:52,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_09-model_00-model_states.pt... +0: [2023-03-17 08:40:52,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_09-model_00-model_states.pt. +0: [2023-03-17 08:40:52,046] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_10-model_00-model_states.pt... +0: [2023-03-17 08:40:52,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_10-model_00-model_states.pt. +0: [2023-03-17 08:40:52,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_11-model_00-model_states.pt... +0: [2023-03-17 08:40:52,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_11-model_00-model_states.pt. +0: [2023-03-17 08:40:52,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_12-model_00-model_states.pt... +0: [2023-03-17 08:40:52,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_12-model_00-model_states.pt. +0: [2023-03-17 08:40:52,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_13-model_00-model_states.pt... +0: [2023-03-17 08:40:52,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_13-model_00-model_states.pt. +0: [2023-03-17 08:40:52,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 08:40:52,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_14-model_00-model_states.pt. +0: [2023-03-17 08:40:52,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_15-model_00-model_states.pt... +0: [2023-03-17 08:40:52,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_15-model_00-model_states.pt. +0: [2023-03-17 08:40:52,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_16-model_00-model_states.pt... +0: [2023-03-17 08:40:52,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_16-model_00-model_states.pt. +0: [2023-03-17 08:40:52,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_17-model_00-model_states.pt... +0: [2023-03-17 08:40:52,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_17-model_00-model_states.pt. +0: [2023-03-17 08:40:52,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_18-model_00-model_states.pt... +0: [2023-03-17 08:40:52,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_18-model_00-model_states.pt. +0: [2023-03-17 08:40:52,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_19-model_00-model_states.pt... +0: [2023-03-17 08:40:52,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_19-model_00-model_states.pt. +0: [2023-03-17 08:40:52,289] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_20-model_00-model_states.pt... 
+0: [2023-03-17 08:40:52,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_20-model_00-model_states.pt. +0: [2023-03-17 08:40:52,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/layer_22-model_00-model_states.pt... +0: [2023-03-17 08:40:52,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/layer_22-model_00-model_states.pt. +0: [2023-03-17 08:40:52,319] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step80000/mp_rank_00_model_states.pt +0: [2023-03-17 08:40:52,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/mp_rank_00_model_states.pt... +0: [2023-03-17 08:40:52,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/mp_rank_00_model_states.pt. +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 08:40:52,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +0: [2023-03-17 08:40:52,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 08:40:52,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 08:40:52,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 08:40:52,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +4: [2023-03-17 08:40:52,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 08:40:52,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 08:40:52,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +3: [2023-03-17 08:40:52,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 08:40:52,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 08:40:52,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +2: [2023-03-17 08:40:52,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 08:40:52,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 08:40:52,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 08:40:52,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +5: [2023-03-17 08:40:52,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +1: [2023-03-17 08:40:52,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 08:40:52,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: [2023-03-17 08:40:52,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 08:40:52,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +6: [2023-03-17 08:40:52,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
+7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +7: [2023-03-17 08:40:52,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! +0: successfully saved checkpoint at iteration 80000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 752.72 +7: iteration 80010/ 173500 | consumed samples: 20482560 | consumed tokens: 41948282880 | elapsed time per iteration (s): 0.51 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.957614E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 505.481 | TFLOPs: 26.52 | +7: iteration 80020/ 173500 | consumed samples: 20485120 | consumed tokens: 41953525760 | elapsed time per iteration (s): 0.43 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.957528E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.258 | TFLOPs: 31.60 | +7: iteration 80030/ 173500 | consumed samples: 20487680 | consumed tokens: 41958768640 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.963587E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.490 | TFLOPs: 31.66 | +7: iteration 80040/ 173500 | consumed samples: 20490240 | consumed 
tokens: 41964011520 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.958352E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.912 | TFLOPs: 31.69 | +7: iteration 80050/ 173500 | consumed samples: 20492800 | consumed tokens: 41969254400 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.961392E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.364 | TFLOPs: 31.81 | +7: iteration 80060/ 173500 | consumed samples: 20495360 | consumed tokens: 41974497280 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.951234E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.355 | TFLOPs: 32.02 | +7: iteration 80070/ 173500 | consumed samples: 20497920 | consumed tokens: 41979740160 | elapsed time per iteration (s): 0.43 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.949150E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.128 | TFLOPs: 31.54 | +7: iteration 80080/ 173500 | consumed samples: 20500480 | consumed tokens: 41984983040 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.944996E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.692 | TFLOPs: 31.99 | +7: iteration 80090/ 173500 | consumed samples: 20503040 | consumed tokens: 41990225920 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.960727E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 603.387 | TFLOPs: 31.66 | +7: iteration 80100/ 173500 | consumed samples: 20505600 | consumed tokens: 41995468800 | elapsed time per iteration (s): 0.42 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.967809E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.762 | TFLOPs: 31.99 | +7: iteration 80110/ 173500 | consumed samples: 20508160 | consumed tokens: 42000711680 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.943653E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.876 | TFLOPs: 31.53 | +7: iteration 80120/ 173500 | consumed samples: 20510720 | consumed tokens: 42005954560 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.952427E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.655 | TFLOPs: 31.57 | +7: iteration 80130/ 173500 | consumed samples: 20513280 | consumed tokens: 42011197440 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.947252E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.298 | TFLOPs: 31.55 | +7: iteration 80140/ 173500 | consumed samples: 20515840 | consumed tokens: 42016440320 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.954724E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.415 | TFLOPs: 31.56 | +7: iteration 80150/ 173500 | consumed samples: 20518400 | consumed tokens: 42021683200 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.963059E+00 | 
grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.794 | TFLOPs: 31.47 | +7: iteration 80160/ 173500 | consumed samples: 20520960 | consumed tokens: 42026926080 | elapsed time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.948546E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.636 | TFLOPs: 31.41 | +7: iteration 80170/ 173500 | consumed samples: 20523520 | consumed tokens: 42032168960 | elapsed time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.963567E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.943 | TFLOPs: 31.43 | +7: iteration 80180/ 173500 | consumed samples: 20526080 | consumed tokens: 42037411840 | elapsed time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.954929E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.252 | TFLOPs: 31.07 | +7: iteration 80190/ 173500 | consumed samples: 20528640 | consumed tokens: 42042654720 | elapsed time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.968834E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.477 | TFLOPs: 31.40 | +7: iteration 80200/ 173500 | consumed samples: 20531200 | consumed tokens: 42047897600 | elapsed time per iteration (s): 0.42 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.954487E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.416 | TFLOPs: 31.82 | +7: iteration 80210/ 173500 | consumed samples: 20533760 | consumed tokens: 42053140480 | elapsed 
time per iteration (s): 0.43 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.949524E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.397 | TFLOPs: 31.45 | +7: iteration 80220/ 173500 | consumed samples: 20536320 | consumed tokens: 42058383360 | elapsed time per iteration (s): 0.43 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.950856E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.143 | TFLOPs: 31.49 | +7: iteration 80230/ 173500 | consumed samples: 20538880 | consumed tokens: 42063626240 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.965821E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.979 | TFLOPs: 31.69 | +7: iteration 80240/ 173500 | consumed samples: 20541440 | consumed tokens: 42068869120 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.949672E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.398 | TFLOPs: 31.66 | +7: iteration 80250/ 173500 | consumed samples: 20544000 | consumed tokens: 42074112000 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.958108E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.837 | TFLOPs: 31.79 | +7: iteration 80260/ 173500 | consumed samples: 20546560 | consumed tokens: 42079354880 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.959811E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.287 | TFLOPs: 
31.76 | +7: iteration 80270/ 173500 | consumed samples: 20549120 | consumed tokens: 42084597760 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.961001E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.459 | TFLOPs: 31.66 | +7: iteration 80280/ 173500 | consumed samples: 20551680 | consumed tokens: 42089840640 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.952645E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.492 | TFLOPs: 31.98 | +7: iteration 80290/ 173500 | consumed samples: 20554240 | consumed tokens: 42095083520 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.953368E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.186 | TFLOPs: 31.75 | +7: iteration 80300/ 173500 | consumed samples: 20556800 | consumed tokens: 42100326400 | elapsed time per iteration (s): 0.43 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.950829E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.574 | TFLOPs: 31.56 | +7: iteration 80310/ 173500 | consumed samples: 20559360 | consumed tokens: 42105569280 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.960749E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.798 | TFLOPs: 31.79 | +7: iteration 80320/ 173500 | consumed samples: 20561920 | consumed tokens: 42110812160 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.953878E+00 | grad norm: 0.324 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.515 | TFLOPs: 31.98 | +7: iteration 80330/ 173500 | consumed samples: 20564480 | consumed tokens: 42116055040 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.960335E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.006 | TFLOPs: 31.95 | +7: iteration 80340/ 173500 | consumed samples: 20567040 | consumed tokens: 42121297920 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.948427E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.432 | TFLOPs: 31.71 | +7: iteration 80350/ 173500 | consumed samples: 20569600 | consumed tokens: 42126540800 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.949282E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.256 | TFLOPs: 31.65 | +7: iteration 80360/ 173500 | consumed samples: 20572160 | consumed tokens: 42131783680 | elapsed time per iteration (s): 0.43 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.954446E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.217 | TFLOPs: 31.54 | +7: iteration 80370/ 173500 | consumed samples: 20574720 | consumed tokens: 42137026560 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.956107E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.590 | TFLOPs: 31.67 | +7: iteration 80380/ 173500 | consumed samples: 20577280 | consumed tokens: 42142269440 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.938545E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.895 | TFLOPs: 31.21 | +7: iteration 80390/ 173500 | consumed samples: 20579840 | consumed tokens: 42147512320 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.941241E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.728 | TFLOPs: 31.62 | +7: iteration 80400/ 173500 | consumed samples: 20582400 | consumed tokens: 42152755200 | elapsed time per iteration (s): 0.42 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.941657E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.541 | TFLOPs: 31.77 | +7: iteration 80410/ 173500 | consumed samples: 20584960 | consumed tokens: 42157998080 | elapsed time per iteration (s): 0.43 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.947120E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.493 | TFLOPs: 31.56 | +7: iteration 80420/ 173500 | consumed samples: 20587520 | consumed tokens: 42163240960 | elapsed time per iteration (s): 0.42 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.964944E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.704 | TFLOPs: 31.83 | +7: iteration 80430/ 173500 | consumed samples: 20590080 | consumed tokens: 42168483840 | elapsed time per iteration (s): 0.42 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.957259E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.277 | TFLOPs: 31.76 | +7: iteration 80440/ 
173500 | consumed samples: 20592640 | consumed tokens: 42173726720 | elapsed time per iteration (s): 0.43 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.960064E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.093 | TFLOPs: 31.33 | +7: iteration 80450/ 173500 | consumed samples: 20595200 | consumed tokens: 42178969600 | elapsed time per iteration (s): 0.43 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.946178E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.603 | TFLOPs: 31.57 | +7: iteration 80460/ 173500 | consumed samples: 20597760 | consumed tokens: 42184212480 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.955715E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.902 | TFLOPs: 31.63 | +7: iteration 80470/ 173500 | consumed samples: 20600320 | consumed tokens: 42189455360 | elapsed time per iteration (s): 0.43 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.966010E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.273 | TFLOPs: 31.55 | +7: iteration 80480/ 173500 | consumed samples: 20602880 | consumed tokens: 42194698240 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.959472E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.272 | TFLOPs: 31.76 | +7: iteration 80490/ 173500 | consumed samples: 20605440 | consumed tokens: 42199941120 | elapsed time per iteration (s): 0.43 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.953292E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.422 | TFLOPs: 31.56 | +7: iteration 80500/ 173500 | consumed samples: 20608000 | consumed tokens: 42205184000 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.974824E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.636 | TFLOPs: 31.78 | +7: iteration 80510/ 173500 | consumed samples: 20610560 | consumed tokens: 42210426880 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.953218E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.583 | TFLOPs: 31.77 | +7: iteration 80520/ 173500 | consumed samples: 20613120 | consumed tokens: 42215669760 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.955758E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.067 | TFLOPs: 31.75 | +7: iteration 80530/ 173500 | consumed samples: 20615680 | consumed tokens: 42220912640 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.955974E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.161 | TFLOPs: 31.96 | +7: iteration 80540/ 173500 | consumed samples: 20618240 | consumed tokens: 42226155520 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.956698E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.481 | TFLOPs: 31.98 | +7: iteration 80550/ 173500 | consumed samples: 20620800 | consumed tokens: 42231398400 | elapsed time per iteration (s): 0.42 | learning rate: 
1.216E-04 | global batch size: 256 | lm loss: 2.942514E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.370 | TFLOPs: 31.92 | +7: iteration 80560/ 173500 | consumed samples: 20623360 | consumed tokens: 42236641280 | elapsed time per iteration (s): 0.43 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.954155E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.802 | TFLOPs: 31.52 | +7: iteration 80570/ 173500 | consumed samples: 20625920 | consumed tokens: 42241884160 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.962770E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.883 | TFLOPs: 31.68 | +7: iteration 80580/ 173500 | consumed samples: 20628480 | consumed tokens: 42247127040 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.953433E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.017 | TFLOPs: 31.74 | +7: iteration 80590/ 173500 | consumed samples: 20631040 | consumed tokens: 42252369920 | elapsed time per iteration (s): 0.43 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.955993E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.819 | TFLOPs: 31.52 | +7: iteration 80600/ 173500 | consumed samples: 20633600 | consumed tokens: 42257612800 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.948793E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.021 | TFLOPs: 31.74 | +7: iteration 80610/ 173500 | 
consumed samples: 20636160 | consumed tokens: 42262855680 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.953777E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.544 | TFLOPs: 31.77 | +7: iteration 80620/ 173500 | consumed samples: 20638720 | consumed tokens: 42268098560 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.964168E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.010 | TFLOPs: 31.90 | +7: iteration 80630/ 173500 | consumed samples: 20641280 | consumed tokens: 42273341440 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.957335E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.476 | TFLOPs: 31.61 | +7: iteration 80640/ 173500 | consumed samples: 20643840 | consumed tokens: 42278584320 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.943308E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.919 | TFLOPs: 31.95 | +7: iteration 80650/ 173500 | consumed samples: 20646400 | consumed tokens: 42283827200 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.954731E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.937 | TFLOPs: 31.79 | +7: iteration 80660/ 173500 | consumed samples: 20648960 | consumed tokens: 42289070080 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.950695E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.502 | TFLOPs: 31.72 | +7: iteration 80670/ 173500 | consumed samples: 20651520 | consumed tokens: 42294312960 | elapsed time per iteration (s): 0.43 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.939819E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.290 | TFLOPs: 31.29 | +7: iteration 80680/ 173500 | consumed samples: 20654080 | consumed tokens: 42299555840 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.956256E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.312 | TFLOPs: 31.71 | +7: iteration 80690/ 173500 | consumed samples: 20656640 | consumed tokens: 42304798720 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.954227E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.047 | TFLOPs: 31.96 | +7: iteration 80700/ 173500 | consumed samples: 20659200 | consumed tokens: 42310041600 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.955013E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.589 | TFLOPs: 31.77 | +7: iteration 80710/ 173500 | consumed samples: 20661760 | consumed tokens: 42315284480 | elapsed time per iteration (s): 0.42 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.955786E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.218 | TFLOPs: 31.86 | +7: iteration 80720/ 173500 | consumed samples: 20664320 | consumed tokens: 42320527360 | elapsed time per iteration (s): 0.43 | learning rate: 1.213E-04 | global batch 
size: 256 | lm loss: 2.966994E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.624 | TFLOPs: 31.36 | +7: iteration 80730/ 173500 | consumed samples: 20666880 | consumed tokens: 42325770240 | elapsed time per iteration (s): 0.42 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.965043E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.580 | TFLOPs: 31.77 | +7: iteration 80740/ 173500 | consumed samples: 20669440 | consumed tokens: 42331013120 | elapsed time per iteration (s): 0.42 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.929251E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.805 | TFLOPs: 31.63 | +7: iteration 80750/ 173500 | consumed samples: 20672000 | consumed tokens: 42336256000 | elapsed time per iteration (s): 0.42 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.947040E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.096 | TFLOPs: 31.96 | +7: iteration 80760/ 173500 | consumed samples: 20674560 | consumed tokens: 42341498880 | elapsed time per iteration (s): 0.43 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.951304E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.129 | TFLOPs: 31.54 | +7: iteration 80770/ 173500 | consumed samples: 20677120 | consumed tokens: 42346741760 | elapsed time per iteration (s): 0.42 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.948440E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.649 | TFLOPs: 31.88 | +7: iteration 80780/ 173500 | consumed samples: 20679680 | 
consumed tokens: 42351984640 | elapsed time per iteration (s): 0.43 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.949312E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.942 | TFLOPs: 31.53 | +7: iteration 80790/ 173500 | consumed samples: 20682240 | consumed tokens: 42357227520 | elapsed time per iteration (s): 0.42 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.953459E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.249 | TFLOPs: 31.81 | +7: iteration 80800/ 173500 | consumed samples: 20684800 | consumed tokens: 42362470400 | elapsed time per iteration (s): 0.42 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.946806E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.591 | TFLOPs: 31.72 | +7: iteration 80810/ 173500 | consumed samples: 20687360 | consumed tokens: 42367713280 | elapsed time per iteration (s): 0.43 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.969664E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.447 | TFLOPs: 31.56 | +7: iteration 80820/ 173500 | consumed samples: 20689920 | consumed tokens: 42372956160 | elapsed time per iteration (s): 0.42 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.958715E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.671 | TFLOPs: 31.83 | +7: iteration 80830/ 173500 | consumed samples: 20692480 | consumed tokens: 42378199040 | elapsed time per iteration (s): 0.43 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.965260E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 601.148 | TFLOPs: 31.54 | +7: iteration 80840/ 173500 | consumed samples: 20695040 | consumed tokens: 42383441920 | elapsed time per iteration (s): 0.43 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.955781E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.807 | TFLOPs: 31.47 | +7: iteration 80850/ 173500 | consumed samples: 20697600 | consumed tokens: 42388684800 | elapsed time per iteration (s): 0.42 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.955385E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.505 | TFLOPs: 31.98 | +7: iteration 80860/ 173500 | consumed samples: 20700160 | consumed tokens: 42393927680 | elapsed time per iteration (s): 0.43 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.964877E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.971 | TFLOPs: 31.48 | +7: iteration 80870/ 173500 | consumed samples: 20702720 | consumed tokens: 42399170560 | elapsed time per iteration (s): 0.42 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.953080E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.502 | TFLOPs: 31.98 | +7: iteration 80880/ 173500 | consumed samples: 20705280 | consumed tokens: 42404413440 | elapsed time per iteration (s): 0.42 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.953662E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.548 | TFLOPs: 31.72 | +7: iteration 80890/ 173500 | consumed samples: 20707840 | consumed tokens: 42409656320 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 
2.948768E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.547 | TFLOPs: 31.88 | +7: iteration 80900/ 173500 | consumed samples: 20710400 | consumed tokens: 42414899200 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.941150E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.691 | TFLOPs: 31.46 | +7: iteration 80910/ 173500 | consumed samples: 20712960 | consumed tokens: 42420142080 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.944682E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.941 | TFLOPs: 31.69 | +7: iteration 80920/ 173500 | consumed samples: 20715520 | consumed tokens: 42425384960 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.953959E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.695 | TFLOPs: 31.62 | +7: iteration 80930/ 173500 | consumed samples: 20718080 | consumed tokens: 42430627840 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.958183E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.941 | TFLOPs: 31.74 | +7: iteration 80940/ 173500 | consumed samples: 20720640 | consumed tokens: 42435870720 | elapsed time per iteration (s): 0.42 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.962597E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.454 | TFLOPs: 31.77 | +7: iteration 80950/ 173500 | consumed samples: 20723200 | consumed tokens: 
42441113600 | elapsed time per iteration (s): 0.42 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.958737E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.385 | TFLOPs: 31.76 | +7: iteration 80960/ 173500 | consumed samples: 20725760 | consumed tokens: 42446356480 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.954607E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.747 | TFLOPs: 31.52 | +7: iteration 80970/ 173500 | consumed samples: 20728320 | consumed tokens: 42451599360 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.961913E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.459 | TFLOPs: 31.24 | +7: iteration 80980/ 173500 | consumed samples: 20730880 | consumed tokens: 42456842240 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.967031E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.609 | TFLOPs: 31.57 | +7: iteration 80990/ 173500 | consumed samples: 20733440 | consumed tokens: 42462085120 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.942701E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.321 | TFLOPs: 31.29 | +7: iteration 81000/ 173500 | consumed samples: 20736000 | consumed tokens: 42467328000 | elapsed time per iteration (s): 0.42 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.956462E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.306 | TFLOPs: 31.97 | +7: iteration 81010/ 173500 | consumed samples: 20738560 | consumed tokens: 42472570880 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.967271E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.920 | TFLOPs: 31.53 | +7: iteration 81020/ 173500 | consumed samples: 20741120 | consumed tokens: 42477813760 | elapsed time per iteration (s): 0.43 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.955929E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.905 | TFLOPs: 30.90 | +7: iteration 81030/ 173500 | consumed samples: 20743680 | consumed tokens: 42483056640 | elapsed time per iteration (s): 0.42 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.945764E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.075 | TFLOPs: 32.01 | +7: iteration 81040/ 173500 | consumed samples: 20746240 | consumed tokens: 42488299520 | elapsed time per iteration (s): 0.42 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.962176E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.496 | TFLOPs: 31.98 | +7: iteration 81050/ 173500 | consumed samples: 20748800 | consumed tokens: 42493542400 | elapsed time per iteration (s): 0.43 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.943366E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.218 | TFLOPs: 31.23 | +7: iteration 81060/ 173500 | consumed samples: 20751360 | consumed tokens: 42498785280 | elapsed time per iteration (s): 0.43 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.952113E+00 | grad 
norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.854 | TFLOPs: 31.42 | +7: iteration 81070/ 173500 | consumed samples: 20753920 | consumed tokens: 42504028160 | elapsed time per iteration (s): 0.43 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.945990E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.171 | TFLOPs: 31.49 | +7: iteration 81080/ 173500 | consumed samples: 20756480 | consumed tokens: 42509271040 | elapsed time per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.948332E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.375 | TFLOPs: 31.45 | +7: iteration 81090/ 173500 | consumed samples: 20759040 | consumed tokens: 42514513920 | elapsed time per iteration (s): 0.42 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.954522E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.505 | TFLOPs: 31.98 | +7: iteration 81100/ 173500 | consumed samples: 20761600 | consumed tokens: 42519756800 | elapsed time per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.953327E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.983 | TFLOPs: 31.48 | +7: iteration 81110/ 173500 | consumed samples: 20764160 | consumed tokens: 42524999680 | elapsed time per iteration (s): 0.42 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.961487E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.797 | TFLOPs: 31.73 | +7: iteration 81120/ 173500 | consumed samples: 20766720 | consumed tokens: 42530242560 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.957112E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.439 | TFLOPs: 31.40 | +7: iteration 81130/ 173500 | consumed samples: 20769280 | consumed tokens: 42535485440 | elapsed time per iteration (s): 0.42 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.959715E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.434 | TFLOPs: 31.98 | +7: iteration 81140/ 173500 | consumed samples: 20771840 | consumed tokens: 42540728320 | elapsed time per iteration (s): 0.43 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.952941E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.313 | TFLOPs: 31.13 | +7: iteration 81150/ 173500 | consumed samples: 20774400 | consumed tokens: 42545971200 | elapsed time per iteration (s): 0.43 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.950961E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.200 | TFLOPs: 31.23 | +7: iteration 81160/ 173500 | consumed samples: 20776960 | consumed tokens: 42551214080 | elapsed time per iteration (s): 0.42 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.955258E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.936 | TFLOPs: 32.00 | +7: iteration 81170/ 173500 | consumed samples: 20779520 | consumed tokens: 42556456960 | elapsed time per iteration (s): 0.42 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.958749E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.795 | TFLOPs: 
31.68 | +7: iteration 81180/ 173500 | consumed samples: 20782080 | consumed tokens: 42561699840 | elapsed time per iteration (s): 0.42 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.946909E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.626 | TFLOPs: 31.93 | +7: iteration 81190/ 173500 | consumed samples: 20784640 | consumed tokens: 42566942720 | elapsed time per iteration (s): 0.42 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.949886E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.533 | TFLOPs: 31.88 | +7: iteration 81200/ 173500 | consumed samples: 20787200 | consumed tokens: 42572185600 | elapsed time per iteration (s): 0.42 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.952078E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.925 | TFLOPs: 31.69 | +7: iteration 81210/ 173500 | consumed samples: 20789760 | consumed tokens: 42577428480 | elapsed time per iteration (s): 0.42 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.949869E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.369 | TFLOPs: 31.61 | +7: iteration 81220/ 173500 | consumed samples: 20792320 | consumed tokens: 42582671360 | elapsed time per iteration (s): 0.43 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.949836E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.814 | TFLOPs: 31.37 | +7: iteration 81230/ 173500 | consumed samples: 20794880 | consumed tokens: 42587914240 | elapsed time per iteration (s): 0.42 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.938750E+00 | grad norm: 0.325 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.691 | TFLOPs: 31.67 | +7: iteration 81240/ 173500 | consumed samples: 20797440 | consumed tokens: 42593157120 | elapsed time per iteration (s): 0.43 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.958221E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.360 | TFLOPs: 31.50 | +7: iteration 81250/ 173500 | consumed samples: 20800000 | consumed tokens: 42598400000 | elapsed time per iteration (s): 0.42 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.970244E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.782 | TFLOPs: 31.73 | +7: iteration 81260/ 173500 | consumed samples: 20802560 | consumed tokens: 42603642880 | elapsed time per iteration (s): 0.42 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.949584E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.116 | TFLOPs: 31.80 | +7: iteration 81270/ 173500 | consumed samples: 20805120 | consumed tokens: 42608885760 | elapsed time per iteration (s): 0.42 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.958964E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.785 | TFLOPs: 31.73 | +7: iteration 81280/ 173500 | consumed samples: 20807680 | consumed tokens: 42614128640 | elapsed time per iteration (s): 0.43 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.952389E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.700 | TFLOPs: 31.57 | +7: iteration 81290/ 173500 | consumed samples: 20810240 | consumed tokens: 42619371520 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.950956E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.633 | TFLOPs: 31.93 | +7: iteration 81300/ 173500 | consumed samples: 20812800 | consumed tokens: 42624614400 | elapsed time per iteration (s): 0.42 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.955185E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.779 | TFLOPs: 31.68 | +7: iteration 81310/ 173500 | consumed samples: 20815360 | consumed tokens: 42629857280 | elapsed time per iteration (s): 0.43 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.924138E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.485 | TFLOPs: 31.56 | +7: iteration 81320/ 173500 | consumed samples: 20817920 | consumed tokens: 42635100160 | elapsed time per iteration (s): 0.43 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.959384E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.778 | TFLOPs: 31.52 | +7: iteration 81330/ 173500 | consumed samples: 20820480 | consumed tokens: 42640343040 | elapsed time per iteration (s): 0.43 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.951884E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.417 | TFLOPs: 31.40 | +7: iteration 81340/ 173500 | consumed samples: 20823040 | consumed tokens: 42645585920 | elapsed time per iteration (s): 0.42 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.951019E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.482 | TFLOPs: 31.93 | +7: iteration 81350/ 
173500 | consumed samples: 20825600 | consumed tokens: 42650828800 | elapsed time per iteration (s): 0.42 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.953643E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.066 | TFLOPs: 31.80 | +7: iteration 81360/ 173500 | consumed samples: 20828160 | consumed tokens: 42656071680 | elapsed time per iteration (s): 0.43 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.950241E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.963 | TFLOPs: 31.48 | +7: iteration 81370/ 173500 | consumed samples: 20830720 | consumed tokens: 42661314560 | elapsed time per iteration (s): 0.42 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.948639E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.564 | TFLOPs: 31.62 | +7: iteration 81380/ 173500 | consumed samples: 20833280 | consumed tokens: 42666557440 | elapsed time per iteration (s): 0.43 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.954309E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.617 | TFLOPs: 31.36 | +7: iteration 81390/ 173500 | consumed samples: 20835840 | consumed tokens: 42671800320 | elapsed time per iteration (s): 0.42 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.955803E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.336 | TFLOPs: 31.71 | +7: iteration 81400/ 173500 | consumed samples: 20838400 | consumed tokens: 42677043200 | elapsed time per iteration (s): 0.42 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.955477E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.714 | TFLOPs: 31.94 | +7: iteration 81410/ 173500 | consumed samples: 20840960 | consumed tokens: 42682286080 | elapsed time per iteration (s): 0.42 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.948924E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.238 | TFLOPs: 31.91 | +7: iteration 81420/ 173500 | consumed samples: 20843520 | consumed tokens: 42687528960 | elapsed time per iteration (s): 0.43 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.953249E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.867 | TFLOPs: 31.58 | +7: iteration 81430/ 173500 | consumed samples: 20846080 | consumed tokens: 42692771840 | elapsed time per iteration (s): 0.42 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.965072E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.807 | TFLOPs: 31.63 | +7: iteration 81440/ 173500 | consumed samples: 20848640 | consumed tokens: 42698014720 | elapsed time per iteration (s): 0.42 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.947022E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.781 | TFLOPs: 31.68 | +7: iteration 81450/ 173500 | consumed samples: 20851200 | consumed tokens: 42703257600 | elapsed time per iteration (s): 0.43 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.952134E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.784 | TFLOPs: 31.52 | +7: iteration 81460/ 173500 | consumed samples: 20853760 | consumed tokens: 42708500480 | elapsed time per iteration (s): 0.43 | learning rate: 
1.201E-04 | global batch size: 256 | lm loss: 2.948140E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.841 | TFLOPs: 31.53 | +7: iteration 81470/ 173500 | consumed samples: 20856320 | consumed tokens: 42713743360 | elapsed time per iteration (s): 0.43 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.959199E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.466 | TFLOPs: 31.56 | +7: iteration 81480/ 173500 | consumed samples: 20858880 | consumed tokens: 42718986240 | elapsed time per iteration (s): 0.43 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.957152E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.153 | TFLOPs: 31.54 | +7: iteration 81490/ 173500 | consumed samples: 20861440 | consumed tokens: 42724229120 | elapsed time per iteration (s): 0.42 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.946170E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.771 | TFLOPs: 31.78 | +7: iteration 81500/ 173500 | consumed samples: 20864000 | consumed tokens: 42729472000 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.951974E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.123 | TFLOPs: 31.59 | +7: iteration 81510/ 173500 | consumed samples: 20866560 | consumed tokens: 42734714880 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.954004E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.997 | TFLOPs: 31.59 | +7: iteration 81520/ 173500 | 
consumed samples: 20869120 | consumed tokens: 42739957760 | elapsed time per iteration (s): 0.42 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.971732E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.399 | TFLOPs: 31.61 | +7: iteration 81530/ 173500 | consumed samples: 20871680 | consumed tokens: 42745200640 | elapsed time per iteration (s): 0.42 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.936083E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.073 | TFLOPs: 31.64 | +7: iteration 81540/ 173500 | consumed samples: 20874240 | consumed tokens: 42750443520 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.948771E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.107 | TFLOPs: 31.12 | +7: iteration 81550/ 173500 | consumed samples: 20876800 | consumed tokens: 42755686400 | elapsed time per iteration (s): 0.42 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.959757E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.244 | TFLOPs: 31.91 | +7: iteration 81560/ 173500 | consumed samples: 20879360 | consumed tokens: 42760929280 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.950980E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.955 | TFLOPs: 31.48 | +7: iteration 81570/ 173500 | consumed samples: 20881920 | consumed tokens: 42766172160 | elapsed time per iteration (s): 0.42 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.936643E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 608.139 | TFLOPs: 31.91 | +7: iteration 81580/ 173500 | consumed samples: 20884480 | consumed tokens: 42771415040 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.952689E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.118 | TFLOPs: 31.33 | +7: iteration 81590/ 173500 | consumed samples: 20887040 | consumed tokens: 42776657920 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.949522E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.431 | TFLOPs: 31.56 | +7: iteration 81600/ 173500 | consumed samples: 20889600 | consumed tokens: 42781900800 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.945548E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.455 | TFLOPs: 31.40 | +7: iteration 81610/ 173500 | consumed samples: 20892160 | consumed tokens: 42787143680 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.951283E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.381 | TFLOPs: 31.24 | +7: iteration 81620/ 173500 | consumed samples: 20894720 | consumed tokens: 42792386560 | elapsed time per iteration (s): 0.42 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.944290E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.032 | TFLOPs: 31.69 | +7: iteration 81630/ 173500 | consumed samples: 20897280 | consumed tokens: 42797629440 | elapsed time per iteration (s): 0.43 | learning rate: 1.198E-04 | global batch 
size: 256 | lm loss: 2.953645E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.211 | TFLOPs: 31.07 | +7: iteration 81640/ 173500 | consumed samples: 20899840 | consumed tokens: 42802872320 | elapsed time per iteration (s): 0.43 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.945722E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.493 | TFLOPs: 31.30 | +7: iteration 81650/ 173500 | consumed samples: 20902400 | consumed tokens: 42808115200 | elapsed time per iteration (s): 0.42 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.953164E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.935 | TFLOPs: 31.79 | +7: iteration 81660/ 173500 | consumed samples: 20904960 | consumed tokens: 42813358080 | elapsed time per iteration (s): 0.42 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.944582E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.000 | TFLOPs: 31.64 | +7: iteration 81670/ 173500 | consumed samples: 20907520 | consumed tokens: 42818600960 | elapsed time per iteration (s): 0.43 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.948851E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.687 | TFLOPs: 31.41 | +7: iteration 81680/ 173500 | consumed samples: 20910080 | consumed tokens: 42823843840 | elapsed time per iteration (s): 0.42 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.964065E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.936 | TFLOPs: 31.79 | +7: iteration 81690/ 173500 | consumed samples: 20912640 | 
consumed tokens: 42829086720 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.945766E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.003 | TFLOPs: 31.74 | +7: iteration 81700/ 173500 | consumed samples: 20915200 | consumed tokens: 42834329600 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.958830E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.104 | TFLOPs: 31.70 | +7: iteration 81710/ 173500 | consumed samples: 20917760 | consumed tokens: 42839572480 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.961554E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.424 | TFLOPs: 31.82 | +7: iteration 81720/ 173500 | consumed samples: 20920320 | consumed tokens: 42844815360 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.942416E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.188 | TFLOPs: 31.81 | +7: iteration 81730/ 173500 | consumed samples: 20922880 | consumed tokens: 42850058240 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.957238E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.863 | TFLOPs: 31.89 | +7: iteration 81740/ 173500 | consumed samples: 20925440 | consumed tokens: 42855301120 | elapsed time per iteration (s): 0.42 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.950102E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 606.016 | TFLOPs: 31.80 | +7: iteration 81750/ 173500 | consumed samples: 20928000 | consumed tokens: 42860544000 | elapsed time per iteration (s): 0.42 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.941937E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.669 | TFLOPs: 31.88 | +7: iteration 81760/ 173500 | consumed samples: 20930560 | consumed tokens: 42865786880 | elapsed time per iteration (s): 0.42 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.949239E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.391 | TFLOPs: 31.87 | +7: iteration 81770/ 173500 | consumed samples: 20933120 | consumed tokens: 42871029760 | elapsed time per iteration (s): 0.42 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.953882E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.633 | TFLOPs: 31.72 | +7: iteration 81780/ 173500 | consumed samples: 20935680 | consumed tokens: 42876272640 | elapsed time per iteration (s): 0.43 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.955503E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.130 | TFLOPs: 31.59 | +7: iteration 81790/ 173500 | consumed samples: 20938240 | consumed tokens: 42881515520 | elapsed time per iteration (s): 0.43 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.953251E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.842 | TFLOPs: 31.42 | +7: iteration 81800/ 173500 | consumed samples: 20940800 | consumed tokens: 42886758400 | elapsed time per iteration (s): 0.42 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 
2.958338E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.110 | TFLOPs: 31.64 | +7: iteration 81810/ 173500 | consumed samples: 20943360 | consumed tokens: 42892001280 | elapsed time per iteration (s): 0.42 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.964835E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.971 | TFLOPs: 31.85 | +7: iteration 81820/ 173500 | consumed samples: 20945920 | consumed tokens: 42897244160 | elapsed time per iteration (s): 0.42 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.964002E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.930 | TFLOPs: 31.84 | +7: iteration 81830/ 173500 | consumed samples: 20948480 | consumed tokens: 42902487040 | elapsed time per iteration (s): 0.43 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.949239E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.094 | TFLOPs: 31.54 | +7: iteration 81840/ 173500 | consumed samples: 20951040 | consumed tokens: 42907729920 | elapsed time per iteration (s): 0.43 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.949197E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.144 | TFLOPs: 31.49 | +7: iteration 81850/ 173500 | consumed samples: 20953600 | consumed tokens: 42912972800 | elapsed time per iteration (s): 0.42 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.958423E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.608 | TFLOPs: 31.83 | +7: iteration 81860/ 173500 | consumed samples: 20956160 | consumed tokens: 
42918215680 | elapsed time per iteration (s): 0.43 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.942097E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.671 | TFLOPs: 31.57 | +7: iteration 81870/ 173500 | consumed samples: 20958720 | consumed tokens: 42923458560 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.952243E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.065 | TFLOPs: 31.59 | +7: iteration 81880/ 173500 | consumed samples: 20961280 | consumed tokens: 42928701440 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.951130E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.815 | TFLOPs: 31.58 | +7: iteration 81890/ 173500 | consumed samples: 20963840 | consumed tokens: 42933944320 | elapsed time per iteration (s): 0.42 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.958141E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.859 | TFLOPs: 31.79 | +7: iteration 81900/ 173500 | consumed samples: 20966400 | consumed tokens: 42939187200 | elapsed time per iteration (s): 0.42 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.944689E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.371 | TFLOPs: 31.87 | +7: iteration 81910/ 173500 | consumed samples: 20968960 | consumed tokens: 42944430080 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.952186E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 595.610 | TFLOPs: 31.25 | +7: iteration 81920/ 173500 | consumed samples: 20971520 | consumed tokens: 42949672960 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.947924E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.697 | TFLOPs: 31.15 | +7: iteration 81930/ 173500 | consumed samples: 20974080 | consumed tokens: 42954915840 | elapsed time per iteration (s): 0.42 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.939389E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.175 | TFLOPs: 31.70 | +7: iteration 81940/ 173500 | consumed samples: 20976640 | consumed tokens: 42960158720 | elapsed time per iteration (s): 0.43 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.950621E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.847 | TFLOPs: 31.21 | +7: iteration 81950/ 173500 | consumed samples: 20979200 | consumed tokens: 42965401600 | elapsed time per iteration (s): 0.42 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.943184E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.227 | TFLOPs: 31.65 | +7: iteration 81960/ 173500 | consumed samples: 20981760 | consumed tokens: 42970644480 | elapsed time per iteration (s): 0.42 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.944855E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.470 | TFLOPs: 31.61 | +7: iteration 81970/ 173500 | consumed samples: 20984320 | consumed tokens: 42975887360 | elapsed time per iteration (s): 0.42 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.940647E+00 | grad 
norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.742 | TFLOPs: 31.89 | +7: iteration 81980/ 173500 | consumed samples: 20986880 | consumed tokens: 42981130240 | elapsed time per iteration (s): 0.43 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.963241E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.758 | TFLOPs: 31.57 | +7: iteration 81990/ 173500 | consumed samples: 20989440 | consumed tokens: 42986373120 | elapsed time per iteration (s): 0.42 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.963371E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.846 | TFLOPs: 31.63 | +0: [2023-03-17 08:55:01,286] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=0, lr=[0.00011923116875818059, 0.00011923116875818059, 0.00011923116875818059], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 82000/ 173500 | consumed samples: 20992000 | consumed tokens: 42991616000 | elapsed time per iteration (s): 0.43 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.948826E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.276 | TFLOPs: 31.60 | +0: steps: 82000 loss: 2.9590 iter time (s): 0.423 samples/sec: 605.736 +7: iteration 82010/ 173500 | consumed samples: 20994560 | consumed tokens: 42996858880 | elapsed time per iteration (s): 0.42 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.959068E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.707 | TFLOPs: 31.62 | +7: iteration 82020/ 173500 | consumed samples: 20997120 | consumed tokens: 43002101760 | elapsed time per iteration (s): 0.43 | learning rate: 1.192E-04 | global 
batch size: 256 | lm loss: 2.945456E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.483 | TFLOPs: 31.24 | +7: iteration 82030/ 173500 | consumed samples: 20999680 | consumed tokens: 43007344640 | elapsed time per iteration (s): 0.42 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.941829E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.199 | TFLOPs: 31.91 | +7: iteration 82040/ 173500 | consumed samples: 21002240 | consumed tokens: 43012587520 | elapsed time per iteration (s): 0.43 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.961322E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.211 | TFLOPs: 31.60 | +7: iteration 82050/ 173500 | consumed samples: 21004800 | consumed tokens: 43017830400 | elapsed time per iteration (s): 0.42 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.961260E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.349 | TFLOPs: 31.76 | +7: iteration 82060/ 173500 | consumed samples: 21007360 | consumed tokens: 43023073280 | elapsed time per iteration (s): 0.43 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.960279E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.738 | TFLOPs: 31.47 | +7: iteration 82070/ 173500 | consumed samples: 21009920 | consumed tokens: 43028316160 | elapsed time per iteration (s): 0.42 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.970180E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.951 | TFLOPs: 31.69 | +7: iteration 82080/ 173500 | consumed samples: 21012480 
| consumed tokens: 43033559040 | elapsed time per iteration (s): 0.42 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.942957E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.188 | TFLOPs: 31.75 | +7: iteration 82090/ 173500 | consumed samples: 21015040 | consumed tokens: 43038801920 | elapsed time per iteration (s): 0.44 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.956127E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.900 | TFLOPs: 30.22 | +7: iteration 82100/ 173500 | consumed samples: 21017600 | consumed tokens: 43044044800 | elapsed time per iteration (s): 0.43 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.955675E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.047 | TFLOPs: 31.48 | +7: iteration 82110/ 173500 | consumed samples: 21020160 | consumed tokens: 43049287680 | elapsed time per iteration (s): 0.44 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.960577E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.133 | TFLOPs: 30.86 | +7: iteration 82120/ 173500 | consumed samples: 21022720 | consumed tokens: 43054530560 | elapsed time per iteration (s): 0.44 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.941944E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.521 | TFLOPs: 30.30 | +7: iteration 82130/ 173500 | consumed samples: 21025280 | consumed tokens: 43059773440 | elapsed time per iteration (s): 0.43 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.961370E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 596.534 | TFLOPs: 31.30 | +7: iteration 82140/ 173500 | consumed samples: 21027840 | consumed tokens: 43065016320 | elapsed time per iteration (s): 0.44 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.957200E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.093 | TFLOPs: 30.49 | +7: iteration 82150/ 173500 | consumed samples: 21030400 | consumed tokens: 43070259200 | elapsed time per iteration (s): 0.43 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.941829E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.864 | TFLOPs: 31.26 | +7: iteration 82160/ 173500 | consumed samples: 21032960 | consumed tokens: 43075502080 | elapsed time per iteration (s): 0.43 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.952696E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.172 | TFLOPs: 31.07 | +7: iteration 82170/ 173500 | consumed samples: 21035520 | consumed tokens: 43080744960 | elapsed time per iteration (s): 0.44 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.931465E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.548 | TFLOPs: 30.57 | +7: iteration 82180/ 173500 | consumed samples: 21038080 | consumed tokens: 43085987840 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.953254E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.773 | TFLOPs: 31.57 | +7: iteration 82190/ 173500 | consumed samples: 21040640 | consumed tokens: 43091230720 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 
2.946278E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.228 | TFLOPs: 31.60 | +7: iteration 82200/ 173500 | consumed samples: 21043200 | consumed tokens: 43096473600 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.955964E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.049 | TFLOPs: 31.54 | +7: iteration 82210/ 173500 | consumed samples: 21045760 | consumed tokens: 43101716480 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.950439E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.149 | TFLOPs: 31.59 | +7: iteration 82220/ 173500 | consumed samples: 21048320 | consumed tokens: 43106959360 | elapsed time per iteration (s): 0.42 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.947789E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.661 | TFLOPs: 31.73 | +7: iteration 82230/ 173500 | consumed samples: 21050880 | consumed tokens: 43112202240 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.953823E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.602 | TFLOPs: 31.25 | +7: iteration 82240/ 173500 | consumed samples: 21053440 | consumed tokens: 43117445120 | elapsed time per iteration (s): 0.43 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.947292E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.191 | TFLOPs: 31.23 | +7: iteration 82250/ 173500 | consumed samples: 21056000 | consumed tokens: 
43122688000 | elapsed time per iteration (s): 0.44 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.955211E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.808 | TFLOPs: 30.42 | +7: iteration 82260/ 173500 | consumed samples: 21058560 | consumed tokens: 43127930880 | elapsed time per iteration (s): 0.44 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.940831E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.429 | TFLOPs: 30.72 | +7: iteration 82270/ 173500 | consumed samples: 21061120 | consumed tokens: 43133173760 | elapsed time per iteration (s): 0.44 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.956316E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.240 | TFLOPs: 30.55 | +7: iteration 82280/ 173500 | consumed samples: 21063680 | consumed tokens: 43138416640 | elapsed time per iteration (s): 0.43 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.957222E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.593 | TFLOPs: 31.04 | +7: iteration 82290/ 173500 | consumed samples: 21066240 | consumed tokens: 43143659520 | elapsed time per iteration (s): 0.43 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.963063E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.260 | TFLOPs: 30.97 | +7: iteration 82300/ 173500 | consumed samples: 21068800 | consumed tokens: 43148902400 | elapsed time per iteration (s): 0.44 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.941618E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 577.911 | TFLOPs: 30.32 | +7: iteration 82310/ 173500 | consumed samples: 21071360 | consumed tokens: 43154145280 | elapsed time per iteration (s): 0.42 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.951069E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.945 | TFLOPs: 31.69 | +7: iteration 82320/ 173500 | consumed samples: 21073920 | consumed tokens: 43159388160 | elapsed time per iteration (s): 0.43 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.954254E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.523 | TFLOPs: 30.88 | +7: iteration 82330/ 173500 | consumed samples: 21076480 | consumed tokens: 43164631040 | elapsed time per iteration (s): 0.44 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.948620E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.318 | TFLOPs: 30.76 | +7: iteration 82340/ 173500 | consumed samples: 21079040 | consumed tokens: 43169873920 | elapsed time per iteration (s): 0.43 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.948426E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.597 | TFLOPs: 31.09 | +7: iteration 82350/ 173500 | consumed samples: 21081600 | consumed tokens: 43175116800 | elapsed time per iteration (s): 0.43 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.951173E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.634 | TFLOPs: 31.51 | +7: iteration 82360/ 173500 | consumed samples: 21084160 | consumed tokens: 43180359680 | elapsed time per iteration (s): 0.43 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.955030E+00 | grad 
norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.823 | TFLOPs: 31.26 | +7: iteration 82370/ 173500 | consumed samples: 21086720 | consumed tokens: 43185602560 | elapsed time per iteration (s): 0.44 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.933768E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.076 | TFLOPs: 30.28 | +7: iteration 82380/ 173500 | consumed samples: 21089280 | consumed tokens: 43190845440 | elapsed time per iteration (s): 0.43 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.960341E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.936 | TFLOPs: 31.11 | +7: iteration 82390/ 173500 | consumed samples: 21091840 | consumed tokens: 43196088320 | elapsed time per iteration (s): 0.43 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.964202E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.111 | TFLOPs: 31.07 | +7: iteration 82400/ 173500 | consumed samples: 21094400 | consumed tokens: 43201331200 | elapsed time per iteration (s): 0.44 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.966153E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.648 | TFLOPs: 30.83 | +7: iteration 82410/ 173500 | consumed samples: 21096960 | consumed tokens: 43206574080 | elapsed time per iteration (s): 0.44 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.954091E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.908 | TFLOPs: 30.64 | +7: iteration 82420/ 173500 | consumed samples: 21099520 | consumed tokens: 43211816960 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.960990E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.670 | TFLOPs: 31.62 | +7: iteration 82430/ 173500 | consumed samples: 21102080 | consumed tokens: 43217059840 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.941257E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.639 | TFLOPs: 31.04 | +7: iteration 82440/ 173500 | consumed samples: 21104640 | consumed tokens: 43222302720 | elapsed time per iteration (s): 0.44 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.974345E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.161 | TFLOPs: 30.86 | +7: iteration 82450/ 173500 | consumed samples: 21107200 | consumed tokens: 43227545600 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.952314E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.382 | TFLOPs: 31.08 | +7: iteration 82460/ 173500 | consumed samples: 21109760 | consumed tokens: 43232788480 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.959743E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.342 | TFLOPs: 31.08 | +7: iteration 82470/ 173500 | consumed samples: 21112320 | consumed tokens: 43238031360 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.937155E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.583 | TFLOPs: 
31.14 | +7: iteration 82480/ 173500 | consumed samples: 21114880 | consumed tokens: 43243274240 | elapsed time per iteration (s): 0.44 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.943247E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.389 | TFLOPs: 30.24 | +7: iteration 82490/ 173500 | consumed samples: 21117440 | consumed tokens: 43248517120 | elapsed time per iteration (s): 0.44 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.942563E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.928 | TFLOPs: 30.85 | +7: iteration 82500/ 173500 | consumed samples: 21120000 | consumed tokens: 43253760000 | elapsed time per iteration (s): 0.44 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.962223E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.395 | TFLOPs: 30.56 | +7: iteration 82510/ 173500 | consumed samples: 21122560 | consumed tokens: 43259002880 | elapsed time per iteration (s): 0.43 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.942762E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.180 | TFLOPs: 31.28 | +7: iteration 82520/ 173500 | consumed samples: 21125120 | consumed tokens: 43264245760 | elapsed time per iteration (s): 0.43 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.949267E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.555 | TFLOPs: 31.14 | +7: iteration 82530/ 173500 | consumed samples: 21127680 | consumed tokens: 43269488640 | elapsed time per iteration (s): 0.43 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.952814E+00 | grad norm: 0.314 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.691 | TFLOPs: 31.20 | +7: iteration 82540/ 173500 | consumed samples: 21130240 | consumed tokens: 43274731520 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.963465E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.698 | TFLOPs: 31.20 | +7: iteration 82550/ 173500 | consumed samples: 21132800 | consumed tokens: 43279974400 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.950463E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.437 | TFLOPs: 31.08 | +7: iteration 82560/ 173500 | consumed samples: 21135360 | consumed tokens: 43285217280 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.952472E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.486 | TFLOPs: 31.30 | +7: iteration 82570/ 173500 | consumed samples: 21137920 | consumed tokens: 43290460160 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.970218E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.365 | TFLOPs: 31.45 | +7: iteration 82580/ 173500 | consumed samples: 21140480 | consumed tokens: 43295703040 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.949941E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.782 | TFLOPs: 31.47 | +7: iteration 82590/ 173500 | consumed samples: 21143040 | consumed tokens: 43300945920 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.949424E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.040 | TFLOPs: 30.38 | +7: iteration 82600/ 173500 | consumed samples: 21145600 | consumed tokens: 43306188800 | elapsed time per iteration (s): 0.45 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.947029E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.155 | TFLOPs: 30.07 | +7: iteration 82610/ 173500 | consumed samples: 21148160 | consumed tokens: 43311431680 | elapsed time per iteration (s): 0.42 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.949735E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.403 | TFLOPs: 31.61 | +7: iteration 82620/ 173500 | consumed samples: 21150720 | consumed tokens: 43316674560 | elapsed time per iteration (s): 0.44 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.945094E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.771 | TFLOPs: 30.58 | +7: iteration 82630/ 173500 | consumed samples: 21153280 | consumed tokens: 43321917440 | elapsed time per iteration (s): 0.43 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.929649E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.202 | TFLOPs: 31.33 | +7: iteration 82640/ 173500 | consumed samples: 21155840 | consumed tokens: 43327160320 | elapsed time per iteration (s): 0.43 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.961985E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.214 | TFLOPs: 30.92 | +7: iteration 82650/ 
173500 | consumed samples: 21158400 | consumed tokens: 43332403200 | elapsed time per iteration (s): 0.44 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.955907E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.006 | TFLOPs: 30.33 | +7: iteration 82660/ 173500 | consumed samples: 21160960 | consumed tokens: 43337646080 | elapsed time per iteration (s): 0.44 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.953430E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.579 | TFLOPs: 30.78 | +7: iteration 82670/ 173500 | consumed samples: 21163520 | consumed tokens: 43342888960 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.956987E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.531 | TFLOPs: 31.40 | +7: iteration 82680/ 173500 | consumed samples: 21166080 | consumed tokens: 43348131840 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.939482E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.462 | TFLOPs: 31.03 | +7: iteration 82690/ 173500 | consumed samples: 21168640 | consumed tokens: 43353374720 | elapsed time per iteration (s): 0.44 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.945235E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.354 | TFLOPs: 30.87 | +7: iteration 82700/ 173500 | consumed samples: 21171200 | consumed tokens: 43358617600 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.944081E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 590.362 | TFLOPs: 30.98 | +7: iteration 82710/ 173500 | consumed samples: 21173760 | consumed tokens: 43363860480 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.956854E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.152 | TFLOPs: 31.17 | +7: iteration 82720/ 173500 | consumed samples: 21176320 | consumed tokens: 43369103360 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.963039E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.745 | TFLOPs: 31.31 | +7: iteration 82730/ 173500 | consumed samples: 21178880 | consumed tokens: 43374346240 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.942214E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.239 | TFLOPs: 31.60 | +7: iteration 82740/ 173500 | consumed samples: 21181440 | consumed tokens: 43379589120 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.966249E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.357 | TFLOPs: 31.18 | +7: iteration 82750/ 173500 | consumed samples: 21184000 | consumed tokens: 43384832000 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.948984E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.912 | TFLOPs: 31.11 | +7: iteration 82760/ 173500 | consumed samples: 21186560 | consumed tokens: 43390074880 | elapsed time per iteration (s): 0.44 | learning rate: 
1.180E-04 | global batch size: 256 | lm loss: 2.960818E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.549 | TFLOPs: 30.67 | +7: iteration 82770/ 173500 | consumed samples: 21189120 | consumed tokens: 43395317760 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.956006E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.273 | TFLOPs: 31.39 | +7: iteration 82780/ 173500 | consumed samples: 21191680 | consumed tokens: 43400560640 | elapsed time per iteration (s): 0.44 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.953127E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.181 | TFLOPs: 30.34 | +7: iteration 82790/ 173500 | consumed samples: 21194240 | consumed tokens: 43405803520 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.946599E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.923 | TFLOPs: 31.53 | +7: iteration 82800/ 173500 | consumed samples: 21196800 | consumed tokens: 43411046400 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.948352E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.625 | TFLOPs: 31.25 | +7: iteration 82810/ 173500 | consumed samples: 21199360 | consumed tokens: 43416289280 | elapsed time per iteration (s): 0.44 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.955606E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.733 | TFLOPs: 30.63 | +7: iteration 82820/ 173500 | 
consumed samples: 21201920 | consumed tokens: 43421532160 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.958599E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.040 | TFLOPs: 30.96 | +7: iteration 82830/ 173500 | consumed samples: 21204480 | consumed tokens: 43426775040 | elapsed time per iteration (s): 0.45 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.940253E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.838 | TFLOPs: 29.64 | +7: iteration 82840/ 173500 | consumed samples: 21207040 | consumed tokens: 43432017920 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.947103E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.812 | TFLOPs: 30.89 | +7: iteration 82850/ 173500 | consumed samples: 21209600 | consumed tokens: 43437260800 | elapsed time per iteration (s): 0.44 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.945727E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.039 | TFLOPs: 30.70 | +7: iteration 82860/ 173500 | consumed samples: 21212160 | consumed tokens: 43442503680 | elapsed time per iteration (s): 0.43 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.958820E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.829 | TFLOPs: 30.95 | +7: iteration 82870/ 173500 | consumed samples: 21214720 | consumed tokens: 43447746560 | elapsed time per iteration (s): 0.42 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.949605E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.799 | TFLOPs: 31.68 | +7: iteration 82880/ 173500 | consumed samples: 21217280 | consumed tokens: 43452989440 | elapsed time per iteration (s): 0.43 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.941845E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.305 | TFLOPs: 31.08 | +7: iteration 82890/ 173500 | consumed samples: 21219840 | consumed tokens: 43458232320 | elapsed time per iteration (s): 0.44 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.948697E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.352 | TFLOPs: 30.76 | +7: iteration 82900/ 173500 | consumed samples: 21222400 | consumed tokens: 43463475200 | elapsed time per iteration (s): 0.43 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.966940E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.938 | TFLOPs: 31.43 | +7: iteration 82910/ 173500 | consumed samples: 21224960 | consumed tokens: 43468718080 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.959486E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.127 | TFLOPs: 31.12 | +7: iteration 82920/ 173500 | consumed samples: 21227520 | consumed tokens: 43473960960 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.951246E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.149 | TFLOPs: 31.12 | +7: iteration 82930/ 173500 | consumed samples: 21230080 | consumed tokens: 43479203840 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch 
size: 256 | lm loss: 2.956165E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.929 | TFLOPs: 30.90 | +7: iteration 82940/ 173500 | consumed samples: 21232640 | consumed tokens: 43484446720 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.948676E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.307 | TFLOPs: 31.13 | +7: iteration 82950/ 173500 | consumed samples: 21235200 | consumed tokens: 43489689600 | elapsed time per iteration (s): 0.46 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.951664E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.378 | TFLOPs: 29.51 | +7: iteration 82960/ 173500 | consumed samples: 21237760 | consumed tokens: 43494932480 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.941929E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.682 | TFLOPs: 30.94 | +7: iteration 82970/ 173500 | consumed samples: 21240320 | consumed tokens: 43500175360 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.947701E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.873 | TFLOPs: 31.00 | +7: iteration 82980/ 173500 | consumed samples: 21242880 | consumed tokens: 43505418240 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.945696E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.334 | TFLOPs: 31.45 | +7: iteration 82990/ 173500 | consumed samples: 21245440 | 
consumed tokens: 43510661120 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.966558E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.483 | TFLOPs: 30.93 | +7: iteration 83000/ 173500 | consumed samples: 21248000 | consumed tokens: 43515904000 | elapsed time per iteration (s): 0.44 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.958170E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.577 | TFLOPs: 30.72 | +7: iteration 83010/ 173500 | consumed samples: 21250560 | consumed tokens: 43521146880 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.952512E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.040 | TFLOPs: 31.22 | +7: iteration 83020/ 173500 | consumed samples: 21253120 | consumed tokens: 43526389760 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.945651E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.002 | TFLOPs: 31.17 | +7: iteration 83030/ 173500 | consumed samples: 21255680 | consumed tokens: 43531632640 | elapsed time per iteration (s): 0.42 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.953996E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.006 | TFLOPs: 31.64 | +7: iteration 83040/ 173500 | consumed samples: 21258240 | consumed tokens: 43536875520 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.954393E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 600.109 | TFLOPs: 31.49 | +7: iteration 83050/ 173500 | consumed samples: 21260800 | consumed tokens: 43542118400 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.967469E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.676 | TFLOPs: 30.89 | +7: iteration 83060/ 173500 | consumed samples: 21263360 | consumed tokens: 43547361280 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.942927E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.389 | TFLOPs: 31.40 | +7: iteration 83070/ 173500 | consumed samples: 21265920 | consumed tokens: 43552604160 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.944688E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.854 | TFLOPs: 31.47 | +7: iteration 83080/ 173500 | consumed samples: 21268480 | consumed tokens: 43557847040 | elapsed time per iteration (s): 0.42 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.945635E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.053 | TFLOPs: 31.69 | +7: iteration 83090/ 173500 | consumed samples: 21271040 | consumed tokens: 43563089920 | elapsed time per iteration (s): 0.45 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.954626E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.258 | TFLOPs: 30.08 | +7: iteration 83100/ 173500 | consumed samples: 21273600 | consumed tokens: 43568332800 | elapsed time per iteration (s): 0.42 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 
2.944160E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.847 | TFLOPs: 31.84 | +7: iteration 83110/ 173500 | consumed samples: 21276160 | consumed tokens: 43573575680 | elapsed time per iteration (s): 0.43 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.945754E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.016 | TFLOPs: 31.11 | +7: iteration 83120/ 173500 | consumed samples: 21278720 | consumed tokens: 43578818560 | elapsed time per iteration (s): 0.44 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.938911E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.810 | TFLOPs: 30.32 | +7: iteration 83130/ 173500 | consumed samples: 21281280 | consumed tokens: 43584061440 | elapsed time per iteration (s): 0.44 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.954266E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.853 | TFLOPs: 30.74 | +7: iteration 83140/ 173500 | consumed samples: 21283840 | consumed tokens: 43589304320 | elapsed time per iteration (s): 0.44 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.958528E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.390 | TFLOPs: 30.82 | +7: iteration 83150/ 173500 | consumed samples: 21286400 | consumed tokens: 43594547200 | elapsed time per iteration (s): 0.42 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.956825E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.132 | TFLOPs: 31.86 | +7: iteration 83160/ 173500 | consumed samples: 21288960 | consumed tokens: 
43599790080 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.935641E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.521 | TFLOPs: 31.46 | +7: iteration 83170/ 173500 | consumed samples: 21291520 | consumed tokens: 43605032960 | elapsed time per iteration (s): 0.42 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.945895E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.684 | TFLOPs: 31.78 | +7: iteration 83180/ 173500 | consumed samples: 21294080 | consumed tokens: 43610275840 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.943428E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.501 | TFLOPs: 31.30 | +7: iteration 83190/ 173500 | consumed samples: 21296640 | consumed tokens: 43615518720 | elapsed time per iteration (s): 0.42 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.951685E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.736 | TFLOPs: 31.78 | +7: iteration 83200/ 173500 | consumed samples: 21299200 | consumed tokens: 43620761600 | elapsed time per iteration (s): 0.42 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.946077E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.019 | TFLOPs: 31.95 | +7: iteration 83210/ 173500 | consumed samples: 21301760 | consumed tokens: 43626004480 | elapsed time per iteration (s): 0.43 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.950079E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.042 | TFLOPs: 31.59 | +7: iteration 83220/ 173500 | consumed samples: 21304320 | consumed tokens: 43631247360 | elapsed time per iteration (s): 0.45 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.948433E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.202 | TFLOPs: 29.66 | +7: iteration 83230/ 173500 | consumed samples: 21306880 | consumed tokens: 43636490240 | elapsed time per iteration (s): 0.46 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.951901E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.499 | TFLOPs: 28.99 | +7: iteration 83240/ 173500 | consumed samples: 21309440 | consumed tokens: 43641733120 | elapsed time per iteration (s): 0.44 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.950937E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.354 | TFLOPs: 30.56 | +7: iteration 83250/ 173500 | consumed samples: 21312000 | consumed tokens: 43646976000 | elapsed time per iteration (s): 0.47 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.944408E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.486 | TFLOPs: 28.62 | +7: iteration 83260/ 173500 | consumed samples: 21314560 | consumed tokens: 43652218880 | elapsed time per iteration (s): 0.45 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.943573E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.672 | TFLOPs: 29.68 | +7: iteration 83270/ 173500 | consumed samples: 21317120 | consumed tokens: 43657461760 | elapsed time per iteration (s): 0.42 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.959112E+00 | grad 
norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.856 | TFLOPs: 31.89 | +7: iteration 83280/ 173500 | consumed samples: 21319680 | consumed tokens: 43662704640 | elapsed time per iteration (s): 0.45 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.954328E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.875 | TFLOPs: 29.69 | +7: iteration 83290/ 173500 | consumed samples: 21322240 | consumed tokens: 43667947520 | elapsed time per iteration (s): 0.42 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.931625E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.770 | TFLOPs: 31.89 | +7: iteration 83300/ 173500 | consumed samples: 21324800 | consumed tokens: 43673190400 | elapsed time per iteration (s): 0.44 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.936080E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.121 | TFLOPs: 30.75 | +7: iteration 83310/ 173500 | consumed samples: 21327360 | consumed tokens: 43678433280 | elapsed time per iteration (s): 0.43 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.952118E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.914 | TFLOPs: 31.32 | +7: iteration 83320/ 173500 | consumed samples: 21329920 | consumed tokens: 43683676160 | elapsed time per iteration (s): 0.44 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.953457E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.566 | TFLOPs: 30.25 | +7: iteration 83330/ 173500 | consumed samples: 21332480 | consumed tokens: 43688919040 | elapsed time 
per iteration (s): 0.44 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.944481E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.793 | TFLOPs: 30.84 | +7: iteration 83340/ 173500 | consumed samples: 21335040 | consumed tokens: 43694161920 | elapsed time per iteration (s): 0.43 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.941411E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.839 | TFLOPs: 31.05 | +7: iteration 83350/ 173500 | consumed samples: 21337600 | consumed tokens: 43699404800 | elapsed time per iteration (s): 0.48 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.935125E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.214 | TFLOPs: 27.77 | +7: iteration 83360/ 173500 | consumed samples: 21340160 | consumed tokens: 43704647680 | elapsed time per iteration (s): 0.47 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.954223E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 547.772 | TFLOPs: 28.74 | +7: iteration 83370/ 173500 | consumed samples: 21342720 | consumed tokens: 43709890560 | elapsed time per iteration (s): 0.49 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.948099E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.065 | TFLOPs: 27.65 | +7: iteration 83380/ 173500 | consumed samples: 21345280 | consumed tokens: 43715133440 | elapsed time per iteration (s): 0.47 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.952295E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.449 | TFLOPs: 
28.62 | +7: iteration 83390/ 173500 | consumed samples: 21347840 | consumed tokens: 43720376320 | elapsed time per iteration (s): 0.45 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.948124E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.674 | TFLOPs: 30.15 | +7: iteration 83400/ 173500 | consumed samples: 21350400 | consumed tokens: 43725619200 | elapsed time per iteration (s): 0.43 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.933712E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.010 | TFLOPs: 31.11 | +7: iteration 83410/ 173500 | consumed samples: 21352960 | consumed tokens: 43730862080 | elapsed time per iteration (s): 0.42 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.957305E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.404 | TFLOPs: 31.92 | +7: iteration 83420/ 173500 | consumed samples: 21355520 | consumed tokens: 43736104960 | elapsed time per iteration (s): 0.42 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.950695E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.029 | TFLOPs: 31.80 | +7: iteration 83430/ 173500 | consumed samples: 21358080 | consumed tokens: 43741347840 | elapsed time per iteration (s): 0.42 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.965677E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.991 | TFLOPs: 31.64 | +7: iteration 83440/ 173500 | consumed samples: 21360640 | consumed tokens: 43746590720 | elapsed time per iteration (s): 0.42 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.931086E+00 | grad norm: 0.315 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.047 | TFLOPs: 31.85 | +7: iteration 83450/ 173500 | consumed samples: 21363200 | consumed tokens: 43751833600 | elapsed time per iteration (s): 0.42 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.943808E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.779 | TFLOPs: 31.84 | +7: iteration 83460/ 173500 | consumed samples: 21365760 | consumed tokens: 43757076480 | elapsed time per iteration (s): 0.42 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.940192E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.869 | TFLOPs: 31.74 | +7: iteration 83470/ 173500 | consumed samples: 21368320 | consumed tokens: 43762319360 | elapsed time per iteration (s): 0.43 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.949709E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.297 | TFLOPs: 31.55 | +7: iteration 83480/ 173500 | consumed samples: 21370880 | consumed tokens: 43767562240 | elapsed time per iteration (s): 0.43 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.957712E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.784 | TFLOPs: 31.57 | +7: iteration 83490/ 173500 | consumed samples: 21373440 | consumed tokens: 43772805120 | elapsed time per iteration (s): 0.43 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.953667E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.408 | TFLOPs: 30.98 | +7: iteration 83500/ 173500 | consumed samples: 21376000 | consumed tokens: 43778048000 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.951808E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.088 | TFLOPs: 31.64 | +7: iteration 83510/ 173500 | consumed samples: 21378560 | consumed tokens: 43783290880 | elapsed time per iteration (s): 0.42 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.943808E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.324 | TFLOPs: 31.76 | +7: iteration 83520/ 173500 | consumed samples: 21381120 | consumed tokens: 43788533760 | elapsed time per iteration (s): 0.43 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.949125E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.071 | TFLOPs: 31.48 | +7: iteration 83530/ 173500 | consumed samples: 21383680 | consumed tokens: 43793776640 | elapsed time per iteration (s): 0.42 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.951079E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.628 | TFLOPs: 31.78 | +7: iteration 83540/ 173500 | consumed samples: 21386240 | consumed tokens: 43799019520 | elapsed time per iteration (s): 0.42 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.934865E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.515 | TFLOPs: 31.77 | +7: iteration 83550/ 173500 | consumed samples: 21388800 | consumed tokens: 43804262400 | elapsed time per iteration (s): 0.43 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.938714E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.776 | TFLOPs: 31.57 | +7: iteration 83560/ 
173500 | consumed samples: 21391360 | consumed tokens: 43809505280 | elapsed time per iteration (s): 0.42 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.939777E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.630 | TFLOPs: 31.67 | +7: iteration 83570/ 173500 | consumed samples: 21393920 | consumed tokens: 43814748160 | elapsed time per iteration (s): 0.44 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.944373E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.998 | TFLOPs: 30.85 | +7: iteration 83580/ 173500 | consumed samples: 21396480 | consumed tokens: 43819991040 | elapsed time per iteration (s): 0.42 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.939142E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.415 | TFLOPs: 31.82 | +7: iteration 83590/ 173500 | consumed samples: 21399040 | consumed tokens: 43825233920 | elapsed time per iteration (s): 0.43 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.954555E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.261 | TFLOPs: 31.44 | +7: iteration 83600/ 173500 | consumed samples: 21401600 | consumed tokens: 43830476800 | elapsed time per iteration (s): 0.42 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.962338E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.352 | TFLOPs: 32.02 | +7: iteration 83610/ 173500 | consumed samples: 21404160 | consumed tokens: 43835719680 | elapsed time per iteration (s): 0.42 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.943755E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.732 | TFLOPs: 31.99 | +7: iteration 83620/ 173500 | consumed samples: 21406720 | consumed tokens: 43840962560 | elapsed time per iteration (s): 0.43 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.947191E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.529 | TFLOPs: 30.88 | +7: iteration 83630/ 173500 | consumed samples: 21409280 | consumed tokens: 43846205440 | elapsed time per iteration (s): 0.42 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.931580E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.029 | TFLOPs: 31.85 | +7: iteration 83640/ 173500 | consumed samples: 21411840 | consumed tokens: 43851448320 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.963673E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.580 | TFLOPs: 31.51 | +7: iteration 83650/ 173500 | consumed samples: 21414400 | consumed tokens: 43856691200 | elapsed time per iteration (s): 0.42 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.951619E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.117 | TFLOPs: 32.01 | +7: iteration 83660/ 173500 | consumed samples: 21416960 | consumed tokens: 43861934080 | elapsed time per iteration (s): 0.42 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.962017E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.670 | TFLOPs: 31.88 | +7: iteration 83670/ 173500 | consumed samples: 21419520 | consumed tokens: 43867176960 | elapsed time per iteration (s): 0.43 | learning rate: 
1.165E-04 | global batch size: 256 | lm loss: 2.945294E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.523 | TFLOPs: 31.56 | +7: iteration 83680/ 173500 | consumed samples: 21422080 | consumed tokens: 43872419840 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.926700E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.514 | TFLOPs: 31.51 | +7: iteration 83690/ 173500 | consumed samples: 21424640 | consumed tokens: 43877662720 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.938889E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.555 | TFLOPs: 31.35 | +7: iteration 83700/ 173500 | consumed samples: 21427200 | consumed tokens: 43882905600 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.953124E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.403 | TFLOPs: 31.55 | +7: iteration 83710/ 173500 | consumed samples: 21429760 | consumed tokens: 43888148480 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.946141E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.285 | TFLOPs: 31.39 | +7: iteration 83720/ 173500 | consumed samples: 21432320 | consumed tokens: 43893391360 | elapsed time per iteration (s): 0.42 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.962265E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.602 | TFLOPs: 31.72 | +7: iteration 83730/ 173500 | 
consumed samples: 21434880 | consumed tokens: 43898634240 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.951282E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.418 | TFLOPs: 31.45 | +7: iteration 83740/ 173500 | consumed samples: 21437440 | consumed tokens: 43903877120 | elapsed time per iteration (s): 0.42 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.952278E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.058 | TFLOPs: 31.69 | +7: iteration 83750/ 173500 | consumed samples: 21440000 | consumed tokens: 43909120000 | elapsed time per iteration (s): 0.42 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.956915E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.748 | TFLOPs: 31.68 | +7: iteration 83760/ 173500 | consumed samples: 21442560 | consumed tokens: 43914362880 | elapsed time per iteration (s): 0.42 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.946627E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.733 | TFLOPs: 31.99 | +7: iteration 83770/ 173500 | consumed samples: 21445120 | consumed tokens: 43919605760 | elapsed time per iteration (s): 0.42 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.943791E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.989 | TFLOPs: 31.64 | +7: iteration 83780/ 173500 | consumed samples: 21447680 | consumed tokens: 43924848640 | elapsed time per iteration (s): 0.43 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.946978E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 601.260 | TFLOPs: 31.55 | +7: iteration 83790/ 173500 | consumed samples: 21450240 | consumed tokens: 43930091520 | elapsed time per iteration (s): 0.42 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.945573E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.861 | TFLOPs: 32.00 | +7: iteration 83800/ 173500 | consumed samples: 21452800 | consumed tokens: 43935334400 | elapsed time per iteration (s): 0.42 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.945124E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.764 | TFLOPs: 31.68 | +7: iteration 83810/ 173500 | consumed samples: 21455360 | consumed tokens: 43940577280 | elapsed time per iteration (s): 0.42 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.955219E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.456 | TFLOPs: 31.77 | +7: iteration 83820/ 173500 | consumed samples: 21457920 | consumed tokens: 43945820160 | elapsed time per iteration (s): 0.42 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.952825E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.364 | TFLOPs: 31.97 | +7: iteration 83830/ 173500 | consumed samples: 21460480 | consumed tokens: 43951063040 | elapsed time per iteration (s): 0.42 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.953757E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.716 | TFLOPs: 31.68 | +7: iteration 83840/ 173500 | consumed samples: 21463040 | consumed tokens: 43956305920 | elapsed time per iteration (s): 0.42 | learning rate: 1.162E-04 | global batch 
size: 256 | lm loss: 2.943832E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.421 | TFLOPs: 31.77 | +7: iteration 83850/ 173500 | consumed samples: 21465600 | consumed tokens: 43961548800 | elapsed time per iteration (s): 0.42 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.963939E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.400 | TFLOPs: 31.92 | +7: iteration 83860/ 173500 | consumed samples: 21468160 | consumed tokens: 43966791680 | elapsed time per iteration (s): 0.43 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.923855E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.087 | TFLOPs: 31.28 | +7: iteration 83870/ 173500 | consumed samples: 21470720 | consumed tokens: 43972034560 | elapsed time per iteration (s): 0.44 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.948514E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.424 | TFLOPs: 30.40 | +7: iteration 83880/ 173500 | consumed samples: 21473280 | consumed tokens: 43977277440 | elapsed time per iteration (s): 0.43 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.945129E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.954 | TFLOPs: 31.53 | +7: iteration 83890/ 173500 | consumed samples: 21475840 | consumed tokens: 43982520320 | elapsed time per iteration (s): 0.43 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.954310E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.813 | TFLOPs: 31.42 | +7: iteration 83900/ 173500 | consumed samples: 21478400 | 
consumed tokens: 43987763200 | elapsed time per iteration (s): 0.42 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.931692E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.068 | TFLOPs: 31.75 | +7: iteration 83910/ 173500 | consumed samples: 21480960 | consumed tokens: 43993006080 | elapsed time per iteration (s): 0.43 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.937561E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.244 | TFLOPs: 31.23 | +7: iteration 83920/ 173500 | consumed samples: 21483520 | consumed tokens: 43998248960 | elapsed time per iteration (s): 0.42 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.938677E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.832 | TFLOPs: 32.00 | +7: iteration 83930/ 173500 | consumed samples: 21486080 | consumed tokens: 44003491840 | elapsed time per iteration (s): 0.42 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.948215E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.120 | TFLOPs: 32.01 | +7: iteration 83940/ 173500 | consumed samples: 21488640 | consumed tokens: 44008734720 | elapsed time per iteration (s): 0.42 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.947543E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.662 | TFLOPs: 31.99 | +7: iteration 83950/ 173500 | consumed samples: 21491200 | consumed tokens: 44013977600 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.943202E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 596.731 | TFLOPs: 31.31 | +7: iteration 83960/ 173500 | consumed samples: 21493760 | consumed tokens: 44019220480 | elapsed time per iteration (s): 0.44 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.940691E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.901 | TFLOPs: 30.79 | +7: iteration 83970/ 173500 | consumed samples: 21496320 | consumed tokens: 44024463360 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.962053E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.667 | TFLOPs: 31.41 | +7: iteration 83980/ 173500 | consumed samples: 21498880 | consumed tokens: 44029706240 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.957065E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.459 | TFLOPs: 31.51 | +7: iteration 83990/ 173500 | consumed samples: 21501440 | consumed tokens: 44034949120 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.927190E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.497 | TFLOPs: 31.30 | +0: [2023-03-17 09:09:24,641] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=0, lr=[0.00011595088621669176, 0.00011595088621669176, 0.00011595088621669176], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 84000/ 173500 | consumed samples: 21504000 | consumed tokens: 44040192000 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.937550E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
590.930 | TFLOPs: 31.01 | +0: steps: 84000 loss: 2.9717 iter time (s): 0.430 samples/sec: 595.838 +7: iteration 84010/ 173500 | consumed samples: 21506560 | consumed tokens: 44045434880 | elapsed time per iteration (s): 0.43 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.956774E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.227 | TFLOPs: 31.34 | +7: iteration 84020/ 173500 | consumed samples: 21509120 | consumed tokens: 44050677760 | elapsed time per iteration (s): 0.42 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.962138E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.284 | TFLOPs: 31.76 | +7: iteration 84030/ 173500 | consumed samples: 21511680 | consumed tokens: 44055920640 | elapsed time per iteration (s): 0.43 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.965519E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.912 | TFLOPs: 31.53 | +7: iteration 84040/ 173500 | consumed samples: 21514240 | consumed tokens: 44061163520 | elapsed time per iteration (s): 0.42 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.951726E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.348 | TFLOPs: 31.97 | +7: iteration 84050/ 173500 | consumed samples: 21516800 | consumed tokens: 44066406400 | elapsed time per iteration (s): 0.42 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.965582E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.563 | TFLOPs: 31.72 | +7: iteration 84060/ 173500 | consumed samples: 21519360 | consumed tokens: 44071649280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.159E-04 | global batch size: 256 | lm loss: 2.953145E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.415 | TFLOPs: 31.24 | +7: iteration 84070/ 173500 | consumed samples: 21521920 | consumed tokens: 44076892160 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.936659E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.588 | TFLOPs: 31.35 | +7: iteration 84080/ 173500 | consumed samples: 21524480 | consumed tokens: 44082135040 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.939111E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.866 | TFLOPs: 31.47 | +7: iteration 84090/ 173500 | consumed samples: 21527040 | consumed tokens: 44087377920 | elapsed time per iteration (s): 0.42 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.938666E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.837 | TFLOPs: 31.79 | +7: iteration 84100/ 173500 | consumed samples: 21529600 | consumed tokens: 44092620800 | elapsed time per iteration (s): 0.42 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.939008E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.091 | TFLOPs: 31.91 | +7: iteration 84110/ 173500 | consumed samples: 21532160 | consumed tokens: 44097863680 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.935059E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.763 | TFLOPs: 31.15 | +7: iteration 84120/ 173500 | 
consumed samples: 21534720 | consumed tokens: 44103106560 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.950249E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.657 | TFLOPs: 31.36 | +7: iteration 84130/ 173500 | consumed samples: 21537280 | consumed tokens: 44108349440 | elapsed time per iteration (s): 0.43 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.941588E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.158 | TFLOPs: 31.38 | +7: iteration 84140/ 173500 | consumed samples: 21539840 | consumed tokens: 44113592320 | elapsed time per iteration (s): 0.43 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.943835E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.721 | TFLOPs: 31.52 | +7: iteration 84150/ 173500 | consumed samples: 21542400 | consumed tokens: 44118835200 | elapsed time per iteration (s): 0.42 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.953627E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.654 | TFLOPs: 31.83 | +7: iteration 84160/ 173500 | consumed samples: 21544960 | consumed tokens: 44124078080 | elapsed time per iteration (s): 0.42 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.950860E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.421 | TFLOPs: 31.98 | +7: iteration 84170/ 173500 | consumed samples: 21547520 | consumed tokens: 44129320960 | elapsed time per iteration (s): 0.42 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.942077E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.300 | TFLOPs: 31.76 | +7: iteration 84180/ 173500 | consumed samples: 21550080 | consumed tokens: 44134563840 | elapsed time per iteration (s): 0.42 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.943330E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.209 | TFLOPs: 31.81 | +7: iteration 84190/ 173500 | consumed samples: 21552640 | consumed tokens: 44139806720 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.934395E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.826 | TFLOPs: 31.52 | +7: iteration 84200/ 173500 | consumed samples: 21555200 | consumed tokens: 44145049600 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.948696E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.122 | TFLOPs: 31.54 | +7: iteration 84210/ 173500 | consumed samples: 21557760 | consumed tokens: 44150292480 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.963194E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.066 | TFLOPs: 31.54 | +7: iteration 84220/ 173500 | consumed samples: 21560320 | consumed tokens: 44155535360 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.946163E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.874 | TFLOPs: 31.53 | +7: iteration 84230/ 173500 | consumed samples: 21562880 | consumed tokens: 44160778240 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch 
size: 256 | lm loss: 2.945349E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.110 | TFLOPs: 31.49 | +7: iteration 84240/ 173500 | consumed samples: 21565440 | consumed tokens: 44166021120 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.942325E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.351 | TFLOPs: 31.24 | +7: iteration 84250/ 173500 | consumed samples: 21568000 | consumed tokens: 44171264000 | elapsed time per iteration (s): 0.43 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.958203E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.611 | TFLOPs: 31.09 | +7: iteration 84260/ 173500 | consumed samples: 21570560 | consumed tokens: 44176506880 | elapsed time per iteration (s): 0.43 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.939959E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.686 | TFLOPs: 31.52 | +7: iteration 84270/ 173500 | consumed samples: 21573120 | consumed tokens: 44181749760 | elapsed time per iteration (s): 0.43 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.939507E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.500 | TFLOPs: 31.30 | +7: iteration 84280/ 173500 | consumed samples: 21575680 | consumed tokens: 44186992640 | elapsed time per iteration (s): 0.42 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.916043E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.354 | TFLOPs: 31.66 | +7: iteration 84290/ 173500 | consumed samples: 21578240 | 
consumed tokens: 44192235520 | elapsed time per iteration (s): 0.42 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.950955E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.496 | TFLOPs: 31.82 | +7: iteration 84300/ 173500 | consumed samples: 21580800 | consumed tokens: 44197478400 | elapsed time per iteration (s): 0.42 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.967447E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.065 | TFLOPs: 31.96 | +7: iteration 84310/ 173500 | consumed samples: 21583360 | consumed tokens: 44202721280 | elapsed time per iteration (s): 0.42 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.947921E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.115 | TFLOPs: 31.75 | +7: iteration 84320/ 173500 | consumed samples: 21585920 | consumed tokens: 44207964160 | elapsed time per iteration (s): 0.42 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.946075E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.946 | TFLOPs: 31.95 | +7: iteration 84330/ 173500 | consumed samples: 21588480 | consumed tokens: 44213207040 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.952010E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.477 | TFLOPs: 31.45 | +7: iteration 84340/ 173500 | consumed samples: 21591040 | consumed tokens: 44218449920 | elapsed time per iteration (s): 0.44 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.952345E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 582.876 | TFLOPs: 30.58 | +7: iteration 84350/ 173500 | consumed samples: 21593600 | consumed tokens: 44223692800 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.951047E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.251 | TFLOPs: 31.49 | +7: iteration 84360/ 173500 | consumed samples: 21596160 | consumed tokens: 44228935680 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.952999E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.678 | TFLOPs: 31.41 | +7: iteration 84370/ 173500 | consumed samples: 21598720 | consumed tokens: 44234178560 | elapsed time per iteration (s): 0.42 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.934808E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.617 | TFLOPs: 31.99 | +7: iteration 84380/ 173500 | consumed samples: 21601280 | consumed tokens: 44239421440 | elapsed time per iteration (s): 0.43 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.955176E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.526 | TFLOPs: 31.14 | +7: iteration 84390/ 173500 | consumed samples: 21603840 | consumed tokens: 44244664320 | elapsed time per iteration (s): 0.43 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.929790E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.214 | TFLOPs: 31.12 | +7: iteration 84400/ 173500 | consumed samples: 21606400 | consumed tokens: 44249907200 | elapsed time per iteration (s): 0.42 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 
2.950551E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.923 | TFLOPs: 31.74 | +7: iteration 84410/ 173500 | consumed samples: 21608960 | consumed tokens: 44255150080 | elapsed time per iteration (s): 0.43 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.954724E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.479 | TFLOPs: 31.14 | +7: iteration 84420/ 173500 | consumed samples: 21611520 | consumed tokens: 44260392960 | elapsed time per iteration (s): 0.42 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.960690E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.713 | TFLOPs: 31.99 | +7: iteration 84430/ 173500 | consumed samples: 21614080 | consumed tokens: 44265635840 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.957643E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.845 | TFLOPs: 31.53 | +7: iteration 84440/ 173500 | consumed samples: 21616640 | consumed tokens: 44270878720 | elapsed time per iteration (s): 0.42 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.945713E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.313 | TFLOPs: 31.97 | +7: iteration 84450/ 173500 | consumed samples: 21619200 | consumed tokens: 44276121600 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.955691E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.895 | TFLOPs: 31.53 | +7: iteration 84460/ 173500 | consumed samples: 21621760 | consumed tokens: 
44281364480 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.955528E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.273 | TFLOPs: 31.50 | +7: iteration 84470/ 173500 | consumed samples: 21624320 | consumed tokens: 44286607360 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.944056E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.542 | TFLOPs: 30.93 | +7: iteration 84480/ 173500 | consumed samples: 21626880 | consumed tokens: 44291850240 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.936991E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.963 | TFLOPs: 31.22 | +7: iteration 84490/ 173500 | consumed samples: 21629440 | consumed tokens: 44297093120 | elapsed time per iteration (s): 0.42 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.952084E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.251 | TFLOPs: 31.70 | +7: iteration 84500/ 173500 | consumed samples: 21632000 | consumed tokens: 44302336000 | elapsed time per iteration (s): 0.42 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.957872E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.071 | TFLOPs: 31.75 | +7: iteration 84510/ 173500 | consumed samples: 21634560 | consumed tokens: 44307578880 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.956792E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.372 | TFLOPs: 31.45 | +7: iteration 84520/ 173500 | consumed samples: 21637120 | consumed tokens: 44312821760 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.947029E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.329 | TFLOPs: 31.55 | +7: iteration 84530/ 173500 | consumed samples: 21639680 | consumed tokens: 44318064640 | elapsed time per iteration (s): 0.42 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.941704E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.539 | TFLOPs: 31.72 | +7: iteration 84540/ 173500 | consumed samples: 21642240 | consumed tokens: 44323307520 | elapsed time per iteration (s): 0.42 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.962827E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.726 | TFLOPs: 31.73 | +7: iteration 84550/ 173500 | consumed samples: 21644800 | consumed tokens: 44328550400 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.923653E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.190 | TFLOPs: 31.23 | +7: iteration 84560/ 173500 | consumed samples: 21647360 | consumed tokens: 44333793280 | elapsed time per iteration (s): 0.42 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.955603E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.374 | TFLOPs: 31.61 | +7: iteration 84570/ 173500 | consumed samples: 21649920 | consumed tokens: 44339036160 | elapsed time per iteration (s): 0.42 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.948558E+00 | grad 
norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.346 | TFLOPs: 31.71 | +7: iteration 84580/ 173500 | consumed samples: 21652480 | consumed tokens: 44344279040 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.957850E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.459 | TFLOPs: 31.03 | +7: iteration 84590/ 173500 | consumed samples: 21655040 | consumed tokens: 44349521920 | elapsed time per iteration (s): 0.42 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.946965E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.598 | TFLOPs: 31.72 | +7: iteration 84600/ 173500 | consumed samples: 21657600 | consumed tokens: 44354764800 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.953611E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.104 | TFLOPs: 31.49 | +7: iteration 84610/ 173500 | consumed samples: 21660160 | consumed tokens: 44360007680 | elapsed time per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.955590E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.009 | TFLOPs: 31.53 | +7: iteration 84620/ 173500 | consumed samples: 21662720 | consumed tokens: 44365250560 | elapsed time per iteration (s): 0.42 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.944844E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.503 | TFLOPs: 31.87 | +7: iteration 84630/ 173500 | consumed samples: 21665280 | consumed tokens: 44370493440 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.948036E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.267 | TFLOPs: 31.44 | +7: iteration 84640/ 173500 | consumed samples: 21667840 | consumed tokens: 44375736320 | elapsed time per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.935711E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.748 | TFLOPs: 30.89 | +7: iteration 84650/ 173500 | consumed samples: 21670400 | consumed tokens: 44380979200 | elapsed time per iteration (s): 0.42 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.941318E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.877 | TFLOPs: 31.68 | +7: iteration 84660/ 173500 | consumed samples: 21672960 | consumed tokens: 44386222080 | elapsed time per iteration (s): 0.42 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.937918E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.457 | TFLOPs: 31.71 | +7: iteration 84670/ 173500 | consumed samples: 21675520 | consumed tokens: 44391464960 | elapsed time per iteration (s): 0.42 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.953885E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.782 | TFLOPs: 31.68 | +7: iteration 84680/ 173500 | consumed samples: 21678080 | consumed tokens: 44396707840 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.963046E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.403 | TFLOPs: 
31.40 | +7: iteration 84690/ 173500 | consumed samples: 21680640 | consumed tokens: 44401950720 | elapsed time per iteration (s): 0.42 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.949116E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.280 | TFLOPs: 31.76 | +7: iteration 84700/ 173500 | consumed samples: 21683200 | consumed tokens: 44407193600 | elapsed time per iteration (s): 0.42 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.937514E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.004 | TFLOPs: 31.74 | +7: iteration 84710/ 173500 | consumed samples: 21685760 | consumed tokens: 44412436480 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.946870E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.828 | TFLOPs: 31.52 | +7: iteration 84720/ 173500 | consumed samples: 21688320 | consumed tokens: 44417679360 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.949966E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.181 | TFLOPs: 31.60 | +7: iteration 84730/ 173500 | consumed samples: 21690880 | consumed tokens: 44422922240 | elapsed time per iteration (s): 0.42 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.919067E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.945 | TFLOPs: 31.74 | +7: iteration 84740/ 173500 | consumed samples: 21693440 | consumed tokens: 44428165120 | elapsed time per iteration (s): 0.43 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.950743E+00 | grad norm: 0.321 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.469 | TFLOPs: 31.30 | +7: iteration 84750/ 173500 | consumed samples: 21696000 | consumed tokens: 44433408000 | elapsed time per iteration (s): 0.42 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.943020E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.901 | TFLOPs: 31.74 | +7: iteration 84760/ 173500 | consumed samples: 21698560 | consumed tokens: 44438650880 | elapsed time per iteration (s): 0.42 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.948763E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.141 | TFLOPs: 31.75 | +7: iteration 84770/ 173500 | consumed samples: 21701120 | consumed tokens: 44443893760 | elapsed time per iteration (s): 0.42 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.935155E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.959 | TFLOPs: 31.95 | +7: iteration 84780/ 173500 | consumed samples: 21703680 | consumed tokens: 44449136640 | elapsed time per iteration (s): 0.42 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.948394E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.958 | TFLOPs: 31.95 | +7: iteration 84790/ 173500 | consumed samples: 21706240 | consumed tokens: 44454379520 | elapsed time per iteration (s): 0.43 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.954954E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.688 | TFLOPs: 31.57 | +7: iteration 84800/ 173500 | consumed samples: 21708800 | consumed tokens: 44459622400 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.957200E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.477 | TFLOPs: 31.72 | +7: iteration 84810/ 173500 | consumed samples: 21711360 | consumed tokens: 44464865280 | elapsed time per iteration (s): 0.43 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.947548E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.105 | TFLOPs: 31.38 | +7: iteration 84820/ 173500 | consumed samples: 21713920 | consumed tokens: 44470108160 | elapsed time per iteration (s): 0.42 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.950678E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.669 | TFLOPs: 31.62 | +7: iteration 84830/ 173500 | consumed samples: 21716480 | consumed tokens: 44475351040 | elapsed time per iteration (s): 0.43 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.951500E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.646 | TFLOPs: 31.41 | +7: iteration 84840/ 173500 | consumed samples: 21719040 | consumed tokens: 44480593920 | elapsed time per iteration (s): 0.42 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.954783E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.964 | TFLOPs: 31.74 | +7: iteration 84850/ 173500 | consumed samples: 21721600 | consumed tokens: 44485836800 | elapsed time per iteration (s): 0.42 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.955837E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.776 | TFLOPs: 31.73 | +7: iteration 84860/ 
173500 | consumed samples: 21724160 | consumed tokens: 44491079680 | elapsed time per iteration (s): 0.42 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.962426E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.019 | TFLOPs: 31.80 | +7: iteration 84870/ 173500 | consumed samples: 21726720 | consumed tokens: 44496322560 | elapsed time per iteration (s): 0.43 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.935318E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.733 | TFLOPs: 31.41 | +7: iteration 84880/ 173500 | consumed samples: 21729280 | consumed tokens: 44501565440 | elapsed time per iteration (s): 0.42 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.946182E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.986 | TFLOPs: 31.69 | +7: iteration 84890/ 173500 | consumed samples: 21731840 | consumed tokens: 44506808320 | elapsed time per iteration (s): 0.44 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.953409E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.373 | TFLOPs: 30.66 | +7: iteration 84900/ 173500 | consumed samples: 21734400 | consumed tokens: 44512051200 | elapsed time per iteration (s): 0.42 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.952015E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.714 | TFLOPs: 31.73 | +7: iteration 84910/ 173500 | consumed samples: 21736960 | consumed tokens: 44517294080 | elapsed time per iteration (s): 0.42 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.946160E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 602.768 | TFLOPs: 31.63 | +7: iteration 84920/ 173500 | consumed samples: 21739520 | consumed tokens: 44522536960 | elapsed time per iteration (s): 0.43 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.950783E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.203 | TFLOPs: 31.44 | +7: iteration 84930/ 173500 | consumed samples: 21742080 | consumed tokens: 44527779840 | elapsed time per iteration (s): 0.43 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.951212E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.741 | TFLOPs: 31.36 | +7: iteration 84940/ 173500 | consumed samples: 21744640 | consumed tokens: 44533022720 | elapsed time per iteration (s): 0.43 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.951749E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.657 | TFLOPs: 31.52 | +7: iteration 84950/ 173500 | consumed samples: 21747200 | consumed tokens: 44538265600 | elapsed time per iteration (s): 0.43 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.946624E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.269 | TFLOPs: 31.60 | +7: iteration 84960/ 173500 | consumed samples: 21749760 | consumed tokens: 44543508480 | elapsed time per iteration (s): 0.42 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.949145E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.775 | TFLOPs: 31.94 | +7: iteration 84970/ 173500 | consumed samples: 21752320 | consumed tokens: 44548751360 | elapsed time per iteration (s): 0.42 | learning rate: 
1.144E-04 | global batch size: 256 | lm loss: 2.947529E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.329 | TFLOPs: 31.92 | +7: iteration 84980/ 173500 | consumed samples: 21754880 | consumed tokens: 44553994240 | elapsed time per iteration (s): 0.42 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.945138E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.419 | TFLOPs: 31.92 | +7: iteration 84990/ 173500 | consumed samples: 21757440 | consumed tokens: 44559237120 | elapsed time per iteration (s): 0.42 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.945520E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.602 | TFLOPs: 31.93 | +7: iteration 85000/ 173500 | consumed samples: 21760000 | consumed tokens: 44564480000 | elapsed time per iteration (s): 0.42 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.944540E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.051 | TFLOPs: 31.75 | +7: iteration 85010/ 173500 | consumed samples: 21762560 | consumed tokens: 44569722880 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.950588E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.062 | TFLOPs: 31.17 | +7: iteration 85020/ 173500 | consumed samples: 21765120 | consumed tokens: 44574965760 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.943228E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.910 | TFLOPs: 31.53 | +7: iteration 85030/ 173500 | 
consumed samples: 21767680 | consumed tokens: 44580208640 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.942881E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.433 | TFLOPs: 31.29 | +7: iteration 85040/ 173500 | consumed samples: 21770240 | consumed tokens: 44585451520 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.933231E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.193 | TFLOPs: 31.18 | +7: iteration 85050/ 173500 | consumed samples: 21772800 | consumed tokens: 44590694400 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.940397E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.453 | TFLOPs: 31.35 | +7: iteration 85060/ 173500 | consumed samples: 21775360 | consumed tokens: 44595937280 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.947283E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.061 | TFLOPs: 31.54 | +7: iteration 85070/ 173500 | consumed samples: 21777920 | consumed tokens: 44601180160 | elapsed time per iteration (s): 0.42 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.947025E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.328 | TFLOPs: 31.92 | +7: iteration 85080/ 173500 | consumed samples: 21780480 | consumed tokens: 44606423040 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.954230E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 596.999 | TFLOPs: 31.32 | +7: iteration 85090/ 173500 | consumed samples: 21783040 | consumed tokens: 44611665920 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.943191E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.586 | TFLOPs: 30.99 | +7: iteration 85100/ 173500 | consumed samples: 21785600 | consumed tokens: 44616908800 | elapsed time per iteration (s): 0.43 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.942603E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.368 | TFLOPs: 30.98 | +7: iteration 85110/ 173500 | consumed samples: 21788160 | consumed tokens: 44622151680 | elapsed time per iteration (s): 0.43 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.954620E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.443 | TFLOPs: 31.56 | +7: iteration 85120/ 173500 | consumed samples: 21790720 | consumed tokens: 44627394560 | elapsed time per iteration (s): 0.42 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.952961E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.647 | TFLOPs: 31.72 | +7: iteration 85130/ 173500 | consumed samples: 21793280 | consumed tokens: 44632637440 | elapsed time per iteration (s): 0.42 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.950223E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.402 | TFLOPs: 31.71 | +7: iteration 85140/ 173500 | consumed samples: 21795840 | consumed tokens: 44637880320 | elapsed time per iteration (s): 0.42 | learning rate: 1.141E-04 | global batch 
size: 256 | lm loss: 2.948184E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.244 | TFLOPs: 31.81 | +7: iteration 85150/ 173500 | consumed samples: 21798400 | consumed tokens: 44643123200 | elapsed time per iteration (s): 0.42 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.947104E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.487 | TFLOPs: 31.77 | +7: iteration 85160/ 173500 | consumed samples: 21800960 | consumed tokens: 44648366080 | elapsed time per iteration (s): 0.42 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.948863E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.657 | TFLOPs: 31.94 | +7: iteration 85170/ 173500 | consumed samples: 21803520 | consumed tokens: 44653608960 | elapsed time per iteration (s): 0.42 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.935257E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.702 | TFLOPs: 31.68 | +7: iteration 85180/ 173500 | consumed samples: 21806080 | consumed tokens: 44658851840 | elapsed time per iteration (s): 0.43 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.948164E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.164 | TFLOPs: 31.44 | +7: iteration 85190/ 173500 | consumed samples: 21808640 | consumed tokens: 44664094720 | elapsed time per iteration (s): 0.42 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.941375E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.404 | TFLOPs: 31.61 | +7: iteration 85200/ 173500 | consumed samples: 21811200 | 
consumed tokens: 44669337600 | elapsed time per iteration (s): 0.43 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.937563E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.648 | TFLOPs: 31.57 | +7: iteration 85210/ 173500 | consumed samples: 21813760 | consumed tokens: 44674580480 | elapsed time per iteration (s): 0.42 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.952537E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.945 | TFLOPs: 31.74 | +7: iteration 85220/ 173500 | consumed samples: 21816320 | consumed tokens: 44679823360 | elapsed time per iteration (s): 0.42 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.942850E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.618 | TFLOPs: 31.62 | +7: iteration 85230/ 173500 | consumed samples: 21818880 | consumed tokens: 44685066240 | elapsed time per iteration (s): 0.43 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.956136E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.163 | TFLOPs: 31.38 | +7: iteration 85240/ 173500 | consumed samples: 21821440 | consumed tokens: 44690309120 | elapsed time per iteration (s): 0.42 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.962325E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.413 | TFLOPs: 31.92 | +7: iteration 85250/ 173500 | consumed samples: 21824000 | consumed tokens: 44695552000 | elapsed time per iteration (s): 0.43 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.947747E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 601.316 | TFLOPs: 31.55 | +7: iteration 85260/ 173500 | consumed samples: 21826560 | consumed tokens: 44700794880 | elapsed time per iteration (s): 0.42 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.946547E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.427 | TFLOPs: 31.71 | +7: iteration 85270/ 173500 | consumed samples: 21829120 | consumed tokens: 44706037760 | elapsed time per iteration (s): 0.43 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.946961E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.850 | TFLOPs: 31.16 | +7: iteration 85280/ 173500 | consumed samples: 21831680 | consumed tokens: 44711280640 | elapsed time per iteration (s): 0.42 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.952547E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.921 | TFLOPs: 31.74 | +7: iteration 85290/ 173500 | consumed samples: 21834240 | consumed tokens: 44716523520 | elapsed time per iteration (s): 0.42 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.952772E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.818 | TFLOPs: 31.94 | +7: iteration 85300/ 173500 | consumed samples: 21836800 | consumed tokens: 44721766400 | elapsed time per iteration (s): 0.42 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.952839E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.736 | TFLOPs: 31.73 | +7: iteration 85310/ 173500 | consumed samples: 21839360 | consumed tokens: 44727009280 | elapsed time per iteration (s): 0.43 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 
2.956909E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.390 | TFLOPs: 31.55 | +7: iteration 85320/ 173500 | consumed samples: 21841920 | consumed tokens: 44732252160 | elapsed time per iteration (s): 0.42 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.945669E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.244 | TFLOPs: 31.81 | +7: iteration 85330/ 173500 | consumed samples: 21844480 | consumed tokens: 44737495040 | elapsed time per iteration (s): 0.43 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.934758E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.235 | TFLOPs: 31.60 | +7: iteration 85340/ 173500 | consumed samples: 21847040 | consumed tokens: 44742737920 | elapsed time per iteration (s): 0.42 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.944447E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.694 | TFLOPs: 31.62 | +7: iteration 85350/ 173500 | consumed samples: 21849600 | consumed tokens: 44747980800 | elapsed time per iteration (s): 0.43 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.955459E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.020 | TFLOPs: 31.43 | +7: iteration 85360/ 173500 | consumed samples: 21852160 | consumed tokens: 44753223680 | elapsed time per iteration (s): 0.42 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.954941E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.214 | TFLOPs: 31.75 | +7: iteration 85370/ 173500 | consumed samples: 21854720 | consumed tokens: 
44758466560 | elapsed time per iteration (s): 0.42 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.950567E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.407 | TFLOPs: 31.97 | +7: iteration 85380/ 173500 | consumed samples: 21857280 | consumed tokens: 44763709440 | elapsed time per iteration (s): 0.42 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.937573E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.907 | TFLOPs: 31.63 | +7: iteration 85390/ 173500 | consumed samples: 21859840 | consumed tokens: 44768952320 | elapsed time per iteration (s): 0.43 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.953859E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.240 | TFLOPs: 31.49 | +7: iteration 85400/ 173500 | consumed samples: 21862400 | consumed tokens: 44774195200 | elapsed time per iteration (s): 0.42 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.937943E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.896 | TFLOPs: 31.74 | +7: iteration 85410/ 173500 | consumed samples: 21864960 | consumed tokens: 44779438080 | elapsed time per iteration (s): 0.42 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.942300E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.296 | TFLOPs: 31.76 | +7: iteration 85420/ 173500 | consumed samples: 21867520 | consumed tokens: 44784680960 | elapsed time per iteration (s): 0.42 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.938431E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.956 | TFLOPs: 31.64 | +7: iteration 85430/ 173500 | consumed samples: 21870080 | consumed tokens: 44789923840 | elapsed time per iteration (s): 0.42 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.958672E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.135 | TFLOPs: 31.75 | +7: iteration 85440/ 173500 | consumed samples: 21872640 | consumed tokens: 44795166720 | elapsed time per iteration (s): 0.43 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.960592E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.499 | TFLOPs: 30.98 | +7: iteration 85450/ 173500 | consumed samples: 21875200 | consumed tokens: 44800409600 | elapsed time per iteration (s): 0.43 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.948565E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.468 | TFLOPs: 31.30 | +7: iteration 85460/ 173500 | consumed samples: 21877760 | consumed tokens: 44805652480 | elapsed time per iteration (s): 0.42 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.946519E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.530 | TFLOPs: 31.77 | +7: iteration 85470/ 173500 | consumed samples: 21880320 | consumed tokens: 44810895360 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.946608E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.551 | TFLOPs: 31.56 | +7: iteration 85480/ 173500 | consumed samples: 21882880 | consumed tokens: 44816138240 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.947646E+00 | grad 
norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.860 | TFLOPs: 31.53 | +7: iteration 85490/ 173500 | consumed samples: 21885440 | consumed tokens: 44821381120 | elapsed time per iteration (s): 0.42 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.943727E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.036 | TFLOPs: 31.64 | +7: iteration 85500/ 173500 | consumed samples: 21888000 | consumed tokens: 44826624000 | elapsed time per iteration (s): 0.42 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.955724E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.708 | TFLOPs: 31.78 | +7: iteration 85510/ 173500 | consumed samples: 21890560 | consumed tokens: 44831866880 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.955222E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.337 | TFLOPs: 31.34 | +7: iteration 85520/ 173500 | consumed samples: 21893120 | consumed tokens: 44837109760 | elapsed time per iteration (s): 0.42 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.948569E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.464 | TFLOPs: 31.66 | +7: iteration 85530/ 173500 | consumed samples: 21895680 | consumed tokens: 44842352640 | elapsed time per iteration (s): 0.42 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.961408E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.755 | TFLOPs: 31.63 | +7: iteration 85540/ 173500 | consumed samples: 21898240 | consumed tokens: 44847595520 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.946098E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.246 | TFLOPs: 31.60 | +7: iteration 85550/ 173500 | consumed samples: 21900800 | consumed tokens: 44852838400 | elapsed time per iteration (s): 0.43 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.956532E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.813 | TFLOPs: 31.16 | +7: iteration 85560/ 173500 | consumed samples: 21903360 | consumed tokens: 44858081280 | elapsed time per iteration (s): 0.42 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.947363E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.107 | TFLOPs: 31.75 | +7: iteration 85570/ 173500 | consumed samples: 21905920 | consumed tokens: 44863324160 | elapsed time per iteration (s): 0.44 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.932388E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.355 | TFLOPs: 30.87 | +7: iteration 85580/ 173500 | consumed samples: 21908480 | consumed tokens: 44868567040 | elapsed time per iteration (s): 0.42 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.953630E+00 | grad norm: 0.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.385 | TFLOPs: 31.97 | +7: iteration 85590/ 173500 | consumed samples: 21911040 | consumed tokens: 44873809920 | elapsed time per iteration (s): 0.42 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.949417E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.799 | TFLOPs: 
31.63 | +7: iteration 85600/ 173500 | consumed samples: 21913600 | consumed tokens: 44879052800 | elapsed time per iteration (s): 0.42 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.943696E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.839 | TFLOPs: 31.73 | +7: iteration 85610/ 173500 | consumed samples: 21916160 | consumed tokens: 44884295680 | elapsed time per iteration (s): 0.43 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.945078E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.130 | TFLOPs: 31.49 | +7: iteration 85620/ 173500 | consumed samples: 21918720 | consumed tokens: 44889538560 | elapsed time per iteration (s): 0.43 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.943218E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.902 | TFLOPs: 31.27 | +7: iteration 85630/ 173500 | consumed samples: 21921280 | consumed tokens: 44894781440 | elapsed time per iteration (s): 0.42 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.946887E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.955 | TFLOPs: 31.85 | +7: iteration 85640/ 173500 | consumed samples: 21923840 | consumed tokens: 44900024320 | elapsed time per iteration (s): 0.43 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.925644E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.359 | TFLOPs: 31.24 | +7: iteration 85650/ 173500 | consumed samples: 21926400 | consumed tokens: 44905267200 | elapsed time per iteration (s): 0.43 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.946297E+00 | grad norm: 0.327 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.372 | TFLOPs: 31.34 | +7: iteration 85660/ 173500 | consumed samples: 21928960 | consumed tokens: 44910510080 | elapsed time per iteration (s): 0.42 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.945479E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.732 | TFLOPs: 31.73 | +7: iteration 85670/ 173500 | consumed samples: 21931520 | consumed tokens: 44915752960 | elapsed time per iteration (s): 0.42 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.929888E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.637 | TFLOPs: 31.67 | +7: iteration 85680/ 173500 | consumed samples: 21934080 | consumed tokens: 44920995840 | elapsed time per iteration (s): 0.43 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.951538E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.085 | TFLOPs: 31.17 | +7: iteration 85690/ 173500 | consumed samples: 21936640 | consumed tokens: 44926238720 | elapsed time per iteration (s): 0.42 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.955742E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.363 | TFLOPs: 31.81 | +7: iteration 85700/ 173500 | consumed samples: 21939200 | consumed tokens: 44931481600 | elapsed time per iteration (s): 0.42 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.942175E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.744 | TFLOPs: 31.63 | +7: iteration 85710/ 173500 | consumed samples: 21941760 | consumed tokens: 44936724480 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.942527E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.228 | TFLOPs: 30.34 | +7: iteration 85720/ 173500 | consumed samples: 21944320 | consumed tokens: 44941967360 | elapsed time per iteration (s): 0.42 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.938433E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.585 | TFLOPs: 31.77 | +7: iteration 85730/ 173500 | consumed samples: 21946880 | consumed tokens: 44947210240 | elapsed time per iteration (s): 0.43 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.936027E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.159 | TFLOPs: 31.23 | +7: iteration 85740/ 173500 | consumed samples: 21949440 | consumed tokens: 44952453120 | elapsed time per iteration (s): 0.43 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.945408E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.860 | TFLOPs: 31.00 | +7: iteration 85750/ 173500 | consumed samples: 21952000 | consumed tokens: 44957696000 | elapsed time per iteration (s): 0.43 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.942854E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.021 | TFLOPs: 31.48 | +7: iteration 85760/ 173500 | consumed samples: 21954560 | consumed tokens: 44962938880 | elapsed time per iteration (s): 0.42 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.946325E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.686 | TFLOPs: 31.78 | +7: iteration 85770/ 
173500 | consumed samples: 21957120 | consumed tokens: 44968181760 | elapsed time per iteration (s): 0.42 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.941470E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.592 | TFLOPs: 31.83 | +7: iteration 85780/ 173500 | consumed samples: 21959680 | consumed tokens: 44973424640 | elapsed time per iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.948681E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.445 | TFLOPs: 31.40 | +7: iteration 85790/ 173500 | consumed samples: 21962240 | consumed tokens: 44978667520 | elapsed time per iteration (s): 0.42 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.933403E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.882 | TFLOPs: 31.74 | +7: iteration 85800/ 173500 | consumed samples: 21964800 | consumed tokens: 44983910400 | elapsed time per iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.950607E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.637 | TFLOPs: 31.51 | +7: iteration 85810/ 173500 | consumed samples: 21967360 | consumed tokens: 44989153280 | elapsed time per iteration (s): 0.42 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.938844E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.415 | TFLOPs: 31.61 | +7: iteration 85820/ 173500 | consumed samples: 21969920 | consumed tokens: 44994396160 | elapsed time per iteration (s): 0.42 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.950772E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.070 | TFLOPs: 31.69 | +7: iteration 85830/ 173500 | consumed samples: 21972480 | consumed tokens: 44999639040 | elapsed time per iteration (s): 0.43 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.953523E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.143 | TFLOPs: 31.23 | +7: iteration 85840/ 173500 | consumed samples: 21975040 | consumed tokens: 45004881920 | elapsed time per iteration (s): 0.42 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.940219E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.650 | TFLOPs: 31.83 | +7: iteration 85850/ 173500 | consumed samples: 21977600 | consumed tokens: 45010124800 | elapsed time per iteration (s): 0.42 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.945420E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.974 | TFLOPs: 31.74 | +7: iteration 85860/ 173500 | consumed samples: 21980160 | consumed tokens: 45015367680 | elapsed time per iteration (s): 0.42 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.932677E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.965 | TFLOPs: 31.74 | +7: iteration 85870/ 173500 | consumed samples: 21982720 | consumed tokens: 45020610560 | elapsed time per iteration (s): 0.43 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.945889E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.125 | TFLOPs: 31.54 | +7: iteration 85880/ 173500 | consumed samples: 21985280 | consumed tokens: 45025853440 | elapsed time per iteration (s): 0.42 | learning rate: 
1.129E-04 | global batch size: 256 | lm loss: 2.951033E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.148 | TFLOPs: 31.96 | +7: iteration 85890/ 173500 | consumed samples: 21987840 | consumed tokens: 45031096320 | elapsed time per iteration (s): 0.42 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.936485E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.455 | TFLOPs: 31.92 | +7: iteration 85900/ 173500 | consumed samples: 21990400 | consumed tokens: 45036339200 | elapsed time per iteration (s): 0.42 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.939472E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.581 | TFLOPs: 31.67 | +7: iteration 85910/ 173500 | consumed samples: 21992960 | consumed tokens: 45041582080 | elapsed time per iteration (s): 0.42 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.937922E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.356 | TFLOPs: 31.71 | +7: iteration 85920/ 173500 | consumed samples: 21995520 | consumed tokens: 45046824960 | elapsed time per iteration (s): 0.42 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.942846E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.723 | TFLOPs: 31.73 | +7: iteration 85930/ 173500 | consumed samples: 21998080 | consumed tokens: 45052067840 | elapsed time per iteration (s): 0.43 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.938973E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.699 | TFLOPs: 31.57 | +7: iteration 85940/ 173500 | 
consumed samples: 22000640 | consumed tokens: 45057310720 | elapsed time per iteration (s): 0.43 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.945745E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.204 | TFLOPs: 31.60 | +7: iteration 85950/ 173500 | consumed samples: 22003200 | consumed tokens: 45062553600 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.932216E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.431 | TFLOPs: 31.50 | +7: iteration 85960/ 173500 | consumed samples: 22005760 | consumed tokens: 45067796480 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.958951E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.255 | TFLOPs: 31.39 | +7: iteration 85970/ 173500 | consumed samples: 22008320 | consumed tokens: 45073039360 | elapsed time per iteration (s): 0.42 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.937543E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.073 | TFLOPs: 31.75 | +7: iteration 85980/ 173500 | consumed samples: 22010880 | consumed tokens: 45078282240 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.938269E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.880 | TFLOPs: 31.53 | +7: iteration 85990/ 173500 | consumed samples: 22013440 | consumed tokens: 45083525120 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.933965E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 599.855 | TFLOPs: 31.47 | +0: [2023-03-17 09:23:35,763] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=0, lr=[0.0001126626417003261, 0.0001126626417003261, 0.0001126626417003261], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 86000/ 173500 | consumed samples: 22016000 | consumed tokens: 45088768000 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.948622E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.927 | TFLOPs: 31.53 | +0: steps: 86000 loss: 2.9644 iter time (s): 0.423 samples/sec: 604.972 +7: iteration 86010/ 173500 | consumed samples: 22018560 | consumed tokens: 45094010880 | elapsed time per iteration (s): 0.42 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.948941E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.406 | TFLOPs: 31.76 | +7: iteration 86020/ 173500 | consumed samples: 22021120 | consumed tokens: 45099253760 | elapsed time per iteration (s): 0.42 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.940625E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.419 | TFLOPs: 31.66 | +7: iteration 86030/ 173500 | consumed samples: 22023680 | consumed tokens: 45104496640 | elapsed time per iteration (s): 0.42 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.946135E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.874 | TFLOPs: 31.95 | +7: iteration 86040/ 173500 | consumed samples: 22026240 | consumed tokens: 45109739520 | elapsed time per iteration (s): 0.43 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.939335E+00 | grad norm: 0.334 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.442 | TFLOPs: 31.08 | +7: iteration 86050/ 173500 | consumed samples: 22028800 | consumed tokens: 45114982400 | elapsed time per iteration (s): 0.42 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.959981E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.520 | TFLOPs: 31.98 | +7: iteration 86060/ 173500 | consumed samples: 22031360 | consumed tokens: 45120225280 | elapsed time per iteration (s): 0.43 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.953261E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.893 | TFLOPs: 31.32 | +7: iteration 86070/ 173500 | consumed samples: 22033920 | consumed tokens: 45125468160 | elapsed time per iteration (s): 0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.932593E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.102 | TFLOPs: 31.96 | +7: iteration 86080/ 173500 | consumed samples: 22036480 | consumed tokens: 45130711040 | elapsed time per iteration (s): 0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.935070E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.809 | TFLOPs: 31.94 | +7: iteration 86090/ 173500 | consumed samples: 22039040 | consumed tokens: 45135953920 | elapsed time per iteration (s): 0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.948203E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.886 | TFLOPs: 31.95 | +7: iteration 86100/ 173500 | consumed samples: 22041600 | consumed tokens: 45141196800 | elapsed time per iteration (s): 
0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.952408E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.917 | TFLOPs: 31.79 | +7: iteration 86110/ 173500 | consumed samples: 22044160 | consumed tokens: 45146439680 | elapsed time per iteration (s): 0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.942920E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.984 | TFLOPs: 31.74 | +7: iteration 86120/ 173500 | consumed samples: 22046720 | consumed tokens: 45151682560 | elapsed time per iteration (s): 0.42 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.945605E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.958 | TFLOPs: 31.95 | +7: iteration 86130/ 173500 | consumed samples: 22049280 | consumed tokens: 45156925440 | elapsed time per iteration (s): 0.42 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.944917E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.355 | TFLOPs: 31.92 | +7: iteration 86140/ 173500 | consumed samples: 22051840 | consumed tokens: 45162168320 | elapsed time per iteration (s): 0.43 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.943448E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.329 | TFLOPs: 31.50 | +7: iteration 86150/ 173500 | consumed samples: 22054400 | consumed tokens: 45167411200 | elapsed time per iteration (s): 0.42 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.936004E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.007 | TFLOPs: 31.90 | +7: iteration 
86160/ 173500 | consumed samples: 22056960 | consumed tokens: 45172654080 | elapsed time per iteration (s): 0.42 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.942515E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.753 | TFLOPs: 31.89 | +7: iteration 86170/ 173500 | consumed samples: 22059520 | consumed tokens: 45177896960 | elapsed time per iteration (s): 0.42 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.969222E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.580 | TFLOPs: 31.62 | +7: iteration 86180/ 173500 | consumed samples: 22062080 | consumed tokens: 45183139840 | elapsed time per iteration (s): 0.43 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.947017E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.308 | TFLOPs: 31.39 | +7: iteration 86190/ 173500 | consumed samples: 22064640 | consumed tokens: 45188382720 | elapsed time per iteration (s): 0.42 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.942084E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.740 | TFLOPs: 31.89 | +7: iteration 86200/ 173500 | consumed samples: 22067200 | consumed tokens: 45193625600 | elapsed time per iteration (s): 0.43 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.946762E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.980 | TFLOPs: 31.27 | +7: iteration 86210/ 173500 | consumed samples: 22069760 | consumed tokens: 45198868480 | elapsed time per iteration (s): 0.42 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.951742E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.585 | TFLOPs: 31.67 | +7: iteration 86220/ 173500 | consumed samples: 22072320 | consumed tokens: 45204111360 | elapsed time per iteration (s): 0.42 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.942498E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.782 | TFLOPs: 31.73 | +7: iteration 86230/ 173500 | consumed samples: 22074880 | consumed tokens: 45209354240 | elapsed time per iteration (s): 0.42 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.928121E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.101 | TFLOPs: 31.91 | +7: iteration 86240/ 173500 | consumed samples: 22077440 | consumed tokens: 45214597120 | elapsed time per iteration (s): 0.43 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.940986E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.741 | TFLOPs: 31.57 | +7: iteration 86250/ 173500 | consumed samples: 22080000 | consumed tokens: 45219840000 | elapsed time per iteration (s): 0.42 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.950775E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.894 | TFLOPs: 31.90 | +7: iteration 86260/ 173500 | consumed samples: 22082560 | consumed tokens: 45225082880 | elapsed time per iteration (s): 0.42 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.939779E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.271 | TFLOPs: 31.71 | +7: iteration 86270/ 173500 | consumed samples: 22085120 | consumed tokens: 45230325760 | elapsed time per iteration (s): 0.42 | learning rate: 
1.122E-04 | global batch size: 256 | lm loss: 2.946222E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.114 | TFLOPs: 31.64 | +7: iteration 86280/ 173500 | consumed samples: 22087680 | consumed tokens: 45235568640 | elapsed time per iteration (s): 0.42 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.951362E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.664 | TFLOPs: 31.88 | +7: iteration 86290/ 173500 | consumed samples: 22090240 | consumed tokens: 45240811520 | elapsed time per iteration (s): 0.42 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.953083E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.627 | TFLOPs: 31.88 | +7: iteration 86300/ 173500 | consumed samples: 22092800 | consumed tokens: 45246054400 | elapsed time per iteration (s): 0.43 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.943043E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.994 | TFLOPs: 31.11 | +7: iteration 86310/ 173500 | consumed samples: 22095360 | consumed tokens: 45251297280 | elapsed time per iteration (s): 0.42 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.935369E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.110 | TFLOPs: 31.91 | +7: iteration 86320/ 173500 | consumed samples: 22097920 | consumed tokens: 45256540160 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.949117E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.666 | TFLOPs: 30.94 | +7: iteration 86330/ 173500 | 
consumed samples: 22100480 | consumed tokens: 45261783040 | elapsed time per iteration (s): 0.42 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.929188E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.935 | TFLOPs: 31.69 | +7: iteration 86340/ 173500 | consumed samples: 22103040 | consumed tokens: 45267025920 | elapsed time per iteration (s): 0.42 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.933952E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.973 | TFLOPs: 31.90 | +7: iteration 86350/ 173500 | consumed samples: 22105600 | consumed tokens: 45272268800 | elapsed time per iteration (s): 0.44 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.932314E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.368 | TFLOPs: 30.61 | +7: iteration 86360/ 173500 | consumed samples: 22108160 | consumed tokens: 45277511680 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.929651E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.496 | TFLOPs: 31.56 | +7: iteration 86370/ 173500 | consumed samples: 22110720 | consumed tokens: 45282754560 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.923804E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.259 | TFLOPs: 31.60 | +7: iteration 86380/ 173500 | consumed samples: 22113280 | consumed tokens: 45287997440 | elapsed time per iteration (s): 0.43 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.942585E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.714 | TFLOPs: 31.41 | +7: iteration 86390/ 173500 | consumed samples: 22115840 | consumed tokens: 45293240320 | elapsed time per iteration (s): 0.43 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.935110E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.872 | TFLOPs: 31.16 | +7: iteration 86400/ 173500 | consumed samples: 22118400 | consumed tokens: 45298483200 | elapsed time per iteration (s): 0.42 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.939614E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.703 | TFLOPs: 31.99 | +7: iteration 86410/ 173500 | consumed samples: 22120960 | consumed tokens: 45303726080 | elapsed time per iteration (s): 0.43 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.942924E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.851 | TFLOPs: 31.58 | +7: iteration 86420/ 173500 | consumed samples: 22123520 | consumed tokens: 45308968960 | elapsed time per iteration (s): 0.42 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.950844E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.841 | TFLOPs: 31.94 | +7: iteration 86430/ 173500 | consumed samples: 22126080 | consumed tokens: 45314211840 | elapsed time per iteration (s): 0.42 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.928116E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.143 | TFLOPs: 31.80 | +7: iteration 86440/ 173500 | consumed samples: 22128640 | consumed tokens: 45319454720 | elapsed time per iteration (s): 0.43 | learning rate: 1.119E-04 | global batch 
size: 256 | lm loss: 2.957631E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.049 | TFLOPs: 31.59 | +7: iteration 86450/ 173500 | consumed samples: 22131200 | consumed tokens: 45324697600 | elapsed time per iteration (s): 0.42 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.933085E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.481 | TFLOPs: 31.93 | +7: iteration 86460/ 173500 | consumed samples: 22133760 | consumed tokens: 45329940480 | elapsed time per iteration (s): 0.42 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.961298E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.463 | TFLOPs: 31.93 | +7: iteration 86470/ 173500 | consumed samples: 22136320 | consumed tokens: 45335183360 | elapsed time per iteration (s): 0.42 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.936461E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.859 | TFLOPs: 31.84 | +7: iteration 86480/ 173500 | consumed samples: 22138880 | consumed tokens: 45340426240 | elapsed time per iteration (s): 0.42 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.951195E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.269 | TFLOPs: 31.71 | +7: iteration 86490/ 173500 | consumed samples: 22141440 | consumed tokens: 45345669120 | elapsed time per iteration (s): 0.42 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.951289E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.098 | TFLOPs: 31.75 | +7: iteration 86500/ 173500 | consumed samples: 22144000 | 
consumed tokens: 45350912000 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.944474E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.408 | TFLOPs: 31.71 | +7: iteration 86510/ 173500 | consumed samples: 22146560 | consumed tokens: 45356154880 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.947155E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.561 | TFLOPs: 31.88 | +7: iteration 86520/ 173500 | consumed samples: 22149120 | consumed tokens: 45361397760 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.963249E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.072 | TFLOPs: 31.64 | +7: iteration 86530/ 173500 | consumed samples: 22151680 | consumed tokens: 45366640640 | elapsed time per iteration (s): 0.43 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.949736E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.662 | TFLOPs: 31.31 | +7: iteration 86540/ 173500 | consumed samples: 22154240 | consumed tokens: 45371883520 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.952552E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.639 | TFLOPs: 31.72 | +7: iteration 86550/ 173500 | consumed samples: 22156800 | consumed tokens: 45377126400 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.948215E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.628 | TFLOPs: 31.67 | +7: iteration 86560/ 173500 | consumed samples: 22159360 | consumed tokens: 45382369280 | elapsed time per iteration (s): 0.42 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.934793E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.126 | TFLOPs: 31.91 | +7: iteration 86570/ 173500 | consumed samples: 22161920 | consumed tokens: 45387612160 | elapsed time per iteration (s): 0.42 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.940389E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.610 | TFLOPs: 31.88 | +7: iteration 86580/ 173500 | consumed samples: 22164480 | consumed tokens: 45392855040 | elapsed time per iteration (s): 0.42 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.939827E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.151 | TFLOPs: 31.65 | +7: iteration 86590/ 173500 | consumed samples: 22167040 | consumed tokens: 45398097920 | elapsed time per iteration (s): 0.42 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.942406E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.726 | TFLOPs: 31.89 | +7: iteration 86600/ 173500 | consumed samples: 22169600 | consumed tokens: 45403340800 | elapsed time per iteration (s): 0.42 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.950882E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.525 | TFLOPs: 31.88 | +7: iteration 86610/ 173500 | consumed samples: 22172160 | consumed tokens: 45408583680 | elapsed time per iteration (s): 0.43 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 
2.951089E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.898 | TFLOPs: 31.58 | +7: iteration 86620/ 173500 | consumed samples: 22174720 | consumed tokens: 45413826560 | elapsed time per iteration (s): 0.42 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.951777E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.761 | TFLOPs: 31.68 | +7: iteration 86630/ 173500 | consumed samples: 22177280 | consumed tokens: 45419069440 | elapsed time per iteration (s): 0.42 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.942963E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.366 | TFLOPs: 31.87 | +7: iteration 86640/ 173500 | consumed samples: 22179840 | consumed tokens: 45424312320 | elapsed time per iteration (s): 0.42 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.947158E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.820 | TFLOPs: 31.73 | +7: iteration 86650/ 173500 | consumed samples: 22182400 | consumed tokens: 45429555200 | elapsed time per iteration (s): 0.43 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.948312E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.565 | TFLOPs: 31.25 | +7: iteration 86660/ 173500 | consumed samples: 22184960 | consumed tokens: 45434798080 | elapsed time per iteration (s): 0.44 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.963830E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.729 | TFLOPs: 30.84 | +7: iteration 86670/ 173500 | consumed samples: 22187520 | consumed tokens: 
45440040960 | elapsed time per iteration (s): 0.43 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.929281E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.368 | TFLOPs: 30.98 | +7: iteration 86680/ 173500 | consumed samples: 22190080 | consumed tokens: 45445283840 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.948323E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.886 | TFLOPs: 31.89 | +7: iteration 86690/ 173500 | consumed samples: 22192640 | consumed tokens: 45450526720 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.954791E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.064 | TFLOPs: 31.85 | +7: iteration 86700/ 173500 | consumed samples: 22195200 | consumed tokens: 45455769600 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.944197E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.644 | TFLOPs: 31.62 | +7: iteration 86710/ 173500 | consumed samples: 22197760 | consumed tokens: 45461012480 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.940378E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.311 | TFLOPs: 31.86 | +7: iteration 86720/ 173500 | consumed samples: 22200320 | consumed tokens: 45466255360 | elapsed time per iteration (s): 0.43 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.937257E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 592.624 | TFLOPs: 31.09 | +7: iteration 86730/ 173500 | consumed samples: 22202880 | consumed tokens: 45471498240 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.933818E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.405 | TFLOPs: 31.66 | +7: iteration 86740/ 173500 | consumed samples: 22205440 | consumed tokens: 45476741120 | elapsed time per iteration (s): 0.42 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.940618E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.907 | TFLOPs: 31.90 | +7: iteration 86750/ 173500 | consumed samples: 22208000 | consumed tokens: 45481984000 | elapsed time per iteration (s): 0.42 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.950204E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.858 | TFLOPs: 31.63 | +7: iteration 86760/ 173500 | consumed samples: 22210560 | consumed tokens: 45487226880 | elapsed time per iteration (s): 0.42 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.919589E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.199 | TFLOPs: 31.70 | +7: iteration 86770/ 173500 | consumed samples: 22213120 | consumed tokens: 45492469760 | elapsed time per iteration (s): 0.42 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.945566E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.650 | TFLOPs: 31.83 | +7: iteration 86780/ 173500 | consumed samples: 22215680 | consumed tokens: 45497712640 | elapsed time per iteration (s): 0.42 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.940773E+00 | grad 
norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.643 | TFLOPs: 31.83 | +7: iteration 86790/ 173500 | consumed samples: 22218240 | consumed tokens: 45502955520 | elapsed time per iteration (s): 0.46 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.940477E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.247 | TFLOPs: 28.98 | +7: iteration 86800/ 173500 | consumed samples: 22220800 | consumed tokens: 45508198400 | elapsed time per iteration (s): 0.44 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.948048E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.758 | TFLOPs: 30.37 | +7: iteration 86810/ 173500 | consumed samples: 22223360 | consumed tokens: 45513441280 | elapsed time per iteration (s): 0.46 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.942559E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.250 | TFLOPs: 28.92 | +7: iteration 86820/ 173500 | consumed samples: 22225920 | consumed tokens: 45518684160 | elapsed time per iteration (s): 0.46 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.952866E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.978 | TFLOPs: 28.96 | +7: iteration 86830/ 173500 | consumed samples: 22228480 | consumed tokens: 45523927040 | elapsed time per iteration (s): 0.43 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.951807E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.465 | TFLOPs: 31.24 | +7: iteration 86840/ 173500 | consumed samples: 22231040 | consumed tokens: 45529169920 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.945060E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.519 | TFLOPs: 31.72 | +7: iteration 86850/ 173500 | consumed samples: 22233600 | consumed tokens: 45534412800 | elapsed time per iteration (s): 0.44 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.962054E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.962 | TFLOPs: 30.38 | +7: iteration 86860/ 173500 | consumed samples: 22236160 | consumed tokens: 45539655680 | elapsed time per iteration (s): 0.42 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.939867E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.299 | TFLOPs: 31.71 | +7: iteration 86870/ 173500 | consumed samples: 22238720 | consumed tokens: 45544898560 | elapsed time per iteration (s): 0.42 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.933410E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.383 | TFLOPs: 31.92 | +7: iteration 86880/ 173500 | consumed samples: 22241280 | consumed tokens: 45550141440 | elapsed time per iteration (s): 0.43 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.952038E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.102 | TFLOPs: 31.07 | +7: iteration 86890/ 173500 | consumed samples: 22243840 | consumed tokens: 45555384320 | elapsed time per iteration (s): 0.44 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.956133E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.235 | TFLOPs: 
30.50 | +7: iteration 86900/ 173500 | consumed samples: 22246400 | consumed tokens: 45560627200 | elapsed time per iteration (s): 0.43 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.941934E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.144 | TFLOPs: 30.91 | +7: iteration 86910/ 173500 | consumed samples: 22248960 | consumed tokens: 45565870080 | elapsed time per iteration (s): 0.46 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.939661E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.933 | TFLOPs: 28.96 | +7: iteration 86920/ 173500 | consumed samples: 22251520 | consumed tokens: 45571112960 | elapsed time per iteration (s): 0.47 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.935143E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.917 | TFLOPs: 28.33 | +7: iteration 86930/ 173500 | consumed samples: 22254080 | consumed tokens: 45576355840 | elapsed time per iteration (s): 0.48 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.933965E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.202 | TFLOPs: 28.13 | +7: iteration 86940/ 173500 | consumed samples: 22256640 | consumed tokens: 45581598720 | elapsed time per iteration (s): 0.45 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.934003E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.212 | TFLOPs: 29.76 | +7: iteration 86950/ 173500 | consumed samples: 22259200 | consumed tokens: 45586841600 | elapsed time per iteration (s): 0.48 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.944450E+00 | grad norm: 0.337 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.150 | TFLOPs: 28.18 | +7: iteration 86960/ 173500 | consumed samples: 22261760 | consumed tokens: 45592084480 | elapsed time per iteration (s): 0.42 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.933862E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.170 | TFLOPs: 32.12 | +7: iteration 86970/ 173500 | consumed samples: 22264320 | consumed tokens: 45597327360 | elapsed time per iteration (s): 0.42 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.943232E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.713 | TFLOPs: 32.04 | +7: iteration 86980/ 173500 | consumed samples: 22266880 | consumed tokens: 45602570240 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.935062E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.186 | TFLOPs: 32.02 | +7: iteration 86990/ 173500 | consumed samples: 22269440 | consumed tokens: 45607813120 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.932859E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.063 | TFLOPs: 31.75 | +7: iteration 87000/ 173500 | consumed samples: 22272000 | consumed tokens: 45613056000 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.946005E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.770 | TFLOPs: 31.94 | +7: iteration 87010/ 173500 | consumed samples: 22274560 | consumed tokens: 45618298880 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.942316E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.202 | TFLOPs: 31.91 | +7: iteration 87020/ 173500 | consumed samples: 22277120 | consumed tokens: 45623541760 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.956389E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.342 | TFLOPs: 31.71 | +7: iteration 87030/ 173500 | consumed samples: 22279680 | consumed tokens: 45628784640 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.955552E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.803 | TFLOPs: 31.94 | +7: iteration 87040/ 173500 | consumed samples: 22282240 | consumed tokens: 45634027520 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.944727E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.549 | TFLOPs: 31.93 | +7: iteration 87050/ 173500 | consumed samples: 22284800 | consumed tokens: 45639270400 | elapsed time per iteration (s): 0.43 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.948987E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.636 | TFLOPs: 31.15 | +7: iteration 87060/ 173500 | consumed samples: 22287360 | consumed tokens: 45644513280 | elapsed time per iteration (s): 0.42 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.946365E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.780 | TFLOPs: 31.94 | +7: iteration 87070/ 
173500 | consumed samples: 22289920 | consumed tokens: 45649756160 | elapsed time per iteration (s): 0.42 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.935585E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.274 | TFLOPs: 31.92 | +7: iteration 87080/ 173500 | consumed samples: 22292480 | consumed tokens: 45654999040 | elapsed time per iteration (s): 0.42 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.932607E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.953 | TFLOPs: 31.74 | +7: iteration 87090/ 173500 | consumed samples: 22295040 | consumed tokens: 45660241920 | elapsed time per iteration (s): 0.42 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.941273E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.049 | TFLOPs: 31.90 | +7: iteration 87100/ 173500 | consumed samples: 22297600 | consumed tokens: 45665484800 | elapsed time per iteration (s): 0.42 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.922020E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.852 | TFLOPs: 31.89 | +7: iteration 87110/ 173500 | consumed samples: 22300160 | consumed tokens: 45670727680 | elapsed time per iteration (s): 0.43 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.937372E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.948 | TFLOPs: 31.58 | +7: iteration 87120/ 173500 | consumed samples: 22302720 | consumed tokens: 45675970560 | elapsed time per iteration (s): 0.42 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.940111E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.278 | TFLOPs: 31.86 | +7: iteration 87130/ 173500 | consumed samples: 22305280 | consumed tokens: 45681213440 | elapsed time per iteration (s): 0.42 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.936847E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.618 | TFLOPs: 31.88 | +7: iteration 87140/ 173500 | consumed samples: 22307840 | consumed tokens: 45686456320 | elapsed time per iteration (s): 0.42 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.937200E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.706 | TFLOPs: 31.89 | +7: iteration 87150/ 173500 | consumed samples: 22310400 | consumed tokens: 45691699200 | elapsed time per iteration (s): 0.42 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.961843E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.744 | TFLOPs: 31.89 | +7: iteration 87160/ 173500 | consumed samples: 22312960 | consumed tokens: 45696942080 | elapsed time per iteration (s): 0.42 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.936285E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.086 | TFLOPs: 31.85 | +7: iteration 87170/ 173500 | consumed samples: 22315520 | consumed tokens: 45702184960 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.937921E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.930 | TFLOPs: 31.84 | +7: iteration 87180/ 173500 | consumed samples: 22318080 | consumed tokens: 45707427840 | elapsed time per iteration (s): 0.42 | learning rate: 
1.107E-04 | global batch size: 256 | lm loss: 2.948770E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.977 | TFLOPs: 31.85 | +7: iteration 87190/ 173500 | consumed samples: 22320640 | consumed tokens: 45712670720 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.945392E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.931 | TFLOPs: 31.84 | +7: iteration 87200/ 173500 | consumed samples: 22323200 | consumed tokens: 45717913600 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.936267E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.034 | TFLOPs: 31.85 | +7: iteration 87210/ 173500 | consumed samples: 22325760 | consumed tokens: 45723156480 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.936333E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.799 | TFLOPs: 31.84 | +7: iteration 87220/ 173500 | consumed samples: 22328320 | consumed tokens: 45728399360 | elapsed time per iteration (s): 0.42 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.939985E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.917 | TFLOPs: 31.84 | +7: iteration 87230/ 173500 | consumed samples: 22330880 | consumed tokens: 45733642240 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.938000E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.931 | TFLOPs: 31.01 | +7: iteration 87240/ 173500 | 
consumed samples: 22333440 | consumed tokens: 45738885120 | elapsed time per iteration (s): 0.42 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.923901E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.501 | TFLOPs: 31.66 | +7: iteration 87250/ 173500 | consumed samples: 22336000 | consumed tokens: 45744128000 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.950025E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.354 | TFLOPs: 31.55 | +7: iteration 87260/ 173500 | consumed samples: 22338560 | consumed tokens: 45749370880 | elapsed time per iteration (s): 0.42 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.939170E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.030 | TFLOPs: 31.85 | +7: iteration 87270/ 173500 | consumed samples: 22341120 | consumed tokens: 45754613760 | elapsed time per iteration (s): 0.42 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.956495E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.992 | TFLOPs: 31.85 | +7: iteration 87280/ 173500 | consumed samples: 22343680 | consumed tokens: 45759856640 | elapsed time per iteration (s): 0.42 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.934360E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.515 | TFLOPs: 31.82 | +7: iteration 87290/ 173500 | consumed samples: 22346240 | consumed tokens: 45765099520 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.939076E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 606.805 | TFLOPs: 31.84 | +7: iteration 87300/ 173500 | consumed samples: 22348800 | consumed tokens: 45770342400 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.947952E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.736 | TFLOPs: 31.83 | +7: iteration 87310/ 173500 | consumed samples: 22351360 | consumed tokens: 45775585280 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.939936E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.256 | TFLOPs: 31.86 | +7: iteration 87320/ 173500 | consumed samples: 22353920 | consumed tokens: 45780828160 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.938834E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.070 | TFLOPs: 31.85 | +7: iteration 87330/ 173500 | consumed samples: 22356480 | consumed tokens: 45786071040 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.944194E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.233 | TFLOPs: 31.86 | +7: iteration 87340/ 173500 | consumed samples: 22359040 | consumed tokens: 45791313920 | elapsed time per iteration (s): 0.42 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.942487E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.083 | TFLOPs: 31.85 | +7: iteration 87350/ 173500 | consumed samples: 22361600 | consumed tokens: 45796556800 | elapsed time per iteration (s): 0.43 | learning rate: 1.104E-04 | global batch 
size: 256 | lm loss: 2.946374E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.673 | TFLOPs: 31.36 | +7: iteration 87360/ 173500 | consumed samples: 22364160 | consumed tokens: 45801799680 | elapsed time per iteration (s): 0.42 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.947988E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.007 | TFLOPs: 31.85 | +7: iteration 87370/ 173500 | consumed samples: 22366720 | consumed tokens: 45807042560 | elapsed time per iteration (s): 0.42 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.943003E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.707 | TFLOPs: 31.83 | +7: iteration 87380/ 173500 | consumed samples: 22369280 | consumed tokens: 45812285440 | elapsed time per iteration (s): 0.42 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.956061E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.391 | TFLOPs: 31.82 | +7: iteration 87390/ 173500 | consumed samples: 22371840 | consumed tokens: 45817528320 | elapsed time per iteration (s): 0.42 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.951113E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | +7: iteration 87400/ 173500 | consumed samples: 22374400 | consumed tokens: 45822771200 | elapsed time per iteration (s): 0.42 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.950456E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.113 | TFLOPs: 31.80 | +7: iteration 87410/ 173500 | consumed samples: 22376960 | 
consumed tokens: 45828014080 | elapsed time per iteration (s): 0.44 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.940931E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.228 | TFLOPs: 30.86 | +7: iteration 87420/ 173500 | consumed samples: 22379520 | consumed tokens: 45833256960 | elapsed time per iteration (s): 0.43 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.950191E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.793 | TFLOPs: 31.31 | +7: iteration 87430/ 173500 | consumed samples: 22382080 | consumed tokens: 45838499840 | elapsed time per iteration (s): 0.45 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.939069E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.013 | TFLOPs: 30.12 | +7: iteration 87440/ 173500 | consumed samples: 22384640 | consumed tokens: 45843742720 | elapsed time per iteration (s): 0.42 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.947434E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.471 | TFLOPs: 31.93 | +7: iteration 87450/ 173500 | consumed samples: 22387200 | consumed tokens: 45848985600 | elapsed time per iteration (s): 0.44 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.949416E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.671 | TFLOPs: 30.73 | +7: iteration 87460/ 173500 | consumed samples: 22389760 | consumed tokens: 45854228480 | elapsed time per iteration (s): 0.42 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.934299E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 610.587 | TFLOPs: 32.04 | +7: iteration 87470/ 173500 | consumed samples: 22392320 | consumed tokens: 45859471360 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.933743E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.118 | TFLOPs: 31.91 | +7: iteration 87480/ 173500 | consumed samples: 22394880 | consumed tokens: 45864714240 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.935454E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.243 | TFLOPs: 31.91 | +7: iteration 87490/ 173500 | consumed samples: 22397440 | consumed tokens: 45869957120 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.935991E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.171 | TFLOPs: 31.70 | +7: iteration 87500/ 173500 | consumed samples: 22400000 | consumed tokens: 45875200000 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.936463E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.822 | TFLOPs: 31.89 | +7: iteration 87510/ 173500 | consumed samples: 22402560 | consumed tokens: 45880442880 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.929794E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.046 | TFLOPs: 31.80 | +7: iteration 87520/ 173500 | consumed samples: 22405120 | consumed tokens: 45885685760 | elapsed time per iteration (s): 0.42 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 
2.936317E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.730 | TFLOPs: 31.89 | +7: iteration 87530/ 173500 | consumed samples: 22407680 | consumed tokens: 45890928640 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.936890E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.593 | TFLOPs: 31.88 | +7: iteration 87540/ 173500 | consumed samples: 22410240 | consumed tokens: 45896171520 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.951579E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.268 | TFLOPs: 31.86 | +7: iteration 87550/ 173500 | consumed samples: 22412800 | consumed tokens: 45901414400 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.936863E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.212 | TFLOPs: 31.86 | +7: iteration 87560/ 173500 | consumed samples: 22415360 | consumed tokens: 45906657280 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.941408E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.290 | TFLOPs: 31.86 | +7: iteration 87570/ 173500 | consumed samples: 22417920 | consumed tokens: 45911900160 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.943304E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.990 | TFLOPs: 31.80 | +7: iteration 87580/ 173500 | consumed samples: 22420480 | consumed tokens: 
45917143040 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.950484E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.998 | TFLOPs: 31.85 | +7: iteration 87590/ 173500 | consumed samples: 22423040 | consumed tokens: 45922385920 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.947632E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.551 | TFLOPs: 31.82 | +7: iteration 87600/ 173500 | consumed samples: 22425600 | consumed tokens: 45927628800 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.949488E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.772 | TFLOPs: 31.84 | +7: iteration 87610/ 173500 | consumed samples: 22428160 | consumed tokens: 45932871680 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.953088E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.859 | TFLOPs: 31.84 | +7: iteration 87620/ 173500 | consumed samples: 22430720 | consumed tokens: 45938114560 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.946257E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.499 | TFLOPs: 31.82 | +7: iteration 87630/ 173500 | consumed samples: 22433280 | consumed tokens: 45943357440 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.927106E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.742 | TFLOPs: 31.83 | +7: iteration 87640/ 173500 | consumed samples: 22435840 | consumed tokens: 45948600320 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.960625E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.695 | TFLOPs: 31.83 | +7: iteration 87650/ 173500 | consumed samples: 22438400 | consumed tokens: 45953843200 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.946061E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.365 | TFLOPs: 31.81 | +7: iteration 87660/ 173500 | consumed samples: 22440960 | consumed tokens: 45959086080 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.930180E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.890 | TFLOPs: 31.84 | +7: iteration 87670/ 173500 | consumed samples: 22443520 | consumed tokens: 45964328960 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.939758E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.263 | TFLOPs: 31.86 | +7: iteration 87680/ 173500 | consumed samples: 22446080 | consumed tokens: 45969571840 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.929923E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.086 | TFLOPs: 31.85 | +7: iteration 87690/ 173500 | consumed samples: 22448640 | consumed tokens: 45974814720 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.945397E+00 | grad 
norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.150 | TFLOPs: 31.86 | +7: iteration 87700/ 173500 | consumed samples: 22451200 | consumed tokens: 45980057600 | elapsed time per iteration (s): 0.42 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.942824E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.581 | TFLOPs: 31.88 | +7: iteration 87710/ 173500 | consumed samples: 22453760 | consumed tokens: 45985300480 | elapsed time per iteration (s): 0.42 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.938224E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.292 | TFLOPs: 31.86 | +7: iteration 87720/ 173500 | consumed samples: 22456320 | consumed tokens: 45990543360 | elapsed time per iteration (s): 0.42 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.945222E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.056 | TFLOPs: 31.85 | +7: iteration 87730/ 173500 | consumed samples: 22458880 | consumed tokens: 45995786240 | elapsed time per iteration (s): 0.42 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.953652E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | +7: iteration 87740/ 173500 | consumed samples: 22461440 | consumed tokens: 46001029120 | elapsed time per iteration (s): 0.42 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.944940E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.767 | TFLOPs: 31.84 | +7: iteration 87750/ 173500 | consumed samples: 22464000 | consumed tokens: 46006272000 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.957021E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.701 | TFLOPs: 31.20 | +7: iteration 87760/ 173500 | consumed samples: 22466560 | consumed tokens: 46011514880 | elapsed time per iteration (s): 0.43 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.935080E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.133 | TFLOPs: 31.02 | +7: iteration 87770/ 173500 | consumed samples: 22469120 | consumed tokens: 46016757760 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.938169E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.403 | TFLOPs: 31.92 | +7: iteration 87780/ 173500 | consumed samples: 22471680 | consumed tokens: 46022000640 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.936222E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.206 | TFLOPs: 31.91 | +7: iteration 87790/ 173500 | consumed samples: 22474240 | consumed tokens: 46027243520 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.936302E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.757 | TFLOPs: 31.89 | +7: iteration 87800/ 173500 | consumed samples: 22476800 | consumed tokens: 46032486400 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.925425E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.900 | TFLOPs: 
31.90 | +7: iteration 87810/ 173500 | consumed samples: 22479360 | consumed tokens: 46037729280 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.938396E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.781 | TFLOPs: 31.89 | +7: iteration 87820/ 173500 | consumed samples: 22481920 | consumed tokens: 46042972160 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.946143E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.501 | TFLOPs: 31.87 | +7: iteration 87830/ 173500 | consumed samples: 22484480 | consumed tokens: 46048215040 | elapsed time per iteration (s): 0.42 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.945221E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.728 | TFLOPs: 31.89 | +7: iteration 87840/ 173500 | consumed samples: 22487040 | consumed tokens: 46053457920 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.932328E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.041 | TFLOPs: 31.90 | +7: iteration 87850/ 173500 | consumed samples: 22489600 | consumed tokens: 46058700800 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.941676E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.980 | TFLOPs: 31.90 | +7: iteration 87860/ 173500 | consumed samples: 22492160 | consumed tokens: 46063943680 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.936125E+00 | grad norm: 0.325 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.900 | TFLOPs: 31.90 | +7: iteration 87870/ 173500 | consumed samples: 22494720 | consumed tokens: 46069186560 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.934029E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.354 | TFLOPs: 31.92 | +7: iteration 87880/ 173500 | consumed samples: 22497280 | consumed tokens: 46074429440 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.934173E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.272 | TFLOPs: 31.92 | +7: iteration 87890/ 173500 | consumed samples: 22499840 | consumed tokens: 46079672320 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.937146E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.134 | TFLOPs: 31.91 | +7: iteration 87900/ 173500 | consumed samples: 22502400 | consumed tokens: 46084915200 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.944292E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.514 | TFLOPs: 31.88 | +7: iteration 87910/ 173500 | consumed samples: 22504960 | consumed tokens: 46090158080 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.932667E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.817 | TFLOPs: 31.89 | +7: iteration 87920/ 173500 | consumed samples: 22507520 | consumed tokens: 46095400960 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.940363E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.735 | TFLOPs: 31.89 | +7: iteration 87930/ 173500 | consumed samples: 22510080 | consumed tokens: 46100643840 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.940529E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.774 | TFLOPs: 31.89 | +7: iteration 87940/ 173500 | consumed samples: 22512640 | consumed tokens: 46105886720 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.940096E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.040 | TFLOPs: 31.90 | +7: iteration 87950/ 173500 | consumed samples: 22515200 | consumed tokens: 46111129600 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.947859E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.762 | TFLOPs: 31.89 | +7: iteration 87960/ 173500 | consumed samples: 22517760 | consumed tokens: 46116372480 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.947086E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.741 | TFLOPs: 31.89 | +7: iteration 87970/ 173500 | consumed samples: 22520320 | consumed tokens: 46121615360 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.946828E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.395 | TFLOPs: 31.87 | +7: iteration 87980/ 
173500 | consumed samples: 22522880 | consumed tokens: 46126858240 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.939863E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.199 | TFLOPs: 31.86 | +7: iteration 87990/ 173500 | consumed samples: 22525440 | consumed tokens: 46132101120 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.949455E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.683 | TFLOPs: 31.88 | +0: [2023-03-17 09:37:46,593] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=0, lr=[0.00010937083470846484, 0.00010937083470846484, 0.00010937083470846484], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 88000/ 173500 | consumed samples: 22528000 | consumed tokens: 46137344000 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.957639E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.313 | TFLOPs: 31.86 | +0: steps: 88000 loss: 2.9783 iter time (s): 0.423 samples/sec: 605.083 +7: iteration 88010/ 173500 | consumed samples: 22530560 | consumed tokens: 46142586880 | elapsed time per iteration (s): 0.42 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.946784E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.215 | TFLOPs: 31.81 | +7: iteration 88020/ 173500 | consumed samples: 22533120 | consumed tokens: 46147829760 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.931516E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.333 | TFLOPs: 31.87 | +7: iteration 88030/ 173500 | consumed samples: 22535680 | consumed tokens: 46153072640 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.943419E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.923 | TFLOPs: 31.90 | +7: iteration 88040/ 173500 | consumed samples: 22538240 | consumed tokens: 46158315520 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.938343E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.466 | TFLOPs: 31.87 | +7: iteration 88050/ 173500 | consumed samples: 22540800 | consumed tokens: 46163558400 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.932599E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.582 | TFLOPs: 31.88 | +7: iteration 88060/ 173500 | consumed samples: 22543360 | consumed tokens: 46168801280 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.945778E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.385 | TFLOPs: 31.87 | +7: iteration 88070/ 173500 | consumed samples: 22545920 | consumed tokens: 46174044160 | elapsed time per iteration (s): 0.42 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.935260E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.505 | TFLOPs: 31.87 | +7: iteration 88080/ 173500 | consumed samples: 22548480 | consumed tokens: 46179287040 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.932196E+00 | grad norm: 
0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.518 | TFLOPs: 31.88 | +7: iteration 88090/ 173500 | consumed samples: 22551040 | consumed tokens: 46184529920 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.941075E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.624 | TFLOPs: 31.88 | +7: iteration 88100/ 173500 | consumed samples: 22553600 | consumed tokens: 46189772800 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.926292E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.142 | TFLOPs: 31.91 | +7: iteration 88110/ 173500 | consumed samples: 22556160 | consumed tokens: 46195015680 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.936817E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.971 | TFLOPs: 31.90 | +7: iteration 88120/ 173500 | consumed samples: 22558720 | consumed tokens: 46200258560 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.937747E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.293 | TFLOPs: 31.86 | +7: iteration 88130/ 173500 | consumed samples: 22561280 | consumed tokens: 46205501440 | elapsed time per iteration (s): 0.42 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.942824E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.595 | TFLOPs: 31.88 | +7: iteration 88140/ 173500 | consumed samples: 22563840 | consumed tokens: 46210744320 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.929698E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.120 | TFLOPs: 31.85 | +7: iteration 88150/ 173500 | consumed samples: 22566400 | consumed tokens: 46215987200 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.927605E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.266 | TFLOPs: 31.86 | +7: iteration 88160/ 173500 | consumed samples: 22568960 | consumed tokens: 46221230080 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.932076E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.118 | TFLOPs: 31.85 | +7: iteration 88170/ 173500 | consumed samples: 22571520 | consumed tokens: 46226472960 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.935122E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.696 | TFLOPs: 31.88 | +7: iteration 88180/ 173500 | consumed samples: 22574080 | consumed tokens: 46231715840 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.933038E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.707 | TFLOPs: 31.89 | +7: iteration 88190/ 173500 | consumed samples: 22576640 | consumed tokens: 46236958720 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.933214E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.351 | TFLOPs: 31.87 | 
+7: iteration 88200/ 173500 | consumed samples: 22579200 | consumed tokens: 46242201600 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.942806E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.138 | TFLOPs: 31.65 | +7: iteration 88210/ 173500 | consumed samples: 22581760 | consumed tokens: 46247444480 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.934105E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.995 | TFLOPs: 31.85 | +7: iteration 88220/ 173500 | consumed samples: 22584320 | consumed tokens: 46252687360 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.952658E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.745 | TFLOPs: 31.89 | +7: iteration 88230/ 173500 | consumed samples: 22586880 | consumed tokens: 46257930240 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.942566E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.273 | TFLOPs: 31.86 | +7: iteration 88240/ 173500 | consumed samples: 22589440 | consumed tokens: 46263173120 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.945598E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.534 | TFLOPs: 31.88 | +7: iteration 88250/ 173500 | consumed samples: 22592000 | consumed tokens: 46268416000 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.949561E+00 | grad norm: 0.323 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.493 | TFLOPs: 31.87 | +7: iteration 88260/ 173500 | consumed samples: 22594560 | consumed tokens: 46273658880 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.939957E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.404 | TFLOPs: 31.87 | +7: iteration 88270/ 173500 | consumed samples: 22597120 | consumed tokens: 46278901760 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.925900E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.650 | TFLOPs: 31.88 | +7: iteration 88280/ 173500 | consumed samples: 22599680 | consumed tokens: 46284144640 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.933440E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.595 | TFLOPs: 31.88 | +7: iteration 88290/ 173500 | consumed samples: 22602240 | consumed tokens: 46289387520 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.942627E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.339 | TFLOPs: 31.87 | +7: iteration 88300/ 173500 | consumed samples: 22604800 | consumed tokens: 46294630400 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.941039E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.879 | TFLOPs: 31.84 | +7: iteration 88310/ 173500 | consumed samples: 22607360 | consumed tokens: 46299873280 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.934919E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.559 | TFLOPs: 31.83 | +7: iteration 88320/ 173500 | consumed samples: 22609920 | consumed tokens: 46305116160 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.936857E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.600 | TFLOPs: 31.83 | +7: iteration 88330/ 173500 | consumed samples: 22612480 | consumed tokens: 46310359040 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.939061E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.813 | TFLOPs: 31.84 | +7: iteration 88340/ 173500 | consumed samples: 22615040 | consumed tokens: 46315601920 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.949435E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.275 | TFLOPs: 31.86 | +7: iteration 88350/ 173500 | consumed samples: 22617600 | consumed tokens: 46320844800 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.944045E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.104 | TFLOPs: 31.85 | +7: iteration 88360/ 173500 | consumed samples: 22620160 | consumed tokens: 46326087680 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.960565E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.614 | TFLOPs: 31.67 | +7: iteration 88370/ 
173500 | consumed samples: 22622720 | consumed tokens: 46331330560 | elapsed time per iteration (s): 0.42 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.941315E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.347 | TFLOPs: 31.87 | +7: iteration 88380/ 173500 | consumed samples: 22625280 | consumed tokens: 46336573440 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.926888E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.212 | TFLOPs: 31.86 | +7: iteration 88390/ 173500 | consumed samples: 22627840 | consumed tokens: 46341816320 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.927064E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.750 | TFLOPs: 31.84 | +7: iteration 88400/ 173500 | consumed samples: 22630400 | consumed tokens: 46347059200 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.945539E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.800 | TFLOPs: 31.84 | +7: iteration 88410/ 173500 | consumed samples: 22632960 | consumed tokens: 46352302080 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.937319E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.531 | TFLOPs: 31.82 | +7: iteration 88420/ 173500 | consumed samples: 22635520 | consumed tokens: 46357544960 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.933562E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.236 | TFLOPs: 31.86 | +7: iteration 88430/ 173500 | consumed samples: 22638080 | consumed tokens: 46362787840 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.938149E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.437 | TFLOPs: 31.87 | +7: iteration 88440/ 173500 | consumed samples: 22640640 | consumed tokens: 46368030720 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.940264E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.207 | TFLOPs: 31.86 | +7: iteration 88450/ 173500 | consumed samples: 22643200 | consumed tokens: 46373273600 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.943624E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.266 | TFLOPs: 31.86 | +7: iteration 88460/ 173500 | consumed samples: 22645760 | consumed tokens: 46378516480 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.950918E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.089 | TFLOPs: 31.85 | +7: iteration 88470/ 173500 | consumed samples: 22648320 | consumed tokens: 46383759360 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.943259E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.190 | TFLOPs: 31.86 | +7: iteration 88480/ 173500 | consumed samples: 22650880 | consumed tokens: 46389002240 | elapsed time per iteration (s): 0.42 | learning rate: 
1.086E-04 | global batch size: 256 | lm loss: 2.946540E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 | +7: iteration 88490/ 173500 | consumed samples: 22653440 | consumed tokens: 46394245120 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.943839E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.341 | TFLOPs: 31.87 | +7: iteration 88500/ 173500 | consumed samples: 22656000 | consumed tokens: 46399488000 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.954364E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.261 | TFLOPs: 31.86 | +7: iteration 88510/ 173500 | consumed samples: 22658560 | consumed tokens: 46404730880 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.953369E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.522 | TFLOPs: 31.88 | +7: iteration 88520/ 173500 | consumed samples: 22661120 | consumed tokens: 46409973760 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.937804E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.921 | TFLOPs: 31.90 | +7: iteration 88530/ 173500 | consumed samples: 22663680 | consumed tokens: 46415216640 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.933105E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.337 | TFLOPs: 31.87 | +7: iteration 88540/ 173500 | 
consumed samples: 22666240 | consumed tokens: 46420459520 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.937889E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.315 | TFLOPs: 31.86 | +7: iteration 88550/ 173500 | consumed samples: 22668800 | consumed tokens: 46425702400 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.932755E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.605 | TFLOPs: 31.88 | +7: iteration 88560/ 173500 | consumed samples: 22671360 | consumed tokens: 46430945280 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.947258E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.486 | TFLOPs: 31.87 | +7: iteration 88570/ 173500 | consumed samples: 22673920 | consumed tokens: 46436188160 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.949854E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.043 | TFLOPs: 31.85 | +7: iteration 88580/ 173500 | consumed samples: 22676480 | consumed tokens: 46441431040 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.952794E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.446 | TFLOPs: 31.87 | +7: iteration 88590/ 173500 | consumed samples: 22679040 | consumed tokens: 46446673920 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.948503E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.661 | TFLOPs: 31.88 | +7: iteration 88600/ 173500 | consumed samples: 22681600 | consumed tokens: 46451916800 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.934527E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.580 | TFLOPs: 31.88 | +7: iteration 88610/ 173500 | consumed samples: 22684160 | consumed tokens: 46457159680 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.935955E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.247 | TFLOPs: 31.86 | +7: iteration 88620/ 173500 | consumed samples: 22686720 | consumed tokens: 46462402560 | elapsed time per iteration (s): 0.42 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.936271E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.295 | TFLOPs: 31.86 | +7: iteration 88630/ 173500 | consumed samples: 22689280 | consumed tokens: 46467645440 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.959129E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.811 | TFLOPs: 31.89 | +7: iteration 88640/ 173500 | consumed samples: 22691840 | consumed tokens: 46472888320 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.937482E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.181 | TFLOPs: 31.86 | +7: iteration 88650/ 173500 | consumed samples: 22694400 | consumed tokens: 46478131200 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch 
size: 256 | lm loss: 2.937634E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.664 | TFLOPs: 31.88 | +7: iteration 88660/ 173500 | consumed samples: 22696960 | consumed tokens: 46483374080 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.945962E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.183 | TFLOPs: 31.86 | +7: iteration 88670/ 173500 | consumed samples: 22699520 | consumed tokens: 46488616960 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.932418E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.935 | TFLOPs: 31.84 | +7: iteration 88680/ 173500 | consumed samples: 22702080 | consumed tokens: 46493859840 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.952428E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.416 | TFLOPs: 31.87 | +7: iteration 88690/ 173500 | consumed samples: 22704640 | consumed tokens: 46499102720 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.928716E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.384 | TFLOPs: 31.87 | +7: iteration 88700/ 173500 | consumed samples: 22707200 | consumed tokens: 46504345600 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.944096E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.978 | TFLOPs: 31.85 | +7: iteration 88710/ 173500 | consumed samples: 22709760 | 
consumed tokens: 46509588480 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.935969E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.302 | TFLOPs: 31.86 | +7: iteration 88720/ 173500 | consumed samples: 22712320 | consumed tokens: 46514831360 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.946127E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.615 | TFLOPs: 31.88 | +7: iteration 88730/ 173500 | consumed samples: 22714880 | consumed tokens: 46520074240 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.934856E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.346 | TFLOPs: 31.87 | +7: iteration 88740/ 173500 | consumed samples: 22717440 | consumed tokens: 46525317120 | elapsed time per iteration (s): 0.42 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.934828E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.704 | TFLOPs: 31.89 | +7: iteration 88750/ 173500 | consumed samples: 22720000 | consumed tokens: 46530560000 | elapsed time per iteration (s): 0.43 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.946220E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.474 | TFLOPs: 31.14 | +7: iteration 88760/ 173500 | consumed samples: 22722560 | consumed tokens: 46535802880 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.944196E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 607.976 | TFLOPs: 31.90 | +7: iteration 88770/ 173500 | consumed samples: 22725120 | consumed tokens: 46541045760 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.929247E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.591 | TFLOPs: 31.88 | +7: iteration 88780/ 173500 | consumed samples: 22727680 | consumed tokens: 46546288640 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.952330E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.144 | TFLOPs: 31.91 | +7: iteration 88790/ 173500 | consumed samples: 22730240 | consumed tokens: 46551531520 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.939340E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.804 | TFLOPs: 31.89 | +7: iteration 88800/ 173500 | consumed samples: 22732800 | consumed tokens: 46556774400 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.940817E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.131 | TFLOPs: 31.91 | +7: iteration 88810/ 173500 | consumed samples: 22735360 | consumed tokens: 46562017280 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.925047E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.166 | TFLOPs: 31.91 | +7: iteration 88820/ 173500 | consumed samples: 22737920 | consumed tokens: 46567260160 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 
2.934314E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.611 | TFLOPs: 31.88 | +7: iteration 88830/ 173500 | consumed samples: 22740480 | consumed tokens: 46572503040 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.948285E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.243 | TFLOPs: 31.86 | +7: iteration 88840/ 173500 | consumed samples: 22743040 | consumed tokens: 46577745920 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.939828E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.536 | TFLOPs: 31.88 | +7: iteration 88850/ 173500 | consumed samples: 22745600 | consumed tokens: 46582988800 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.950034E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.702 | TFLOPs: 31.89 | +7: iteration 88860/ 173500 | consumed samples: 22748160 | consumed tokens: 46588231680 | elapsed time per iteration (s): 0.42 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.929926E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.698 | TFLOPs: 31.88 | +7: iteration 88870/ 173500 | consumed samples: 22750720 | consumed tokens: 46593474560 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.939724E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.670 | TFLOPs: 31.88 | +7: iteration 88880/ 173500 | consumed samples: 22753280 | consumed tokens: 
46598717440 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.938449E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.957 | TFLOPs: 31.90 | +7: iteration 88890/ 173500 | consumed samples: 22755840 | consumed tokens: 46603960320 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.934953E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 | +7: iteration 88900/ 173500 | consumed samples: 22758400 | consumed tokens: 46609203200 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.938929E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.950 | TFLOPs: 31.90 | +7: iteration 88910/ 173500 | consumed samples: 22760960 | consumed tokens: 46614446080 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.934716E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.006 | TFLOPs: 31.90 | +7: iteration 88920/ 173500 | consumed samples: 22763520 | consumed tokens: 46619688960 | elapsed time per iteration (s): 0.42 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.931957E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.224 | TFLOPs: 31.91 | +7: iteration 88930/ 173500 | consumed samples: 22766080 | consumed tokens: 46624931840 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.919830E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.699 | TFLOPs: 31.88 | +7: iteration 88940/ 173500 | consumed samples: 22768640 | consumed tokens: 46630174720 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.940977E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.356 | TFLOPs: 31.87 | +7: iteration 88950/ 173500 | consumed samples: 22771200 | consumed tokens: 46635417600 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.940129E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | +7: iteration 88960/ 173500 | consumed samples: 22773760 | consumed tokens: 46640660480 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.938519E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.085 | TFLOPs: 31.85 | +7: iteration 88970/ 173500 | consumed samples: 22776320 | consumed tokens: 46645903360 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.949838E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.634 | TFLOPs: 31.88 | +7: iteration 88980/ 173500 | consumed samples: 22778880 | consumed tokens: 46651146240 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.935707E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.286 | TFLOPs: 31.86 | +7: iteration 88990/ 173500 | consumed samples: 22781440 | consumed tokens: 46656389120 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.937962E+00 | grad 
norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.399 | TFLOPs: 31.87 | +7: iteration 89000/ 173500 | consumed samples: 22784000 | consumed tokens: 46661632000 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.933796E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.152 | TFLOPs: 31.86 | +7: iteration 89010/ 173500 | consumed samples: 22786560 | consumed tokens: 46666874880 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.934890E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.622 | TFLOPs: 31.88 | +7: iteration 89020/ 173500 | consumed samples: 22789120 | consumed tokens: 46672117760 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.934690E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.251 | TFLOPs: 31.86 | +7: iteration 89030/ 173500 | consumed samples: 22791680 | consumed tokens: 46677360640 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.931176E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.790 | TFLOPs: 31.84 | +7: iteration 89040/ 173500 | consumed samples: 22794240 | consumed tokens: 46682603520 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.940058E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.223 | TFLOPs: 31.86 | +7: iteration 89050/ 173500 | consumed samples: 22796800 | consumed tokens: 46687846400 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.934774E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.298 | TFLOPs: 31.86 | +7: iteration 89060/ 173500 | consumed samples: 22799360 | consumed tokens: 46693089280 | elapsed time per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.934112E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.376 | TFLOPs: 31.87 | +7: iteration 89070/ 173500 | consumed samples: 22801920 | consumed tokens: 46698332160 | elapsed time per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.936956E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.117 | TFLOPs: 31.85 | +7: iteration 89080/ 173500 | consumed samples: 22804480 | consumed tokens: 46703575040 | elapsed time per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.935422E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.932 | TFLOPs: 31.84 | +7: iteration 89090/ 173500 | consumed samples: 22807040 | consumed tokens: 46708817920 | elapsed time per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.927281E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.391 | TFLOPs: 31.87 | +7: iteration 89100/ 173500 | consumed samples: 22809600 | consumed tokens: 46714060800 | elapsed time per iteration (s): 0.42 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.944764E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.665 | TFLOPs: 
31.88 | +7: iteration 89110/ 173500 | consumed samples: 22812160 | consumed tokens: 46719303680 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.939707E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.568 | TFLOPs: 31.88 | +7: iteration 89120/ 173500 | consumed samples: 22814720 | consumed tokens: 46724546560 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.948325E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.309 | TFLOPs: 31.86 | +7: iteration 89130/ 173500 | consumed samples: 22817280 | consumed tokens: 46729789440 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.940114E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.392 | TFLOPs: 31.87 | +7: iteration 89140/ 173500 | consumed samples: 22819840 | consumed tokens: 46735032320 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.936864E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.033 | TFLOPs: 31.85 | +7: iteration 89150/ 173500 | consumed samples: 22822400 | consumed tokens: 46740275200 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.919950E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.284 | TFLOPs: 31.86 | +7: iteration 89160/ 173500 | consumed samples: 22824960 | consumed tokens: 46745518080 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.946817E+00 | grad norm: 0.314 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.241 | TFLOPs: 31.86 | +7: iteration 89170/ 173500 | consumed samples: 22827520 | consumed tokens: 46750760960 | elapsed time per iteration (s): 0.42 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.932312E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.005 | TFLOPs: 31.64 | +7: iteration 89180/ 173500 | consumed samples: 22830080 | consumed tokens: 46756003840 | elapsed time per iteration (s): 0.43 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.928360E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.328 | TFLOPs: 31.13 | +7: iteration 89190/ 173500 | consumed samples: 22832640 | consumed tokens: 46761246720 | elapsed time per iteration (s): 0.42 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.928838E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.404 | TFLOPs: 31.87 | +7: iteration 89200/ 173500 | consumed samples: 22835200 | consumed tokens: 46766489600 | elapsed time per iteration (s): 0.42 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.936445E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.306 | TFLOPs: 31.86 | +7: iteration 89210/ 173500 | consumed samples: 22837760 | consumed tokens: 46771732480 | elapsed time per iteration (s): 0.42 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.923772E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.352 | TFLOPs: 31.87 | +7: iteration 89220/ 173500 | consumed samples: 22840320 | consumed tokens: 46776975360 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.922511E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.533 | TFLOPs: 31.88 | +7: iteration 89230/ 173500 | consumed samples: 22842880 | consumed tokens: 46782218240 | elapsed time per iteration (s): 0.42 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.929619E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.592 | TFLOPs: 31.77 | +7: iteration 89240/ 173500 | consumed samples: 22845440 | consumed tokens: 46787461120 | elapsed time per iteration (s): 0.43 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.942903E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.299 | TFLOPs: 31.02 | +7: iteration 89250/ 173500 | consumed samples: 22848000 | consumed tokens: 46792704000 | elapsed time per iteration (s): 0.42 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.928116E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.969 | TFLOPs: 31.95 | +7: iteration 89260/ 173500 | consumed samples: 22850560 | consumed tokens: 46797946880 | elapsed time per iteration (s): 0.42 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.936229E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.709 | TFLOPs: 31.94 | +7: iteration 89270/ 173500 | consumed samples: 22853120 | consumed tokens: 46803189760 | elapsed time per iteration (s): 0.42 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.926763E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.794 | TFLOPs: 31.79 | +7: iteration 89280/ 
173500 | consumed samples: 22855680 | consumed tokens: 46808432640 | elapsed time per iteration (s): 0.42 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.945157E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.164 | TFLOPs: 31.75 | +7: iteration 89290/ 173500 | consumed samples: 22858240 | consumed tokens: 46813675520 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.933702E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.653 | TFLOPs: 31.94 | +7: iteration 89300/ 173500 | consumed samples: 22860800 | consumed tokens: 46818918400 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.928874E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.756 | TFLOPs: 31.73 | +7: iteration 89310/ 173500 | consumed samples: 22863360 | consumed tokens: 46824161280 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.934478E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.504 | TFLOPs: 31.87 | +7: iteration 89320/ 173500 | consumed samples: 22865920 | consumed tokens: 46829404160 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.937339E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.754 | TFLOPs: 31.63 | +7: iteration 89330/ 173500 | consumed samples: 22868480 | consumed tokens: 46834647040 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.949281E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.047 | TFLOPs: 31.69 | +7: iteration 89340/ 173500 | consumed samples: 22871040 | consumed tokens: 46839889920 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.938287E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.602 | TFLOPs: 31.77 | +7: iteration 89350/ 173500 | consumed samples: 22873600 | consumed tokens: 46845132800 | elapsed time per iteration (s): 0.43 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.939921E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.818 | TFLOPs: 31.58 | +7: iteration 89360/ 173500 | consumed samples: 22876160 | consumed tokens: 46850375680 | elapsed time per iteration (s): 0.43 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.940270E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.236 | TFLOPs: 31.44 | +7: iteration 89370/ 173500 | consumed samples: 22878720 | consumed tokens: 46855618560 | elapsed time per iteration (s): 0.42 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.932613E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.117 | TFLOPs: 31.64 | +7: iteration 89380/ 173500 | consumed samples: 22881280 | consumed tokens: 46860861440 | elapsed time per iteration (s): 0.42 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.921997E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.702 | TFLOPs: 31.94 | +7: iteration 89390/ 173500 | consumed samples: 22883840 | consumed tokens: 46866104320 | elapsed time per iteration (s): 0.42 | learning rate: 
1.071E-04 | global batch size: 256 | lm loss: 2.925920E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.023 | TFLOPs: 31.69 | +7: iteration 89400/ 173500 | consumed samples: 22886400 | consumed tokens: 46871347200 | elapsed time per iteration (s): 0.43 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.940561E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.561 | TFLOPs: 31.56 | +7: iteration 89410/ 173500 | consumed samples: 22888960 | consumed tokens: 46876590080 | elapsed time per iteration (s): 0.42 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.930607E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.743 | TFLOPs: 31.89 | +7: iteration 89420/ 173500 | consumed samples: 22891520 | consumed tokens: 46881832960 | elapsed time per iteration (s): 0.42 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.941068E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.817 | TFLOPs: 31.94 | +7: iteration 89430/ 173500 | consumed samples: 22894080 | consumed tokens: 46887075840 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.948483E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.586 | TFLOPs: 31.25 | +7: iteration 89440/ 173500 | consumed samples: 22896640 | consumed tokens: 46892318720 | elapsed time per iteration (s): 0.42 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.935201E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.980 | TFLOPs: 31.95 | +7: iteration 89450/ 173500 | 
consumed samples: 22899200 | consumed tokens: 46897561600 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.939673E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.182 | TFLOPs: 31.39 | +7: iteration 89460/ 173500 | consumed samples: 22901760 | consumed tokens: 46902804480 | elapsed time per iteration (s): 0.42 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.941285E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.773 | TFLOPs: 31.73 | +7: iteration 89470/ 173500 | consumed samples: 22904320 | consumed tokens: 46908047360 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.935147E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.997 | TFLOPs: 31.59 | +7: iteration 89480/ 173500 | consumed samples: 22906880 | consumed tokens: 46913290240 | elapsed time per iteration (s): 0.42 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.927732E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.059 | TFLOPs: 31.80 | +7: iteration 89490/ 173500 | consumed samples: 22909440 | consumed tokens: 46918533120 | elapsed time per iteration (s): 0.43 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.942850E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.798 | TFLOPs: 31.52 | +7: iteration 89500/ 173500 | consumed samples: 22912000 | consumed tokens: 46923776000 | elapsed time per iteration (s): 0.42 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.933574E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.375 | TFLOPs: 31.76 | +7: iteration 89510/ 173500 | consumed samples: 22914560 | consumed tokens: 46929018880 | elapsed time per iteration (s): 0.43 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.925484E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.791 | TFLOPs: 31.58 | +7: iteration 89520/ 173500 | consumed samples: 22917120 | consumed tokens: 46934261760 | elapsed time per iteration (s): 0.42 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.931925E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.930 | TFLOPs: 31.63 | +7: iteration 89530/ 173500 | consumed samples: 22919680 | consumed tokens: 46939504640 | elapsed time per iteration (s): 0.42 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.927837E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.329 | TFLOPs: 31.66 | +7: iteration 89540/ 173500 | consumed samples: 22922240 | consumed tokens: 46944747520 | elapsed time per iteration (s): 0.42 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.942013E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.779 | TFLOPs: 31.68 | +7: iteration 89550/ 173500 | consumed samples: 22924800 | consumed tokens: 46949990400 | elapsed time per iteration (s): 0.42 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.941582E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.467 | TFLOPs: 31.93 | +7: iteration 89560/ 173500 | consumed samples: 22927360 | consumed tokens: 46955233280 | elapsed time per iteration (s): 0.42 | learning rate: 1.068E-04 | global batch 
size: 256 | lm loss: 2.935764E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.014 | TFLOPs: 31.95 | +7: iteration 89570/ 173500 | consumed samples: 22929920 | consumed tokens: 46960476160 | elapsed time per iteration (s): 0.43 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.951352E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.389 | TFLOPs: 31.45 | +7: iteration 89580/ 173500 | consumed samples: 22932480 | consumed tokens: 46965719040 | elapsed time per iteration (s): 0.42 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.932891E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.776 | TFLOPs: 31.94 | +7: iteration 89590/ 173500 | consumed samples: 22935040 | consumed tokens: 46970961920 | elapsed time per iteration (s): 0.42 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.936543E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.816 | TFLOPs: 31.73 | +7: iteration 89600/ 173500 | consumed samples: 22937600 | consumed tokens: 46976204800 | elapsed time per iteration (s): 0.43 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.936302E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.213 | TFLOPs: 31.49 | +7: iteration 89610/ 173500 | consumed samples: 22940160 | consumed tokens: 46981447680 | elapsed time per iteration (s): 0.42 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.947871E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.172 | TFLOPs: 31.91 | +7: iteration 89620/ 173500 | consumed samples: 22942720 | 
consumed tokens: 46986690560 | elapsed time per iteration (s): 0.42 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.925317E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.535 | TFLOPs: 31.72 | +7: iteration 89630/ 173500 | consumed samples: 22945280 | consumed tokens: 46991933440 | elapsed time per iteration (s): 0.43 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.936176E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.128 | TFLOPs: 31.28 | +7: iteration 89640/ 173500 | consumed samples: 22947840 | consumed tokens: 46997176320 | elapsed time per iteration (s): 0.43 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.930586E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.578 | TFLOPs: 31.30 | +7: iteration 89650/ 173500 | consumed samples: 22950400 | consumed tokens: 47002419200 | elapsed time per iteration (s): 0.44 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.955765E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.417 | TFLOPs: 30.72 | +7: iteration 89660/ 173500 | consumed samples: 22952960 | consumed tokens: 47007662080 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.945193E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.653 | TFLOPs: 31.78 | +7: iteration 89670/ 173500 | consumed samples: 22955520 | consumed tokens: 47012904960 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.945146E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.813 | TFLOPs: 31.73 | +7: iteration 89680/ 173500 | consumed samples: 22958080 | consumed tokens: 47018147840 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.933034E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.498 | TFLOPs: 31.77 | +7: iteration 89690/ 173500 | consumed samples: 22960640 | consumed tokens: 47023390720 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.931100E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.583 | TFLOPs: 31.83 | +7: iteration 89700/ 173500 | consumed samples: 22963200 | consumed tokens: 47028633600 | elapsed time per iteration (s): 0.43 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.918375E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.302 | TFLOPs: 31.55 | +7: iteration 89710/ 173500 | consumed samples: 22965760 | consumed tokens: 47033876480 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.923587E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.996 | TFLOPs: 31.95 | +7: iteration 89720/ 173500 | consumed samples: 22968320 | consumed tokens: 47039119360 | elapsed time per iteration (s): 0.42 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.932924E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.491 | TFLOPs: 31.93 | +7: iteration 89730/ 173500 | consumed samples: 22970880 | consumed tokens: 47044362240 | elapsed time per iteration (s): 0.43 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 
2.938544E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.938 | TFLOPs: 31.37 | +7: iteration 89740/ 173500 | consumed samples: 22973440 | consumed tokens: 47049605120 | elapsed time per iteration (s): 0.42 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.931063E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.939 | TFLOPs: 31.95 | +7: iteration 89750/ 173500 | consumed samples: 22976000 | consumed tokens: 47054848000 | elapsed time per iteration (s): 0.42 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.934905E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.696 | TFLOPs: 31.94 | +7: iteration 89760/ 173500 | consumed samples: 22978560 | consumed tokens: 47060090880 | elapsed time per iteration (s): 0.42 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.943048E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.248 | TFLOPs: 31.91 | +7: iteration 89770/ 173500 | consumed samples: 22981120 | consumed tokens: 47065333760 | elapsed time per iteration (s): 0.43 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.934053E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.540 | TFLOPs: 31.51 | +7: iteration 89780/ 173500 | consumed samples: 22983680 | consumed tokens: 47070576640 | elapsed time per iteration (s): 0.42 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.942242E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.510 | TFLOPs: 31.67 | +7: iteration 89790/ 173500 | consumed samples: 22986240 | consumed tokens: 
47075819520 | elapsed time per iteration (s): 0.42 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.933279E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.996 | TFLOPs: 31.74 | +7: iteration 89800/ 173500 | consumed samples: 22988800 | consumed tokens: 47081062400 | elapsed time per iteration (s): 0.43 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.937534E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.527 | TFLOPs: 31.56 | +7: iteration 89810/ 173500 | consumed samples: 22991360 | consumed tokens: 47086305280 | elapsed time per iteration (s): 0.43 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.928883E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.132 | TFLOPs: 31.33 | +7: iteration 89820/ 173500 | consumed samples: 22993920 | consumed tokens: 47091548160 | elapsed time per iteration (s): 0.42 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.938946E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.518 | TFLOPs: 31.77 | +7: iteration 89830/ 173500 | consumed samples: 22996480 | consumed tokens: 47096791040 | elapsed time per iteration (s): 0.42 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.938112E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.049 | TFLOPs: 31.75 | +7: iteration 89840/ 173500 | consumed samples: 22999040 | consumed tokens: 47102033920 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.946811E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.808 | TFLOPs: 31.94 | +7: iteration 89850/ 173500 | consumed samples: 23001600 | consumed tokens: 47107276800 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.954696E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.987 | TFLOPs: 31.64 | +7: iteration 89860/ 173500 | consumed samples: 23004160 | consumed tokens: 47112519680 | elapsed time per iteration (s): 0.43 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.935938E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.675 | TFLOPs: 31.57 | +7: iteration 89870/ 173500 | consumed samples: 23006720 | consumed tokens: 47117762560 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.941913E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.536 | TFLOPs: 31.77 | +7: iteration 89880/ 173500 | consumed samples: 23009280 | consumed tokens: 47123005440 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.934733E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.557 | TFLOPs: 31.62 | +7: iteration 89890/ 173500 | consumed samples: 23011840 | consumed tokens: 47128248320 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.950841E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.169 | TFLOPs: 31.75 | +7: iteration 89900/ 173500 | consumed samples: 23014400 | consumed tokens: 47133491200 | elapsed time per iteration (s): 0.42 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.948267E+00 | grad 
norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.625 | TFLOPs: 31.88 | +7: iteration 89910/ 173500 | consumed samples: 23016960 | consumed tokens: 47138734080 | elapsed time per iteration (s): 0.42 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.940643E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.562 | TFLOPs: 31.88 | +7: iteration 89920/ 173500 | consumed samples: 23019520 | consumed tokens: 47143976960 | elapsed time per iteration (s): 0.43 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.943707E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.988 | TFLOPs: 31.48 | +7: iteration 89930/ 173500 | consumed samples: 23022080 | consumed tokens: 47149219840 | elapsed time per iteration (s): 0.42 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.947587E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.075 | TFLOPs: 31.96 | +7: iteration 89940/ 173500 | consumed samples: 23024640 | consumed tokens: 47154462720 | elapsed time per iteration (s): 0.43 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.915118E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.701 | TFLOPs: 31.31 | +7: iteration 89950/ 173500 | consumed samples: 23027200 | consumed tokens: 47159705600 | elapsed time per iteration (s): 0.42 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.936292E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.678 | TFLOPs: 31.88 | +7: iteration 89960/ 173500 | consumed samples: 23029760 | consumed tokens: 47164948480 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.939204E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.600 | TFLOPs: 31.93 | +7: iteration 89970/ 173500 | consumed samples: 23032320 | consumed tokens: 47170191360 | elapsed time per iteration (s): 0.43 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.931468E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.238 | TFLOPs: 31.39 | +7: iteration 89980/ 173500 | consumed samples: 23034880 | consumed tokens: 47175434240 | elapsed time per iteration (s): 0.43 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.937196E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.018 | TFLOPs: 31.32 | +7: iteration 89990/ 173500 | consumed samples: 23037440 | consumed tokens: 47180677120 | elapsed time per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.939241E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.960 | TFLOPs: 31.74 | +0: [2023-03-17 09:51:51,729] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=0, lr=[0.00010607986950689534, 0.00010607986950689534, 0.00010607986950689534], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 90000/ 173500 | consumed samples: 23040000 | consumed tokens: 47185920000 | elapsed time per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.928289E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.110 | TFLOPs: 31.96 | +0: steps: 90000 loss: 2.9085 iter time (s): 0.420 samples/sec: 609.431 +7: 
------------------------------------------------------------------------------------------------
+7: validation loss at iteration 90000 | lm loss value: 3.242333E+00 | lm loss PPL: 2.559336E+01 |
+7: ------------------------------------------------------------------------------------------------
+0: saving checkpoint at iteration 90000 to checkpoints_221m91b400m
+0: [2023-03-17 09:51:51,892] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step90000 is begin to save!
+0: [2023-03-17 09:51:51,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_01-model_00-model_states.pt...
+0: [2023-03-17 09:51:52,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_01-model_00-model_states.pt.
+0: [2023-03-17 09:51:52,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_03-model_00-model_states.pt...
+0: [2023-03-17 09:51:52,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_03-model_00-model_states.pt.
+0: [2023-03-17 09:51:52,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_04-model_00-model_states.pt...
+0: [2023-03-17 09:51:52,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_04-model_00-model_states.pt.
+0: [2023-03-17 09:51:52,066] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_05-model_00-model_states.pt...
+0: [2023-03-17 09:51:52,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_05-model_00-model_states.pt.
+0: [2023-03-17 09:51:52,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_06-model_00-model_states.pt...
+0: [2023-03-17 09:51:52,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_06-model_00-model_states.pt. +0: [2023-03-17 09:51:52,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_07-model_00-model_states.pt... +0: [2023-03-17 09:51:52,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_07-model_00-model_states.pt. +0: [2023-03-17 09:51:52,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_08-model_00-model_states.pt... +0: [2023-03-17 09:51:52,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_08-model_00-model_states.pt. +0: [2023-03-17 09:51:52,163] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_09-model_00-model_states.pt... +0: [2023-03-17 09:51:52,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_09-model_00-model_states.pt. +0: [2023-03-17 09:51:52,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_10-model_00-model_states.pt... +0: [2023-03-17 09:51:52,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_10-model_00-model_states.pt. +0: [2023-03-17 09:51:52,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_11-model_00-model_states.pt... +0: [2023-03-17 09:51:52,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_11-model_00-model_states.pt. +0: [2023-03-17 09:51:52,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_12-model_00-model_states.pt... 
+0: [2023-03-17 09:51:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_12-model_00-model_states.pt. +0: [2023-03-17 09:51:52,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_13-model_00-model_states.pt... +0: [2023-03-17 09:51:52,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_13-model_00-model_states.pt. +0: [2023-03-17 09:51:52,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_14-model_00-model_states.pt... +0: [2023-03-17 09:51:52,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_14-model_00-model_states.pt. +0: [2023-03-17 09:51:52,309] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_15-model_00-model_states.pt... +0: [2023-03-17 09:51:52,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_15-model_00-model_states.pt. +0: [2023-03-17 09:51:52,334] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_16-model_00-model_states.pt... +0: [2023-03-17 09:51:52,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_16-model_00-model_states.pt. +0: [2023-03-17 09:51:52,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_17-model_00-model_states.pt... +0: [2023-03-17 09:51:52,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_17-model_00-model_states.pt. +0: [2023-03-17 09:51:52,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_18-model_00-model_states.pt... 
+0: [2023-03-17 09:51:52,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_18-model_00-model_states.pt. +0: [2023-03-17 09:51:52,409] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_19-model_00-model_states.pt... +0: [2023-03-17 09:51:52,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_19-model_00-model_states.pt. +0: [2023-03-17 09:51:52,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_20-model_00-model_states.pt... +0: [2023-03-17 09:51:52,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_20-model_00-model_states.pt. +0: [2023-03-17 09:51:52,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/layer_22-model_00-model_states.pt... +0: [2023-03-17 09:51:52,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/layer_22-model_00-model_states.pt. +0: [2023-03-17 09:51:52,462] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step90000/mp_rank_00_model_states.pt +0: [2023-03-17 09:51:52,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/mp_rank_00_model_states.pt... +0: [2023-03-17 09:51:52,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/mp_rank_00_model_states.pt. +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +6: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-17 09:51:52,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +0: [2023-03-17 09:51:52,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+0: [2023-03-17 09:51:52,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +0: [2023-03-17 09:51:52,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +0: [2023-03-17 09:51:52,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +0: [2023-03-17 09:51:52,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+0: [2023-03-17 09:51:52,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +0: [2023-03-17 09:51:52,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +6: [2023-03-17 09:51:52,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:51:52,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+7: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 09:51:52,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +7: [2023-03-17 09:51:52,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 09:51:52,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +1: [2023-03-17 09:51:52,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +5: [2023-03-17 09:51:52,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +4: [2023-03-17 09:51:52,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 09:51:52,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 09:51:52,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +0: [2023-03-17 09:51:52,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+2: [2023-03-17 09:51:52,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 09:51:52,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! +2: [2023-03-17 09:51:52,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 09:51:52,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 09:51:52,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
+0: successfully saved checkpoint at iteration 90000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 741.25 +7: iteration 90010/ 173500 | consumed samples: 23042560 | consumed tokens: 47191162880 | elapsed time per iteration (s): 0.51 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.943776E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 506.369 | TFLOPs: 26.57 | +7: iteration 90020/ 173500 | consumed samples: 23045120 | consumed tokens: 47196405760 | elapsed time per iteration (s): 0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.943623E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.163 | TFLOPs: 31.86 | +7: iteration 90030/ 173500 | consumed samples: 23047680 | consumed tokens: 47201648640 | elapsed time per iteration (s): 0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.936726E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.910 | TFLOPs: 31.74 | +7: iteration 90040/ 173500 | consumed samples: 23050240 | consumed tokens: 47206891520 | elapsed time per iteration (s): 0.43 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.948767E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.297 | TFLOPs: 31.60 | +7: iteration 90050/ 173500 | consumed samples: 23052800 | consumed tokens: 47212134400 | elapsed time per iteration (s): 0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.935290E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.285 | TFLOPs: 31.71 | +7: iteration 90060/ 173500 | consumed samples: 23055360 | consumed tokens: 47217377280 | elapsed time per iteration (s): 
0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.945938E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.722 | TFLOPs: 31.68 | +7: iteration 90070/ 173500 | consumed samples: 23057920 | consumed tokens: 47222620160 | elapsed time per iteration (s): 0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.941464E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.828 | TFLOPs: 31.79 | +7: iteration 90080/ 173500 | consumed samples: 23060480 | consumed tokens: 47227863040 | elapsed time per iteration (s): 0.42 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.926657E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.607 | TFLOPs: 31.72 | +7: iteration 90090/ 173500 | consumed samples: 23063040 | consumed tokens: 47233105920 | elapsed time per iteration (s): 0.43 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.933307E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.140 | TFLOPs: 31.54 | +7: iteration 90100/ 173500 | consumed samples: 23065600 | consumed tokens: 47238348800 | elapsed time per iteration (s): 0.42 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.927843E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.624 | TFLOPs: 31.62 | +7: iteration 90110/ 173500 | consumed samples: 23068160 | consumed tokens: 47243591680 | elapsed time per iteration (s): 0.42 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.930330E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.710 | TFLOPs: 31.62 | +7: iteration 
90120/ 173500 | consumed samples: 23070720 | consumed tokens: 47248834560 | elapsed time per iteration (s): 0.42 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.944472E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.498 | TFLOPs: 31.77 | +7: iteration 90130/ 173500 | consumed samples: 23073280 | consumed tokens: 47254077440 | elapsed time per iteration (s): 0.43 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.927099E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.029 | TFLOPs: 31.59 | +7: iteration 90140/ 173500 | consumed samples: 23075840 | consumed tokens: 47259320320 | elapsed time per iteration (s): 0.42 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.951709E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.667 | TFLOPs: 31.78 | +7: iteration 90150/ 173500 | consumed samples: 23078400 | consumed tokens: 47264563200 | elapsed time per iteration (s): 0.43 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.939745E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.937 | TFLOPs: 31.58 | +7: iteration 90160/ 173500 | consumed samples: 23080960 | consumed tokens: 47269806080 | elapsed time per iteration (s): 0.42 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.936280E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.936 | TFLOPs: 32.00 | +7: iteration 90170/ 173500 | consumed samples: 23083520 | consumed tokens: 47275048960 | elapsed time per iteration (s): 0.43 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.933092E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.694 | TFLOPs: 31.57 | +7: iteration 90180/ 173500 | consumed samples: 23086080 | consumed tokens: 47280291840 | elapsed time per iteration (s): 0.44 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.948025E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.562 | TFLOPs: 30.51 | +7: iteration 90190/ 173500 | consumed samples: 23088640 | consumed tokens: 47285534720 | elapsed time per iteration (s): 0.42 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.935131E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.118 | TFLOPs: 31.70 | +7: iteration 90200/ 173500 | consumed samples: 23091200 | consumed tokens: 47290777600 | elapsed time per iteration (s): 0.42 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.940146E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.271 | TFLOPs: 31.71 | +7: iteration 90210/ 173500 | consumed samples: 23093760 | consumed tokens: 47296020480 | elapsed time per iteration (s): 0.43 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.928671E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.701 | TFLOPs: 31.15 | +7: iteration 90220/ 173500 | consumed samples: 23096320 | consumed tokens: 47301263360 | elapsed time per iteration (s): 0.42 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.948595E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.618 | TFLOPs: 31.78 | +7: iteration 90230/ 173500 | consumed samples: 23098880 | consumed tokens: 47306506240 | elapsed time per iteration (s): 0.43 | learning rate: 
1.057E-04 | global batch size: 256 | lm loss: 2.948582E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.806 | TFLOPs: 31.21 | +7: iteration 90240/ 173500 | consumed samples: 23101440 | consumed tokens: 47311749120 | elapsed time per iteration (s): 0.42 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.942257E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.412 | TFLOPs: 31.87 | +7: iteration 90250/ 173500 | consumed samples: 23104000 | consumed tokens: 47316992000 | elapsed time per iteration (s): 0.42 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.934741E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.081 | TFLOPs: 31.70 | +7: iteration 90260/ 173500 | consumed samples: 23106560 | consumed tokens: 47322234880 | elapsed time per iteration (s): 0.42 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.946252E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.245 | TFLOPs: 31.65 | +7: iteration 90270/ 173500 | consumed samples: 23109120 | consumed tokens: 47327477760 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.939284E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.649 | TFLOPs: 31.57 | +7: iteration 90280/ 173500 | consumed samples: 23111680 | consumed tokens: 47332720640 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.931172E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.174 | TFLOPs: 31.60 | +7: iteration 90290/ 173500 | 
consumed samples: 23114240 | consumed tokens: 47337963520 | elapsed time per iteration (s): 0.42 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.917707E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.116 | TFLOPs: 31.70 | +7: iteration 90300/ 173500 | consumed samples: 23116800 | consumed tokens: 47343206400 | elapsed time per iteration (s): 0.45 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.916574E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.177 | TFLOPs: 30.13 | +7: iteration 90310/ 173500 | consumed samples: 23119360 | consumed tokens: 47348449280 | elapsed time per iteration (s): 0.42 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.926958E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.673 | TFLOPs: 31.78 | +7: iteration 90320/ 173500 | consumed samples: 23121920 | consumed tokens: 47353692160 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.934362E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.189 | TFLOPs: 30.97 | +7: iteration 90330/ 173500 | consumed samples: 23124480 | consumed tokens: 47358935040 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.931257E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.017 | TFLOPs: 31.38 | +7: iteration 90340/ 173500 | consumed samples: 23127040 | consumed tokens: 47364177920 | elapsed time per iteration (s): 0.42 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.923990E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 607.346 | TFLOPs: 31.87 | +7: iteration 90350/ 173500 | consumed samples: 23129600 | consumed tokens: 47369420800 | elapsed time per iteration (s): 0.42 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.937684E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.365 | TFLOPs: 31.71 | +7: iteration 90360/ 173500 | consumed samples: 23132160 | consumed tokens: 47374663680 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.934100E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.553 | TFLOPs: 31.51 | +7: iteration 90370/ 173500 | consumed samples: 23134720 | consumed tokens: 47379906560 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.943971E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.471 | TFLOPs: 31.51 | +7: iteration 90380/ 173500 | consumed samples: 23137280 | consumed tokens: 47385149440 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.929379E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.260 | TFLOPs: 31.44 | +7: iteration 90390/ 173500 | consumed samples: 23139840 | consumed tokens: 47390392320 | elapsed time per iteration (s): 0.43 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.925573E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.812 | TFLOPs: 31.47 | +7: iteration 90400/ 173500 | consumed samples: 23142400 | consumed tokens: 47395635200 | elapsed time per iteration (s): 0.42 | learning rate: 1.054E-04 | global batch 
size: 256 | lm loss: 2.939677E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.388 | TFLOPs: 31.76 | +7: iteration 90410/ 173500 | consumed samples: 23144960 | consumed tokens: 47400878080 | elapsed time per iteration (s): 0.42 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.930969E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.555 | TFLOPs: 31.98 | +7: iteration 90420/ 173500 | consumed samples: 23147520 | consumed tokens: 47406120960 | elapsed time per iteration (s): 0.44 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.926004E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.388 | TFLOPs: 30.71 | +7: iteration 90430/ 173500 | consumed samples: 23150080 | consumed tokens: 47411363840 | elapsed time per iteration (s): 0.44 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.933329E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.061 | TFLOPs: 30.54 | +7: iteration 90440/ 173500 | consumed samples: 23152640 | consumed tokens: 47416606720 | elapsed time per iteration (s): 0.46 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.930874E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.201 | TFLOPs: 29.34 | +7: iteration 90450/ 173500 | consumed samples: 23155200 | consumed tokens: 47421849600 | elapsed time per iteration (s): 0.48 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.936559E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.942 | TFLOPs: 28.17 | +7: iteration 90460/ 173500 | consumed samples: 23157760 | 
consumed tokens: 47427092480 | elapsed time per iteration (s): 0.47 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.928105E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 544.482 | TFLOPs: 28.57 | +7: iteration 90470/ 173500 | consumed samples: 23160320 | consumed tokens: 47432335360 | elapsed time per iteration (s): 0.44 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.930921E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.556 | TFLOPs: 30.30 | +7: iteration 90480/ 173500 | consumed samples: 23162880 | consumed tokens: 47437578240 | elapsed time per iteration (s): 0.42 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.946391E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.866 | TFLOPs: 32.00 | +7: iteration 90490/ 173500 | consumed samples: 23165440 | consumed tokens: 47442821120 | elapsed time per iteration (s): 0.43 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.947408E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.070 | TFLOPs: 30.91 | +7: iteration 90500/ 173500 | consumed samples: 23168000 | consumed tokens: 47448064000 | elapsed time per iteration (s): 0.44 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.937666E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.106 | TFLOPs: 30.44 | +7: iteration 90510/ 173500 | consumed samples: 23170560 | consumed tokens: 47453306880 | elapsed time per iteration (s): 0.42 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.948401E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 608.161 | TFLOPs: 31.91 | +7: iteration 90520/ 173500 | consumed samples: 23173120 | consumed tokens: 47458549760 | elapsed time per iteration (s): 0.42 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.950382E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.119 | TFLOPs: 31.64 | +7: iteration 90530/ 173500 | consumed samples: 23175680 | consumed tokens: 47463792640 | elapsed time per iteration (s): 0.42 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.928607E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.794 | TFLOPs: 31.84 | +7: iteration 90540/ 173500 | consumed samples: 23178240 | consumed tokens: 47469035520 | elapsed time per iteration (s): 0.43 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.935610E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.600 | TFLOPs: 31.36 | +7: iteration 90550/ 173500 | consumed samples: 23180800 | consumed tokens: 47474278400 | elapsed time per iteration (s): 0.47 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.945664E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 544.827 | TFLOPs: 28.59 | +7: iteration 90560/ 173500 | consumed samples: 23183360 | consumed tokens: 47479521280 | elapsed time per iteration (s): 0.48 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.939964E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.361 | TFLOPs: 27.98 | +7: iteration 90570/ 173500 | consumed samples: 23185920 | consumed tokens: 47484764160 | elapsed time per iteration (s): 0.47 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 
2.945419E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.590 | TFLOPs: 28.68 | +7: iteration 90580/ 173500 | consumed samples: 23188480 | consumed tokens: 47490007040 | elapsed time per iteration (s): 0.45 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.947556E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.467 | TFLOPs: 29.88 | +7: iteration 90590/ 173500 | consumed samples: 23191040 | consumed tokens: 47495249920 | elapsed time per iteration (s): 0.46 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.936190E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.821 | TFLOPs: 28.95 | +7: iteration 90600/ 173500 | consumed samples: 23193600 | consumed tokens: 47500492800 | elapsed time per iteration (s): 0.43 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.933835E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.696 | TFLOPs: 31.31 | +7: iteration 90610/ 173500 | consumed samples: 23196160 | consumed tokens: 47505735680 | elapsed time per iteration (s): 0.42 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.947458E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.011 | TFLOPs: 31.80 | +7: iteration 90620/ 173500 | consumed samples: 23198720 | consumed tokens: 47510978560 | elapsed time per iteration (s): 0.42 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.942569E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.145 | TFLOPs: 31.70 | +7: iteration 90630/ 173500 | consumed samples: 23201280 | consumed tokens: 
47516221440 | elapsed time per iteration (s): 0.42 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.947092E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.641 | TFLOPs: 31.99 | +7: iteration 90640/ 173500 | consumed samples: 23203840 | consumed tokens: 47521464320 | elapsed time per iteration (s): 0.42 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.930676E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.609 | TFLOPs: 31.62 | +7: iteration 90650/ 173500 | consumed samples: 23206400 | consumed tokens: 47526707200 | elapsed time per iteration (s): 0.43 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.952086E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.256 | TFLOPs: 31.60 | +7: iteration 90660/ 173500 | consumed samples: 23208960 | consumed tokens: 47531950080 | elapsed time per iteration (s): 0.42 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.929819E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.874 | TFLOPs: 31.79 | +7: iteration 90670/ 173500 | consumed samples: 23211520 | consumed tokens: 47537192960 | elapsed time per iteration (s): 0.42 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.937299E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.688 | TFLOPs: 31.78 | +7: iteration 90680/ 173500 | consumed samples: 23214080 | consumed tokens: 47542435840 | elapsed time per iteration (s): 0.42 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.946504E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 608.730 | TFLOPs: 31.94 | +7: iteration 90690/ 173500 | consumed samples: 23216640 | consumed tokens: 47547678720 | elapsed time per iteration (s): 0.43 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.921996E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.658 | TFLOPs: 30.94 | +7: iteration 90700/ 173500 | consumed samples: 23219200 | consumed tokens: 47552921600 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.922608E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.989 | TFLOPs: 31.80 | +7: iteration 90710/ 173500 | consumed samples: 23221760 | consumed tokens: 47558164480 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.945156E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.831 | TFLOPs: 31.79 | +7: iteration 90720/ 173500 | consumed samples: 23224320 | consumed tokens: 47563407360 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.948513E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.597 | TFLOPs: 31.98 | +7: iteration 90730/ 173500 | consumed samples: 23226880 | consumed tokens: 47568650240 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.946891E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.400 | TFLOPs: 31.97 | +7: iteration 90740/ 173500 | consumed samples: 23229440 | consumed tokens: 47573893120 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.938078E+00 | grad 
norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.054 | TFLOPs: 31.96 | +7: iteration 90750/ 173500 | consumed samples: 23232000 | consumed tokens: 47579136000 | elapsed time per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.939524E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.520 | TFLOPs: 31.61 | +7: iteration 90760/ 173500 | consumed samples: 23234560 | consumed tokens: 47584378880 | elapsed time per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.931543E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.249 | TFLOPs: 31.91 | +7: iteration 90770/ 173500 | consumed samples: 23237120 | consumed tokens: 47589621760 | elapsed time per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.932623E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.570 | TFLOPs: 31.93 | +7: iteration 90780/ 173500 | consumed samples: 23239680 | consumed tokens: 47594864640 | elapsed time per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.953905E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.011 | TFLOPs: 31.90 | +7: iteration 90790/ 173500 | consumed samples: 23242240 | consumed tokens: 47600107520 | elapsed time per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.937229E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.583 | TFLOPs: 31.67 | +7: iteration 90800/ 173500 | consumed samples: 23244800 | consumed tokens: 47605350400 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.932977E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.374 | TFLOPs: 31.61 | +7: iteration 90810/ 173500 | consumed samples: 23247360 | consumed tokens: 47610593280 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.937248E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.811 | TFLOPs: 31.68 | +7: iteration 90820/ 173500 | consumed samples: 23249920 | consumed tokens: 47615836160 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.939606E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.368 | TFLOPs: 31.92 | +7: iteration 90830/ 173500 | consumed samples: 23252480 | consumed tokens: 47621079040 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.928422E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.158 | TFLOPs: 31.75 | +7: iteration 90840/ 173500 | consumed samples: 23255040 | consumed tokens: 47626321920 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.947977E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.884 | TFLOPs: 31.89 | +7: iteration 90850/ 173500 | consumed samples: 23257600 | consumed tokens: 47631564800 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.930079E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.756 | TFLOPs: 
31.68 | +7: iteration 90860/ 173500 | consumed samples: 23260160 | consumed tokens: 47636807680 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.938980E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.402 | TFLOPs: 31.92 | +7: iteration 90870/ 173500 | consumed samples: 23262720 | consumed tokens: 47642050560 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.929490E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.807 | TFLOPs: 31.89 | +7: iteration 90880/ 173500 | consumed samples: 23265280 | consumed tokens: 47647293440 | elapsed time per iteration (s): 0.43 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.943395E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.562 | TFLOPs: 31.51 | +7: iteration 90890/ 173500 | consumed samples: 23267840 | consumed tokens: 47652536320 | elapsed time per iteration (s): 0.43 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.952145E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.792 | TFLOPs: 31.05 | +7: iteration 90900/ 173500 | consumed samples: 23270400 | consumed tokens: 47657779200 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.929769E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.148 | TFLOPs: 31.91 | +7: iteration 90910/ 173500 | consumed samples: 23272960 | consumed tokens: 47663022080 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.941397E+00 | grad norm: 0.338 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.671 | TFLOPs: 31.67 | +7: iteration 90920/ 173500 | consumed samples: 23275520 | consumed tokens: 47668264960 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.941667E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.929 | TFLOPs: 31.74 | +7: iteration 90930/ 173500 | consumed samples: 23278080 | consumed tokens: 47673507840 | elapsed time per iteration (s): 0.43 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.946190E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.212 | TFLOPs: 31.54 | +7: iteration 90940/ 173500 | consumed samples: 23280640 | consumed tokens: 47678750720 | elapsed time per iteration (s): 0.42 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.926148E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.497 | TFLOPs: 31.61 | +7: iteration 90950/ 173500 | consumed samples: 23283200 | consumed tokens: 47683993600 | elapsed time per iteration (s): 0.43 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.928978E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.781 | TFLOPs: 31.36 | +7: iteration 90960/ 173500 | consumed samples: 23285760 | consumed tokens: 47689236480 | elapsed time per iteration (s): 0.42 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.943130E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.573 | TFLOPs: 31.88 | +7: iteration 90970/ 173500 | consumed samples: 23288320 | consumed tokens: 47694479360 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.915681E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.231 | TFLOPs: 31.70 | +7: iteration 90980/ 173500 | consumed samples: 23290880 | consumed tokens: 47699722240 | elapsed time per iteration (s): 0.43 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.941791E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.696 | TFLOPs: 31.57 | +7: iteration 90990/ 173500 | consumed samples: 23293440 | consumed tokens: 47704965120 | elapsed time per iteration (s): 0.42 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.922140E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.752 | TFLOPs: 31.68 | +7: iteration 91000/ 173500 | consumed samples: 23296000 | consumed tokens: 47710208000 | elapsed time per iteration (s): 0.43 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.934430E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.499 | TFLOPs: 31.51 | +7: iteration 91010/ 173500 | consumed samples: 23298560 | consumed tokens: 47715450880 | elapsed time per iteration (s): 0.42 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.932399E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.494 | TFLOPs: 31.77 | +7: iteration 91020/ 173500 | consumed samples: 23301120 | consumed tokens: 47720693760 | elapsed time per iteration (s): 0.42 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.931256E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.186 | TFLOPs: 31.70 | +7: iteration 91030/ 
173500 | consumed samples: 23303680 | consumed tokens: 47725936640 | elapsed time per iteration (s): 0.43 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.930280E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.616 | TFLOPs: 31.51 | +7: iteration 91040/ 173500 | consumed samples: 23306240 | consumed tokens: 47731179520 | elapsed time per iteration (s): 0.43 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.926832E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.502 | TFLOPs: 31.35 | +7: iteration 91050/ 173500 | consumed samples: 23308800 | consumed tokens: 47736422400 | elapsed time per iteration (s): 0.42 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.950561E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.292 | TFLOPs: 31.65 | +7: iteration 91060/ 173500 | consumed samples: 23311360 | consumed tokens: 47741665280 | elapsed time per iteration (s): 0.42 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.931030E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.152 | TFLOPs: 31.75 | +7: iteration 91070/ 173500 | consumed samples: 23313920 | consumed tokens: 47746908160 | elapsed time per iteration (s): 0.43 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.950777E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.913 | TFLOPs: 31.37 | +7: iteration 91080/ 173500 | consumed samples: 23316480 | consumed tokens: 47752151040 | elapsed time per iteration (s): 0.42 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.944617E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.553 | TFLOPs: 31.88 | +7: iteration 91090/ 173500 | consumed samples: 23319040 | consumed tokens: 47757393920 | elapsed time per iteration (s): 0.42 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.950460E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.373 | TFLOPs: 31.87 | +7: iteration 91100/ 173500 | consumed samples: 23321600 | consumed tokens: 47762636800 | elapsed time per iteration (s): 0.43 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.950154E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.702 | TFLOPs: 31.41 | +7: iteration 91110/ 173500 | consumed samples: 23324160 | consumed tokens: 47767879680 | elapsed time per iteration (s): 0.42 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.922401E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.086 | TFLOPs: 31.70 | +7: iteration 91120/ 173500 | consumed samples: 23326720 | consumed tokens: 47773122560 | elapsed time per iteration (s): 0.43 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.924744E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | +7: iteration 91130/ 173500 | consumed samples: 23329280 | consumed tokens: 47778365440 | elapsed time per iteration (s): 0.42 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.937180E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.947 | TFLOPs: 31.69 | +7: iteration 91140/ 173500 | consumed samples: 23331840 | consumed tokens: 47783608320 | elapsed time per iteration (s): 0.43 | learning rate: 
1.042E-04 | global batch size: 256 | lm loss: 2.930303E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.868 | TFLOPs: 31.47 | +7: iteration 91150/ 173500 | consumed samples: 23334400 | consumed tokens: 47788851200 | elapsed time per iteration (s): 0.42 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.924114E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.189 | TFLOPs: 31.70 | +7: iteration 91160/ 173500 | consumed samples: 23336960 | consumed tokens: 47794094080 | elapsed time per iteration (s): 0.42 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.934090E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.255 | TFLOPs: 31.70 | +7: iteration 91170/ 173500 | consumed samples: 23339520 | consumed tokens: 47799336960 | elapsed time per iteration (s): 0.42 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.931718E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.514 | TFLOPs: 31.61 | +7: iteration 91180/ 173500 | consumed samples: 23342080 | consumed tokens: 47804579840 | elapsed time per iteration (s): 0.42 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.939518E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.053 | TFLOPs: 31.75 | +7: iteration 91190/ 173500 | consumed samples: 23344640 | consumed tokens: 47809822720 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.944518E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.159 | TFLOPs: 31.38 | +7: iteration 91200/ 173500 | 
consumed samples: 23347200 | consumed tokens: 47815065600 | elapsed time per iteration (s): 0.42 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.922584E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.112 | TFLOPs: 31.70 | +7: iteration 91210/ 173500 | consumed samples: 23349760 | consumed tokens: 47820308480 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.944293E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.052 | TFLOPs: 31.38 | +7: iteration 91220/ 173500 | consumed samples: 23352320 | consumed tokens: 47825551360 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.930120E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.428 | TFLOPs: 31.03 | +7: iteration 91230/ 173500 | consumed samples: 23354880 | consumed tokens: 47830794240 | elapsed time per iteration (s): 0.42 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.924358E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.213 | TFLOPs: 31.86 | +7: iteration 91240/ 173500 | consumed samples: 23357440 | consumed tokens: 47836037120 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.936014E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.134 | TFLOPs: 31.54 | +7: iteration 91250/ 173500 | consumed samples: 23360000 | consumed tokens: 47841280000 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.937642E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 593.796 | TFLOPs: 31.16 | +7: iteration 91260/ 173500 | consumed samples: 23362560 | consumed tokens: 47846522880 | elapsed time per iteration (s): 0.45 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.919245E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.547 | TFLOPs: 30.09 | +7: iteration 91270/ 173500 | consumed samples: 23365120 | consumed tokens: 47851765760 | elapsed time per iteration (s): 0.44 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.923360E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.416 | TFLOPs: 30.87 | +7: iteration 91280/ 173500 | consumed samples: 23367680 | consumed tokens: 47857008640 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.922436E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.155 | TFLOPs: 31.44 | +7: iteration 91290/ 173500 | consumed samples: 23370240 | consumed tokens: 47862251520 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.931636E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.925 | TFLOPs: 30.95 | +7: iteration 91300/ 173500 | consumed samples: 23372800 | consumed tokens: 47867494400 | elapsed time per iteration (s): 0.42 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.949377E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.741 | TFLOPs: 31.73 | +7: iteration 91310/ 173500 | consumed samples: 23375360 | consumed tokens: 47872737280 | elapsed time per iteration (s): 0.45 | learning rate: 1.039E-04 | global batch 
size: 256 | lm loss: 2.949750E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.536 | TFLOPs: 29.99 | +7: iteration 91320/ 173500 | consumed samples: 23377920 | consumed tokens: 47877980160 | elapsed time per iteration (s): 0.44 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.945550E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.211 | TFLOPs: 30.86 | +7: iteration 91330/ 173500 | consumed samples: 23380480 | consumed tokens: 47883223040 | elapsed time per iteration (s): 0.43 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.945830E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.834 | TFLOPs: 30.95 | +7: iteration 91340/ 173500 | consumed samples: 23383040 | consumed tokens: 47888465920 | elapsed time per iteration (s): 0.43 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.951295E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.513 | TFLOPs: 31.25 | +7: iteration 91350/ 173500 | consumed samples: 23385600 | consumed tokens: 47893708800 | elapsed time per iteration (s): 0.43 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.945288E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.000 | TFLOPs: 31.38 | +7: iteration 91360/ 173500 | consumed samples: 23388160 | consumed tokens: 47898951680 | elapsed time per iteration (s): 0.44 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.930578E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.698 | TFLOPs: 30.42 | +7: iteration 91370/ 173500 | consumed samples: 23390720 | 
consumed tokens: 47904194560 | elapsed time per iteration (s): 0.43 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.930080E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.983 | TFLOPs: 31.01 | +7: iteration 91380/ 173500 | consumed samples: 23393280 | consumed tokens: 47909437440 | elapsed time per iteration (s): 0.43 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.939224E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.574 | TFLOPs: 31.09 | +7: iteration 91390/ 173500 | consumed samples: 23395840 | consumed tokens: 47914680320 | elapsed time per iteration (s): 0.44 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.939014E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.404 | TFLOPs: 30.82 | +7: iteration 91400/ 173500 | consumed samples: 23398400 | consumed tokens: 47919923200 | elapsed time per iteration (s): 0.45 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.938563E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.791 | TFLOPs: 30.11 | +7: iteration 91410/ 173500 | consumed samples: 23400960 | consumed tokens: 47925166080 | elapsed time per iteration (s): 0.45 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.948606E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.971 | TFLOPs: 29.75 | +7: iteration 91420/ 173500 | consumed samples: 23403520 | consumed tokens: 47930408960 | elapsed time per iteration (s): 0.44 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.927864E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 585.716 | TFLOPs: 30.73 | +7: iteration 91430/ 173500 | consumed samples: 23406080 | consumed tokens: 47935651840 | elapsed time per iteration (s): 0.43 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.939226E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.662 | TFLOPs: 31.36 | +7: iteration 91440/ 173500 | consumed samples: 23408640 | consumed tokens: 47940894720 | elapsed time per iteration (s): 0.44 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.923296E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.376 | TFLOPs: 30.56 | +7: iteration 91450/ 173500 | consumed samples: 23411200 | consumed tokens: 47946137600 | elapsed time per iteration (s): 0.44 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.936496E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.813 | TFLOPs: 30.74 | +7: iteration 91460/ 173500 | consumed samples: 23413760 | consumed tokens: 47951380480 | elapsed time per iteration (s): 0.44 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.941271E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.983 | TFLOPs: 30.75 | +7: iteration 91470/ 173500 | consumed samples: 23416320 | consumed tokens: 47956623360 | elapsed time per iteration (s): 0.43 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.928790E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.129 | TFLOPs: 31.12 | +7: iteration 91480/ 173500 | consumed samples: 23418880 | consumed tokens: 47961866240 | elapsed time per iteration (s): 0.44 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 
2.915368E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.392 | TFLOPs: 30.71 | +7: iteration 91490/ 173500 | consumed samples: 23421440 | consumed tokens: 47967109120 | elapsed time per iteration (s): 0.44 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.946236E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.570 | TFLOPs: 30.78 | +7: iteration 91500/ 173500 | consumed samples: 23424000 | consumed tokens: 47972352000 | elapsed time per iteration (s): 0.44 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.918629E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.110 | TFLOPs: 30.49 | +7: iteration 91510/ 173500 | consumed samples: 23426560 | consumed tokens: 47977594880 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.939039E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.785 | TFLOPs: 31.05 | +7: iteration 91520/ 173500 | consumed samples: 23429120 | consumed tokens: 47982837760 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.932522E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.356 | TFLOPs: 31.45 | +7: iteration 91530/ 173500 | consumed samples: 23431680 | consumed tokens: 47988080640 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.939553E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.118 | TFLOPs: 31.59 | +7: iteration 91540/ 173500 | consumed samples: 23434240 | consumed tokens: 
47993323520 | elapsed time per iteration (s): 0.44 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.939042E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.653 | TFLOPs: 30.47 | +7: iteration 91550/ 173500 | consumed samples: 23436800 | consumed tokens: 47998566400 | elapsed time per iteration (s): 0.43 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.922002E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.336 | TFLOPs: 31.39 | +7: iteration 91560/ 173500 | consumed samples: 23439360 | consumed tokens: 48003809280 | elapsed time per iteration (s): 0.43 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.932859E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.887 | TFLOPs: 31.53 | +7: iteration 91570/ 173500 | consumed samples: 23441920 | consumed tokens: 48009052160 | elapsed time per iteration (s): 0.43 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.932402E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.900 | TFLOPs: 31.37 | +7: iteration 91580/ 173500 | consumed samples: 23444480 | consumed tokens: 48014295040 | elapsed time per iteration (s): 0.44 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.940612E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.685 | TFLOPs: 30.26 | +7: iteration 91590/ 173500 | consumed samples: 23447040 | consumed tokens: 48019537920 | elapsed time per iteration (s): 0.44 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.917836E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.124 | TFLOPs: 30.86 | +7: iteration 91600/ 173500 | consumed samples: 23449600 | consumed tokens: 48024780800 | elapsed time per iteration (s): 0.45 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.933304E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.978 | TFLOPs: 30.17 | +7: iteration 91610/ 173500 | consumed samples: 23452160 | consumed tokens: 48030023680 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.931288E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.253 | TFLOPs: 31.60 | +7: iteration 91620/ 173500 | consumed samples: 23454720 | consumed tokens: 48035266560 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.940448E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.976 | TFLOPs: 31.37 | +7: iteration 91630/ 173500 | consumed samples: 23457280 | consumed tokens: 48040509440 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.941004E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.653 | TFLOPs: 31.36 | +7: iteration 91640/ 173500 | consumed samples: 23459840 | consumed tokens: 48045752320 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.942473E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.084 | TFLOPs: 31.33 | +7: iteration 91650/ 173500 | consumed samples: 23462400 | consumed tokens: 48050995200 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.939866E+00 | grad 
norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.180 | TFLOPs: 31.28 | +7: iteration 91660/ 173500 | consumed samples: 23464960 | consumed tokens: 48056238080 | elapsed time per iteration (s): 0.44 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.939669E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.033 | TFLOPs: 30.85 | +7: iteration 91670/ 173500 | consumed samples: 23467520 | consumed tokens: 48061480960 | elapsed time per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.943883E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.167 | TFLOPs: 31.18 | +7: iteration 91680/ 173500 | consumed samples: 23470080 | consumed tokens: 48066723840 | elapsed time per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.937150E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.791 | TFLOPs: 31.05 | +7: iteration 91690/ 173500 | consumed samples: 23472640 | consumed tokens: 48071966720 | elapsed time per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.928558E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.599 | TFLOPs: 31.04 | +7: iteration 91700/ 173500 | consumed samples: 23475200 | consumed tokens: 48077209600 | elapsed time per iteration (s): 0.44 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.936924E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.035 | TFLOPs: 30.75 | +7: iteration 91710/ 173500 | consumed samples: 23477760 | consumed tokens: 48082452480 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.929138E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.186 | TFLOPs: 31.28 | +7: iteration 91720/ 173500 | consumed samples: 23480320 | consumed tokens: 48087695360 | elapsed time per iteration (s): 0.44 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.943819E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.358 | TFLOPs: 30.87 | +7: iteration 91730/ 173500 | consumed samples: 23482880 | consumed tokens: 48092938240 | elapsed time per iteration (s): 0.43 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.942987E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.712 | TFLOPs: 30.99 | +7: iteration 91740/ 173500 | consumed samples: 23485440 | consumed tokens: 48098181120 | elapsed time per iteration (s): 0.44 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.938211E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.660 | TFLOPs: 30.83 | +7: iteration 91750/ 173500 | consumed samples: 23488000 | consumed tokens: 48103424000 | elapsed time per iteration (s): 0.44 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.935832E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.691 | TFLOPs: 30.57 | +7: iteration 91760/ 173500 | consumed samples: 23490560 | consumed tokens: 48108666880 | elapsed time per iteration (s): 0.44 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.946486E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.013 | TFLOPs: 
30.59 | +7: iteration 91770/ 173500 | consumed samples: 23493120 | consumed tokens: 48113909760 | elapsed time per iteration (s): 0.43 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.921697E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.280 | TFLOPs: 31.23 | +7: iteration 91780/ 173500 | consumed samples: 23495680 | consumed tokens: 48119152640 | elapsed time per iteration (s): 0.43 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.924707E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.297 | TFLOPs: 31.08 | +7: iteration 91790/ 173500 | consumed samples: 23498240 | consumed tokens: 48124395520 | elapsed time per iteration (s): 0.43 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.939474E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.958 | TFLOPs: 30.95 | +7: iteration 91800/ 173500 | consumed samples: 23500800 | consumed tokens: 48129638400 | elapsed time per iteration (s): 0.43 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.937825E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.006 | TFLOPs: 31.53 | +7: iteration 91810/ 173500 | consumed samples: 23503360 | consumed tokens: 48134881280 | elapsed time per iteration (s): 0.43 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.932983E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.839 | TFLOPs: 31.37 | +7: iteration 91820/ 173500 | consumed samples: 23505920 | consumed tokens: 48140124160 | elapsed time per iteration (s): 0.44 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.939508E+00 | grad norm: 0.321 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.164 | TFLOPs: 30.76 | +7: iteration 91830/ 173500 | consumed samples: 23508480 | consumed tokens: 48145367040 | elapsed time per iteration (s): 0.44 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.928144E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.430 | TFLOPs: 30.72 | +7: iteration 91840/ 173500 | consumed samples: 23511040 | consumed tokens: 48150609920 | elapsed time per iteration (s): 0.45 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.924432E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.008 | TFLOPs: 29.96 | +7: iteration 91850/ 173500 | consumed samples: 23513600 | consumed tokens: 48155852800 | elapsed time per iteration (s): 0.44 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.935181E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.417 | TFLOPs: 30.61 | +7: iteration 91860/ 173500 | consumed samples: 23516160 | consumed tokens: 48161095680 | elapsed time per iteration (s): 0.43 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.944469E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.193 | TFLOPs: 31.02 | +7: iteration 91870/ 173500 | consumed samples: 23518720 | consumed tokens: 48166338560 | elapsed time per iteration (s): 0.44 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.931530E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.000 | TFLOPs: 30.54 | +7: iteration 91880/ 173500 | consumed samples: 23521280 | consumed tokens: 48171581440 | elapsed time per iteration (s): 0.44 | 
learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.927496E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.498 | TFLOPs: 30.46 | +7: iteration 91890/ 173500 | consumed samples: 23523840 | consumed tokens: 48176824320 | elapsed time per iteration (s): 0.42 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.935402E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.298 | TFLOPs: 31.76 | +7: iteration 91900/ 173500 | consumed samples: 23526400 | consumed tokens: 48182067200 | elapsed time per iteration (s): 0.43 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.928865E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.980 | TFLOPs: 31.53 | +7: iteration 91910/ 173500 | consumed samples: 23528960 | consumed tokens: 48187310080 | elapsed time per iteration (s): 0.42 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.936522E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.491 | TFLOPs: 31.72 | +7: iteration 91920/ 173500 | consumed samples: 23531520 | consumed tokens: 48192552960 | elapsed time per iteration (s): 0.43 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.935762E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.362 | TFLOPs: 31.55 | +7: iteration 91930/ 173500 | consumed samples: 23534080 | consumed tokens: 48197795840 | elapsed time per iteration (s): 0.43 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.930145E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.428 | TFLOPs: 30.93 | +7: iteration 91940/ 
173500 | consumed samples: 23536640 | consumed tokens: 48203038720 | elapsed time per iteration (s): 0.44 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.934506E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.689 | TFLOPs: 30.73 | +7: iteration 91950/ 173500 | consumed samples: 23539200 | consumed tokens: 48208281600 | elapsed time per iteration (s): 0.44 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.943517E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.610 | TFLOPs: 30.67 | +7: iteration 91960/ 173500 | consumed samples: 23541760 | consumed tokens: 48213524480 | elapsed time per iteration (s): 0.43 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.936541E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.743 | TFLOPs: 30.94 | +7: iteration 91970/ 173500 | consumed samples: 23544320 | consumed tokens: 48218767360 | elapsed time per iteration (s): 0.43 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.930710E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.770 | TFLOPs: 30.89 | +7: iteration 91980/ 173500 | consumed samples: 23546880 | consumed tokens: 48224010240 | elapsed time per iteration (s): 0.43 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.934344E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.249 | TFLOPs: 31.28 | +7: iteration 91990/ 173500 | consumed samples: 23549440 | consumed tokens: 48229253120 | elapsed time per iteration (s): 0.44 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.926376E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 588.302 | TFLOPs: 30.87 | +0: [2023-03-17 10:06:13,085] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=0, lr=[0.0001027941492351335, 0.0001027941492351335, 0.0001027941492351335], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 92000/ 173500 | consumed samples: 23552000 | consumed tokens: 48234496000 | elapsed time per iteration (s): 0.43 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.948134E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.426 | TFLOPs: 30.93 | +0: steps: 92000 loss: 2.9739 iter time (s): 0.428 samples/sec: 598.146 +7: iteration 92010/ 173500 | consumed samples: 23554560 | consumed tokens: 48239738880 | elapsed time per iteration (s): 0.43 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.936680E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.555 | TFLOPs: 31.14 | +7: iteration 92020/ 173500 | consumed samples: 23557120 | consumed tokens: 48244981760 | elapsed time per iteration (s): 0.42 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.929306E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.239 | TFLOPs: 31.97 | +7: iteration 92030/ 173500 | consumed samples: 23559680 | consumed tokens: 48250224640 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.940189E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.383 | TFLOPs: 31.13 | +7: iteration 92040/ 173500 | consumed samples: 23562240 | consumed tokens: 48255467520 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.937158E+00 | grad norm: 
0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.224 | TFLOPs: 31.23 | +7: iteration 92050/ 173500 | consumed samples: 23564800 | consumed tokens: 48260710400 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.942632E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.033 | TFLOPs: 31.01 | +7: iteration 92060/ 173500 | consumed samples: 23567360 | consumed tokens: 48265953280 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.948210E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.652 | TFLOPs: 31.10 | +7: iteration 92070/ 173500 | consumed samples: 23569920 | consumed tokens: 48271196160 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.934389E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.738 | TFLOPs: 31.26 | +7: iteration 92080/ 173500 | consumed samples: 23572480 | consumed tokens: 48276439040 | elapsed time per iteration (s): 0.44 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.936216E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.205 | TFLOPs: 30.65 | +7: iteration 92090/ 173500 | consumed samples: 23575040 | consumed tokens: 48281681920 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.917818E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.845 | TFLOPs: 31.42 | +7: iteration 92100/ 173500 | consumed samples: 23577600 | consumed tokens: 48286924800 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.925226E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.928 | TFLOPs: 31.48 | +7: iteration 92110/ 173500 | consumed samples: 23580160 | consumed tokens: 48292167680 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.943576E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.295 | TFLOPs: 31.50 | +7: iteration 92120/ 173500 | consumed samples: 23582720 | consumed tokens: 48297410560 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.936045E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.272 | TFLOPs: 31.13 | +7: iteration 92130/ 173500 | consumed samples: 23585280 | consumed tokens: 48302653440 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.931527E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.753 | TFLOPs: 31.10 | +7: iteration 92140/ 173500 | consumed samples: 23587840 | consumed tokens: 48307896320 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.927168E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.221 | TFLOPs: 31.28 | +7: iteration 92150/ 173500 | consumed samples: 23590400 | consumed tokens: 48313139200 | elapsed time per iteration (s): 0.45 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.935908E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.801 | TFLOPs: 30.00 | 
+7: iteration 92160/ 173500 | consumed samples: 23592960 | consumed tokens: 48318382080 | elapsed time per iteration (s): 0.43 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.941229E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.354 | TFLOPs: 31.34 | +7: iteration 92170/ 173500 | consumed samples: 23595520 | consumed tokens: 48323624960 | elapsed time per iteration (s): 0.43 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.924399E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.944 | TFLOPs: 31.37 | +7: iteration 92180/ 173500 | consumed samples: 23598080 | consumed tokens: 48328867840 | elapsed time per iteration (s): 0.44 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.932714E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.453 | TFLOPs: 30.19 | +7: iteration 92190/ 173500 | consumed samples: 23600640 | consumed tokens: 48334110720 | elapsed time per iteration (s): 0.44 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.936699E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.381 | TFLOPs: 30.66 | +7: iteration 92200/ 173500 | consumed samples: 23603200 | consumed tokens: 48339353600 | elapsed time per iteration (s): 0.43 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.947933E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.850 | TFLOPs: 30.90 | +7: iteration 92210/ 173500 | consumed samples: 23605760 | consumed tokens: 48344596480 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.946614E+00 | grad norm: 0.347 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.724 | TFLOPs: 31.26 | +7: iteration 92220/ 173500 | consumed samples: 23608320 | consumed tokens: 48349839360 | elapsed time per iteration (s): 0.44 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.929384E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.571 | TFLOPs: 30.41 | +7: iteration 92230/ 173500 | consumed samples: 23610880 | consumed tokens: 48355082240 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.930749E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.011 | TFLOPs: 31.27 | +7: iteration 92240/ 173500 | consumed samples: 23613440 | consumed tokens: 48360325120 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.943334E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.748 | TFLOPs: 31.26 | +7: iteration 92250/ 173500 | consumed samples: 23616000 | consumed tokens: 48365568000 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.939417E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.533 | TFLOPs: 31.35 | +7: iteration 92260/ 173500 | consumed samples: 23618560 | consumed tokens: 48370810880 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.948343E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.753 | TFLOPs: 31.57 | +7: iteration 92270/ 173500 | consumed samples: 23621120 | consumed tokens: 48376053760 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.924101E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.792 | TFLOPs: 31.58 | +7: iteration 92280/ 173500 | consumed samples: 23623680 | consumed tokens: 48381296640 | elapsed time per iteration (s): 0.43 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.943678E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.700 | TFLOPs: 31.47 | +7: iteration 92290/ 173500 | consumed samples: 23626240 | consumed tokens: 48386539520 | elapsed time per iteration (s): 0.43 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.941509E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.222 | TFLOPs: 31.44 | +7: iteration 92300/ 173500 | consumed samples: 23628800 | consumed tokens: 48391782400 | elapsed time per iteration (s): 0.42 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.942483E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.184 | TFLOPs: 31.65 | +7: iteration 92310/ 173500 | consumed samples: 23631360 | consumed tokens: 48397025280 | elapsed time per iteration (s): 0.43 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.930313E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.363 | TFLOPs: 31.19 | +7: iteration 92320/ 173500 | consumed samples: 23633920 | consumed tokens: 48402268160 | elapsed time per iteration (s): 0.43 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.926521E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.373 | TFLOPs: 31.24 | +7: iteration 92330/ 
173500 | consumed samples: 23636480 | consumed tokens: 48407511040 | elapsed time per iteration (s): 0.42 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.929092E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.416 | TFLOPs: 31.71 | +7: iteration 92340/ 173500 | consumed samples: 23639040 | consumed tokens: 48412753920 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.933464E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.419 | TFLOPs: 31.35 | +7: iteration 92350/ 173500 | consumed samples: 23641600 | consumed tokens: 48417996800 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.940516E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.912 | TFLOPs: 31.27 | +7: iteration 92360/ 173500 | consumed samples: 23644160 | consumed tokens: 48423239680 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.925197E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.448 | TFLOPs: 31.35 | +7: iteration 92370/ 173500 | consumed samples: 23646720 | consumed tokens: 48428482560 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.929314E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.205 | TFLOPs: 31.44 | +7: iteration 92380/ 173500 | consumed samples: 23649280 | consumed tokens: 48433725440 | elapsed time per iteration (s): 0.44 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.944772E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 579.429 | TFLOPs: 30.40 | +7: iteration 92390/ 173500 | consumed samples: 23651840 | consumed tokens: 48438968320 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.929767E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.317 | TFLOPs: 31.55 | +7: iteration 92400/ 173500 | consumed samples: 23654400 | consumed tokens: 48444211200 | elapsed time per iteration (s): 0.42 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.940790E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.960 | TFLOPs: 31.85 | +7: iteration 92410/ 173500 | consumed samples: 23656960 | consumed tokens: 48449454080 | elapsed time per iteration (s): 0.43 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.930871E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.207 | TFLOPs: 31.39 | +7: iteration 92420/ 173500 | consumed samples: 23659520 | consumed tokens: 48454696960 | elapsed time per iteration (s): 0.43 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.937116E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.069 | TFLOPs: 31.59 | +7: iteration 92430/ 173500 | consumed samples: 23662080 | consumed tokens: 48459939840 | elapsed time per iteration (s): 0.42 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.942137E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.060 | TFLOPs: 31.75 | +7: iteration 92440/ 173500 | consumed samples: 23664640 | consumed tokens: 48465182720 | elapsed time per iteration (s): 0.43 | learning rate: 
1.021E-04 | global batch size: 256 | lm loss: 2.947791E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.939 | TFLOPs: 31.11 | +7: iteration 92450/ 173500 | consumed samples: 23667200 | consumed tokens: 48470425600 | elapsed time per iteration (s): 0.43 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.940081E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.893 | TFLOPs: 31.58 | +7: iteration 92460/ 173500 | consumed samples: 23669760 | consumed tokens: 48475668480 | elapsed time per iteration (s): 0.42 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.938005E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.358 | TFLOPs: 31.71 | +7: iteration 92470/ 173500 | consumed samples: 23672320 | consumed tokens: 48480911360 | elapsed time per iteration (s): 0.42 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.935085E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.706 | TFLOPs: 31.68 | +7: iteration 92480/ 173500 | consumed samples: 23674880 | consumed tokens: 48486154240 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.918655E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.261 | TFLOPs: 31.18 | +7: iteration 92490/ 173500 | consumed samples: 23677440 | consumed tokens: 48491397120 | elapsed time per iteration (s): 0.42 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.936966E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.169 | TFLOPs: 31.65 | +7: iteration 92500/ 173500 | 
consumed samples: 23680000 | consumed tokens: 48496640000 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.917266E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.016 | TFLOPs: 31.59 | +7: iteration 92510/ 173500 | consumed samples: 23682560 | consumed tokens: 48501882880 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.936196E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.734 | TFLOPs: 31.41 | +7: iteration 92520/ 173500 | consumed samples: 23685120 | consumed tokens: 48507125760 | elapsed time per iteration (s): 0.42 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.928758E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.824 | TFLOPs: 31.68 | +7: iteration 92530/ 173500 | consumed samples: 23687680 | consumed tokens: 48512368640 | elapsed time per iteration (s): 0.44 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.928723E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.602 | TFLOPs: 30.83 | +7: iteration 92540/ 173500 | consumed samples: 23690240 | consumed tokens: 48517611520 | elapsed time per iteration (s): 0.43 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.931350E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.144 | TFLOPs: 31.59 | +7: iteration 92550/ 173500 | consumed samples: 23692800 | consumed tokens: 48522854400 | elapsed time per iteration (s): 0.42 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.936323E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 604.428 | TFLOPs: 31.71 | +7: iteration 92560/ 173500 | consumed samples: 23695360 | consumed tokens: 48528097280 | elapsed time per iteration (s): 0.43 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.936271E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.431 | TFLOPs: 31.08 | +7: iteration 92570/ 173500 | consumed samples: 23697920 | consumed tokens: 48533340160 | elapsed time per iteration (s): 0.44 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.919764E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.271 | TFLOPs: 30.81 | +7: iteration 92580/ 173500 | consumed samples: 23700480 | consumed tokens: 48538583040 | elapsed time per iteration (s): 0.43 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.930570E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.505 | TFLOPs: 30.98 | +7: iteration 92590/ 173500 | consumed samples: 23703040 | consumed tokens: 48543825920 | elapsed time per iteration (s): 0.42 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.934395E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.142 | TFLOPs: 31.96 | +7: iteration 92600/ 173500 | consumed samples: 23705600 | consumed tokens: 48549068800 | elapsed time per iteration (s): 0.43 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.937612E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.510 | TFLOPs: 31.56 | +7: iteration 92610/ 173500 | consumed samples: 23708160 | consumed tokens: 48554311680 | elapsed time per iteration (s): 0.43 | learning rate: 1.018E-04 | global batch 
size: 256 | lm loss: 2.931231E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.959 | TFLOPs: 31.48 | +7: iteration 92620/ 173500 | consumed samples: 23710720 | consumed tokens: 48559554560 | elapsed time per iteration (s): 0.42 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.923197E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.128 | TFLOPs: 31.96 | +7: iteration 92630/ 173500 | consumed samples: 23713280 | consumed tokens: 48564797440 | elapsed time per iteration (s): 0.42 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.949693E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.423 | TFLOPs: 31.77 | +7: iteration 92640/ 173500 | consumed samples: 23715840 | consumed tokens: 48570040320 | elapsed time per iteration (s): 0.43 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.935799E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.816 | TFLOPs: 31.52 | +7: iteration 92650/ 173500 | consumed samples: 23718400 | consumed tokens: 48575283200 | elapsed time per iteration (s): 0.42 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.926198E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.181 | TFLOPs: 31.75 | +7: iteration 92660/ 173500 | consumed samples: 23720960 | consumed tokens: 48580526080 | elapsed time per iteration (s): 0.44 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.936746E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.598 | TFLOPs: 30.83 | +7: iteration 92670/ 173500 | consumed samples: 23723520 | 
consumed tokens: 48585768960 | elapsed time per iteration (s): 0.44 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.930853E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.340 | TFLOPs: 30.82 | +7: iteration 92680/ 173500 | consumed samples: 23726080 | consumed tokens: 48591011840 | elapsed time per iteration (s): 0.43 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.939339E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.226 | TFLOPs: 31.55 | +7: iteration 92690/ 173500 | consumed samples: 23728640 | consumed tokens: 48596254720 | elapsed time per iteration (s): 0.43 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.948182E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.019 | TFLOPs: 30.90 | +7: iteration 92700/ 173500 | consumed samples: 23731200 | consumed tokens: 48601497600 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.930736E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.435 | TFLOPs: 31.24 | +7: iteration 92710/ 173500 | consumed samples: 23733760 | consumed tokens: 48606740480 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.943062E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.098 | TFLOPs: 31.33 | +7: iteration 92720/ 173500 | consumed samples: 23736320 | consumed tokens: 48611983360 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.933460E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 592.243 | TFLOPs: 31.07 | +7: iteration 92730/ 173500 | consumed samples: 23738880 | consumed tokens: 48617226240 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.962026E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.243 | TFLOPs: 31.02 | +7: iteration 92740/ 173500 | consumed samples: 23741440 | consumed tokens: 48622469120 | elapsed time per iteration (s): 0.42 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.936341E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.476 | TFLOPs: 31.98 | +7: iteration 92750/ 173500 | consumed samples: 23744000 | consumed tokens: 48627712000 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.937847E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.661 | TFLOPs: 30.99 | +7: iteration 92760/ 173500 | consumed samples: 23746560 | consumed tokens: 48632954880 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.937438E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.543 | TFLOPs: 31.51 | +7: iteration 92770/ 173500 | consumed samples: 23749120 | consumed tokens: 48638197760 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.920856E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.923 | TFLOPs: 31.11 | +7: iteration 92780/ 173500 | consumed samples: 23751680 | consumed tokens: 48643440640 | elapsed time per iteration (s): 0.42 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 
2.931112E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.815 | TFLOPs: 32.00 | +7: iteration 92790/ 173500 | consumed samples: 23754240 | consumed tokens: 48648683520 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.943267E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.450 | TFLOPs: 31.08 | +7: iteration 92800/ 173500 | consumed samples: 23756800 | consumed tokens: 48653926400 | elapsed time per iteration (s): 0.44 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.926589E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.697 | TFLOPs: 30.47 | +7: iteration 92810/ 173500 | consumed samples: 23759360 | consumed tokens: 48659169280 | elapsed time per iteration (s): 0.44 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.918573E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.276 | TFLOPs: 30.81 | +7: iteration 92820/ 173500 | consumed samples: 23761920 | consumed tokens: 48664412160 | elapsed time per iteration (s): 0.44 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.943935E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.265 | TFLOPs: 30.71 | +7: iteration 92830/ 173500 | consumed samples: 23764480 | consumed tokens: 48669655040 | elapsed time per iteration (s): 0.43 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.932061E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.430 | TFLOPs: 31.08 | +7: iteration 92840/ 173500 | consumed samples: 23767040 | consumed tokens: 
48674897920 | elapsed time per iteration (s): 0.42 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.948740E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.596 | TFLOPs: 31.77 | +7: iteration 92850/ 173500 | consumed samples: 23769600 | consumed tokens: 48680140800 | elapsed time per iteration (s): 0.42 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.940905E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.888 | TFLOPs: 31.84 | +7: iteration 92860/ 173500 | consumed samples: 23772160 | consumed tokens: 48685383680 | elapsed time per iteration (s): 0.44 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.929831E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.510 | TFLOPs: 30.72 | +7: iteration 92870/ 173500 | consumed samples: 23774720 | consumed tokens: 48690626560 | elapsed time per iteration (s): 0.44 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.931405E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.945 | TFLOPs: 30.69 | +7: iteration 92880/ 173500 | consumed samples: 23777280 | consumed tokens: 48695869440 | elapsed time per iteration (s): 0.44 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.946071E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.287 | TFLOPs: 30.55 | +7: iteration 92890/ 173500 | consumed samples: 23779840 | consumed tokens: 48701112320 | elapsed time per iteration (s): 0.43 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.930230E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.070 | TFLOPs: 31.27 | +7: iteration 92900/ 173500 | consumed samples: 23782400 | consumed tokens: 48706355200 | elapsed time per iteration (s): 0.42 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.928028E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.168 | TFLOPs: 31.86 | +7: iteration 92910/ 173500 | consumed samples: 23784960 | consumed tokens: 48711598080 | elapsed time per iteration (s): 0.43 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.936570E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.353 | TFLOPs: 31.60 | +7: iteration 92920/ 173500 | consumed samples: 23787520 | consumed tokens: 48716840960 | elapsed time per iteration (s): 0.45 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.932603E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.106 | TFLOPs: 30.12 | +7: iteration 92930/ 173500 | consumed samples: 23790080 | consumed tokens: 48722083840 | elapsed time per iteration (s): 0.42 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.926412E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.776 | TFLOPs: 31.89 | +7: iteration 92940/ 173500 | consumed samples: 23792640 | consumed tokens: 48727326720 | elapsed time per iteration (s): 0.42 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.930329E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.109 | TFLOPs: 31.80 | +7: iteration 92950/ 173500 | consumed samples: 23795200 | consumed tokens: 48732569600 | elapsed time per iteration (s): 0.42 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.925031E+00 | grad 
norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.893 | TFLOPs: 31.74 | +7: iteration 92960/ 173500 | consumed samples: 23797760 | consumed tokens: 48737812480 | elapsed time per iteration (s): 0.42 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.930862E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.422 | TFLOPs: 31.66 | +7: iteration 92970/ 173500 | consumed samples: 23800320 | consumed tokens: 48743055360 | elapsed time per iteration (s): 0.42 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.942878E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.800 | TFLOPs: 31.79 | +7: iteration 92980/ 173500 | consumed samples: 23802880 | consumed tokens: 48748298240 | elapsed time per iteration (s): 0.43 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.936237E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.142 | TFLOPs: 30.96 | +7: iteration 92990/ 173500 | consumed samples: 23805440 | consumed tokens: 48753541120 | elapsed time per iteration (s): 0.43 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.935491E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.418 | TFLOPs: 31.19 | +7: iteration 93000/ 173500 | consumed samples: 23808000 | consumed tokens: 48758784000 | elapsed time per iteration (s): 0.43 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.926508E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.817 | TFLOPs: 30.89 | +7: iteration 93010/ 173500 | consumed samples: 23810560 | consumed tokens: 48764026880 | elapsed time 
per iteration (s): 0.42 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.927717E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.635 | TFLOPs: 31.78 | +7: iteration 93020/ 173500 | consumed samples: 23813120 | consumed tokens: 48769269760 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.934932E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.018 | TFLOPs: 31.48 | +7: iteration 93030/ 173500 | consumed samples: 23815680 | consumed tokens: 48774512640 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.937600E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.331 | TFLOPs: 31.13 | +7: iteration 93040/ 173500 | consumed samples: 23818240 | consumed tokens: 48779755520 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.930094E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.073 | TFLOPs: 31.43 | +7: iteration 93050/ 173500 | consumed samples: 23820800 | consumed tokens: 48784998400 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.928966E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.264 | TFLOPs: 31.60 | +7: iteration 93060/ 173500 | consumed samples: 23823360 | consumed tokens: 48790241280 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.928288E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.575 | TFLOPs: 
31.25 | +7: iteration 93070/ 173500 | consumed samples: 23825920 | consumed tokens: 48795484160 | elapsed time per iteration (s): 0.42 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.929808E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.466 | TFLOPs: 31.77 | +7: iteration 93080/ 173500 | consumed samples: 23828480 | consumed tokens: 48800727040 | elapsed time per iteration (s): 0.42 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.934309E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.324 | TFLOPs: 31.71 | +7: iteration 93090/ 173500 | consumed samples: 23831040 | consumed tokens: 48805969920 | elapsed time per iteration (s): 0.42 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.930640E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.869 | TFLOPs: 31.95 | +7: iteration 93100/ 173500 | consumed samples: 23833600 | consumed tokens: 48811212800 | elapsed time per iteration (s): 0.42 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.939383E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.371 | TFLOPs: 31.71 | +7: iteration 93110/ 173500 | consumed samples: 23836160 | consumed tokens: 48816455680 | elapsed time per iteration (s): 0.45 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.926424E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.485 | TFLOPs: 29.88 | +7: iteration 93120/ 173500 | consumed samples: 23838720 | consumed tokens: 48821698560 | elapsed time per iteration (s): 0.43 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.923799E+00 | grad norm: 0.333 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.395 | TFLOPs: 31.08 | +7: iteration 93130/ 173500 | consumed samples: 23841280 | consumed tokens: 48826941440 | elapsed time per iteration (s): 0.43 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.922201E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.591 | TFLOPs: 31.56 | +7: iteration 93140/ 173500 | consumed samples: 23843840 | consumed tokens: 48832184320 | elapsed time per iteration (s): 0.43 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.941621E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.123 | TFLOPs: 31.54 | +7: iteration 93150/ 173500 | consumed samples: 23846400 | consumed tokens: 48837427200 | elapsed time per iteration (s): 0.42 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.948791E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.760 | TFLOPs: 31.84 | +7: iteration 93160/ 173500 | consumed samples: 23848960 | consumed tokens: 48842670080 | elapsed time per iteration (s): 0.43 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.934727E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.236 | TFLOPs: 30.92 | +7: iteration 93170/ 173500 | consumed samples: 23851520 | consumed tokens: 48847912960 | elapsed time per iteration (s): 0.44 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.928273E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.642 | TFLOPs: 30.83 | +7: iteration 93180/ 173500 | consumed samples: 23854080 | consumed tokens: 48853155840 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.907154E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.943 | TFLOPs: 31.43 | +7: iteration 93190/ 173500 | consumed samples: 23856640 | consumed tokens: 48858398720 | elapsed time per iteration (s): 0.42 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.927720E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.301 | TFLOPs: 31.71 | +7: iteration 93200/ 173500 | consumed samples: 23859200 | consumed tokens: 48863641600 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.951906E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.207 | TFLOPs: 31.23 | +7: iteration 93210/ 173500 | consumed samples: 23861760 | consumed tokens: 48868884480 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.939958E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.573 | TFLOPs: 31.56 | +7: iteration 93220/ 173500 | consumed samples: 23864320 | consumed tokens: 48874127360 | elapsed time per iteration (s): 0.42 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.910539E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.730 | TFLOPs: 31.73 | +7: iteration 93230/ 173500 | consumed samples: 23866880 | consumed tokens: 48879370240 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.936451E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.482 | TFLOPs: 31.03 | +7: iteration 93240/ 
173500 | consumed samples: 23869440 | consumed tokens: 48884613120 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.921931E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.199 | TFLOPs: 31.54 | +7: iteration 93250/ 173500 | consumed samples: 23872000 | consumed tokens: 48889856000 | elapsed time per iteration (s): 0.43 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.920541E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | +7: iteration 93260/ 173500 | consumed samples: 23874560 | consumed tokens: 48895098880 | elapsed time per iteration (s): 0.43 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.931158E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.141 | TFLOPs: 31.49 | +7: iteration 93270/ 173500 | consumed samples: 23877120 | consumed tokens: 48900341760 | elapsed time per iteration (s): 0.43 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.932457E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.362 | TFLOPs: 31.24 | +7: iteration 93280/ 173500 | consumed samples: 23879680 | consumed tokens: 48905584640 | elapsed time per iteration (s): 0.42 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.937387E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.407 | TFLOPs: 31.66 | +7: iteration 93290/ 173500 | consumed samples: 23882240 | consumed tokens: 48910827520 | elapsed time per iteration (s): 0.42 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.925057E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.234 | TFLOPs: 31.70 | +7: iteration 93300/ 173500 | consumed samples: 23884800 | consumed tokens: 48916070400 | elapsed time per iteration (s): 0.43 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.934228E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.602 | TFLOPs: 31.57 | +7: iteration 93310/ 173500 | consumed samples: 23887360 | consumed tokens: 48921313280 | elapsed time per iteration (s): 0.42 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.929956E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.076 | TFLOPs: 31.80 | +7: iteration 93320/ 173500 | consumed samples: 23889920 | consumed tokens: 48926556160 | elapsed time per iteration (s): 0.43 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.936711E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.334 | TFLOPs: 31.55 | +7: iteration 93330/ 173500 | consumed samples: 23892480 | consumed tokens: 48931799040 | elapsed time per iteration (s): 0.42 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.939743E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.868 | TFLOPs: 31.95 | +7: iteration 93340/ 173500 | consumed samples: 23895040 | consumed tokens: 48937041920 | elapsed time per iteration (s): 0.45 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.927056E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.248 | TFLOPs: 30.18 | +7: iteration 93350/ 173500 | consumed samples: 23897600 | consumed tokens: 48942284800 | elapsed time per iteration (s): 0.42 | learning rate: 
1.006E-04 | global batch size: 256 | lm loss: 2.923023E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.788 | TFLOPs: 31.63 | +7: iteration 93360/ 173500 | consumed samples: 23900160 | consumed tokens: 48947527680 | elapsed time per iteration (s): 0.45 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.933012E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.030 | TFLOPs: 30.17 | +7: iteration 93370/ 173500 | consumed samples: 23902720 | consumed tokens: 48952770560 | elapsed time per iteration (s): 0.43 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.916977E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.805 | TFLOPs: 31.52 | +7: iteration 93380/ 173500 | consumed samples: 23905280 | consumed tokens: 48958013440 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.927571E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.121 | TFLOPs: 31.96 | +7: iteration 93390/ 173500 | consumed samples: 23907840 | consumed tokens: 48963256320 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.940559E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.056 | TFLOPs: 31.69 | +7: iteration 93400/ 173500 | consumed samples: 23910400 | consumed tokens: 48968499200 | elapsed time per iteration (s): 0.44 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.945749E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.273 | TFLOPs: 30.81 | +7: iteration 93410/ 173500 | 
consumed samples: 23912960 | consumed tokens: 48973742080 | elapsed time per iteration (s): 0.43 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.939376E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.494 | TFLOPs: 31.35 | +7: iteration 93420/ 173500 | consumed samples: 23915520 | consumed tokens: 48978984960 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.946231E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.777 | TFLOPs: 31.84 | +7: iteration 93430/ 173500 | consumed samples: 23918080 | consumed tokens: 48984227840 | elapsed time per iteration (s): 0.43 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.927152E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.632 | TFLOPs: 31.30 | +7: iteration 93440/ 173500 | consumed samples: 23920640 | consumed tokens: 48989470720 | elapsed time per iteration (s): 0.43 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.938951E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.291 | TFLOPs: 31.34 | +7: iteration 93450/ 173500 | consumed samples: 23923200 | consumed tokens: 48994713600 | elapsed time per iteration (s): 0.43 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.924587E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.980 | TFLOPs: 31.53 | +7: iteration 93460/ 173500 | consumed samples: 23925760 | consumed tokens: 48999956480 | elapsed time per iteration (s): 0.42 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.940984E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.050 | TFLOPs: 31.96 | +7: iteration 93470/ 173500 | consumed samples: 23928320 | consumed tokens: 49005199360 | elapsed time per iteration (s): 0.43 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.922433E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.203 | TFLOPs: 31.23 | +7: iteration 93480/ 173500 | consumed samples: 23930880 | consumed tokens: 49010442240 | elapsed time per iteration (s): 0.42 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.943016E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.500 | TFLOPs: 31.82 | +7: iteration 93490/ 173500 | consumed samples: 23933440 | consumed tokens: 49015685120 | elapsed time per iteration (s): 0.43 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.933175E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.386 | TFLOPs: 31.45 | +7: iteration 93500/ 173500 | consumed samples: 23936000 | consumed tokens: 49020928000 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.925180E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.287 | TFLOPs: 31.13 | +7: iteration 93510/ 173500 | consumed samples: 23938560 | consumed tokens: 49026170880 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.917251E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.397 | TFLOPs: 31.50 | +7: iteration 93520/ 173500 | consumed samples: 23941120 | consumed tokens: 49031413760 | elapsed time per iteration (s): 0.42 | learning rate: 1.003E-04 | global batch 
size: 256 | lm loss: 2.937484E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.463 | TFLOPs: 31.66 | +7: iteration 93530/ 173500 | consumed samples: 23943680 | consumed tokens: 49036656640 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.923506E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.824 | TFLOPs: 31.31 | +7: iteration 93540/ 173500 | consumed samples: 23946240 | consumed tokens: 49041899520 | elapsed time per iteration (s): 0.44 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.947929E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.104 | TFLOPs: 30.59 | +7: iteration 93550/ 173500 | consumed samples: 23948800 | consumed tokens: 49047142400 | elapsed time per iteration (s): 0.44 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.930867E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.627 | TFLOPs: 30.57 | +7: iteration 93560/ 173500 | consumed samples: 23951360 | consumed tokens: 49052385280 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.935019E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.455 | TFLOPs: 31.87 | +7: iteration 93570/ 173500 | consumed samples: 23953920 | consumed tokens: 49057628160 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.923689E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.172 | TFLOPs: 31.96 | +7: iteration 93580/ 173500 | consumed samples: 23956480 | 
consumed tokens: 49062871040 | elapsed time per iteration (s): 0.44 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.950192E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.141 | TFLOPs: 30.33 | +7: iteration 93590/ 173500 | consumed samples: 23959040 | consumed tokens: 49068113920 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.922954E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.244 | TFLOPs: 32.02 | +7: iteration 93600/ 173500 | consumed samples: 23961600 | consumed tokens: 49073356800 | elapsed time per iteration (s): 0.43 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.929851E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.090 | TFLOPs: 31.38 | +7: iteration 93610/ 173500 | consumed samples: 23964160 | consumed tokens: 49078599680 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.928356E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.998 | TFLOPs: 31.74 | +7: iteration 93620/ 173500 | consumed samples: 23966720 | consumed tokens: 49083842560 | elapsed time per iteration (s): 0.43 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.932388E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.349 | TFLOPs: 31.60 | +7: iteration 93630/ 173500 | consumed samples: 23969280 | consumed tokens: 49089085440 | elapsed time per iteration (s): 0.43 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.913609E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 590.102 | TFLOPs: 30.96 | +7: iteration 93640/ 173500 | consumed samples: 23971840 | consumed tokens: 49094328320 | elapsed time per iteration (s): 0.43 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.929083E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.997 | TFLOPs: 31.59 | +7: iteration 93650/ 173500 | consumed samples: 23974400 | consumed tokens: 49099571200 | elapsed time per iteration (s): 0.43 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.922617E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.810 | TFLOPs: 31.00 | +7: iteration 93660/ 173500 | consumed samples: 23976960 | consumed tokens: 49104814080 | elapsed time per iteration (s): 0.45 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.923017E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.383 | TFLOPs: 30.14 | +7: iteration 93670/ 173500 | consumed samples: 23979520 | consumed tokens: 49110056960 | elapsed time per iteration (s): 0.43 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.922331E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.222 | TFLOPs: 31.02 | +7: iteration 93680/ 173500 | consumed samples: 23982080 | consumed tokens: 49115299840 | elapsed time per iteration (s): 0.43 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.933675E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.331 | TFLOPs: 30.92 | +7: iteration 93690/ 173500 | consumed samples: 23984640 | consumed tokens: 49120542720 | elapsed time per iteration (s): 0.42 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 
2.929795E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.596 | TFLOPs: 31.77 | +7: iteration 93700/ 173500 | consumed samples: 23987200 | consumed tokens: 49125785600 | elapsed time per iteration (s): 0.42 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.928912E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.329 | TFLOPs: 31.81 | +7: iteration 93710/ 173500 | consumed samples: 23989760 | consumed tokens: 49131028480 | elapsed time per iteration (s): 0.43 | learning rate: 9.999E-05 | global batch size: 256 | lm loss: 2.938228E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.018 | TFLOPs: 31.48 | +7: iteration 93720/ 173500 | consumed samples: 23992320 | consumed tokens: 49136271360 | elapsed time per iteration (s): 0.42 | learning rate: 9.998E-05 | global batch size: 256 | lm loss: 2.938174E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.091 | TFLOPs: 31.80 | +7: iteration 93730/ 173500 | consumed samples: 23994880 | consumed tokens: 49141514240 | elapsed time per iteration (s): 0.42 | learning rate: 9.996E-05 | global batch size: 256 | lm loss: 2.933844E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.194 | TFLOPs: 31.81 | +7: iteration 93740/ 173500 | consumed samples: 23997440 | consumed tokens: 49146757120 | elapsed time per iteration (s): 0.43 | learning rate: 9.994E-05 | global batch size: 256 | lm loss: 2.937962E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.071 | TFLOPs: 31.43 | +7: iteration 93750/ 173500 | consumed samples: 24000000 | consumed tokens: 
49152000000 | elapsed time per iteration (s): 0.42 | learning rate: 9.993E-05 | global batch size: 256 | lm loss: 2.936709E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.391 | TFLOPs: 31.61 | +7: iteration 93760/ 173500 | consumed samples: 24002560 | consumed tokens: 49157242880 | elapsed time per iteration (s): 0.44 | learning rate: 9.991E-05 | global batch size: 256 | lm loss: 2.936942E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.403 | TFLOPs: 30.77 | +7: iteration 93770/ 173500 | consumed samples: 24005120 | consumed tokens: 49162485760 | elapsed time per iteration (s): 0.42 | learning rate: 9.989E-05 | global batch size: 256 | lm loss: 2.936125E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.490 | TFLOPs: 31.98 | +7: iteration 93780/ 173500 | consumed samples: 24007680 | consumed tokens: 49167728640 | elapsed time per iteration (s): 0.42 | learning rate: 9.988E-05 | global batch size: 256 | lm loss: 2.936173E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.551 | TFLOPs: 31.72 | +7: iteration 93790/ 173500 | consumed samples: 24010240 | consumed tokens: 49172971520 | elapsed time per iteration (s): 0.42 | learning rate: 9.986E-05 | global batch size: 256 | lm loss: 2.915737E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.175 | TFLOPs: 31.75 | +7: iteration 93800/ 173500 | consumed samples: 24012800 | consumed tokens: 49178214400 | elapsed time per iteration (s): 0.42 | learning rate: 9.985E-05 | global batch size: 256 | lm loss: 2.938482E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.294 | TFLOPs: 31.81 | +7: iteration 93810/ 173500 | consumed samples: 24015360 | consumed tokens: 49183457280 | elapsed time per iteration (s): 0.42 | learning rate: 9.983E-05 | global batch size: 256 | lm loss: 2.928868E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.411 | TFLOPs: 31.66 | +7: iteration 93820/ 173500 | consumed samples: 24017920 | consumed tokens: 49188700160 | elapsed time per iteration (s): 0.42 | learning rate: 9.981E-05 | global batch size: 256 | lm loss: 2.931118E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.899 | TFLOPs: 31.79 | +7: iteration 93830/ 173500 | consumed samples: 24020480 | consumed tokens: 49193943040 | elapsed time per iteration (s): 0.42 | learning rate: 9.980E-05 | global batch size: 256 | lm loss: 2.937021E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.023 | TFLOPs: 31.74 | +7: iteration 93840/ 173500 | consumed samples: 24023040 | consumed tokens: 49199185920 | elapsed time per iteration (s): 0.43 | learning rate: 9.978E-05 | global batch size: 256 | lm loss: 2.942232E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.772 | TFLOPs: 31.42 | +7: iteration 93850/ 173500 | consumed samples: 24025600 | consumed tokens: 49204428800 | elapsed time per iteration (s): 0.42 | learning rate: 9.976E-05 | global batch size: 256 | lm loss: 2.933878E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.010 | TFLOPs: 31.90 | +7: iteration 93860/ 173500 | consumed samples: 24028160 | consumed tokens: 49209671680 | elapsed time per iteration (s): 0.42 | learning rate: 9.975E-05 | global batch size: 256 | lm loss: 2.917648E+00 | grad 
norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.745 | TFLOPs: 31.68 | +7: iteration 93870/ 173500 | consumed samples: 24030720 | consumed tokens: 49214914560 | elapsed time per iteration (s): 0.42 | learning rate: 9.973E-05 | global batch size: 256 | lm loss: 2.936257E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.147 | TFLOPs: 31.70 | +7: iteration 93880/ 173500 | consumed samples: 24033280 | consumed tokens: 49220157440 | elapsed time per iteration (s): 0.43 | learning rate: 9.971E-05 | global batch size: 256 | lm loss: 2.931261E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.390 | TFLOPs: 30.98 | +7: iteration 93890/ 173500 | consumed samples: 24035840 | consumed tokens: 49225400320 | elapsed time per iteration (s): 0.42 | learning rate: 9.970E-05 | global batch size: 256 | lm loss: 2.930547E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.473 | TFLOPs: 31.77 | +7: iteration 93900/ 173500 | consumed samples: 24038400 | consumed tokens: 49230643200 | elapsed time per iteration (s): 0.42 | learning rate: 9.968E-05 | global batch size: 256 | lm loss: 2.931326E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.440 | TFLOPs: 31.66 | +7: iteration 93910/ 173500 | consumed samples: 24040960 | consumed tokens: 49235886080 | elapsed time per iteration (s): 0.44 | learning rate: 9.967E-05 | global batch size: 256 | lm loss: 2.922181E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.865 | TFLOPs: 30.63 | +7: iteration 93920/ 173500 | consumed samples: 24043520 | consumed tokens: 49241128960 | elapsed time 
per iteration (s): 0.43 | learning rate: 9.965E-05 | global batch size: 256 | lm loss: 2.927697E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.750 | TFLOPs: 31.52 | +7: iteration 93930/ 173500 | consumed samples: 24046080 | consumed tokens: 49246371840 | elapsed time per iteration (s): 0.43 | learning rate: 9.963E-05 | global batch size: 256 | lm loss: 2.934377E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.938 | TFLOPs: 31.43 | +7: iteration 93940/ 173500 | consumed samples: 24048640 | consumed tokens: 49251614720 | elapsed time per iteration (s): 0.43 | learning rate: 9.962E-05 | global batch size: 256 | lm loss: 2.923772E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.276 | TFLOPs: 31.50 | +7: iteration 93950/ 173500 | consumed samples: 24051200 | consumed tokens: 49256857600 | elapsed time per iteration (s): 0.43 | learning rate: 9.960E-05 | global batch size: 256 | lm loss: 2.931000E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.545 | TFLOPs: 31.56 | +7: iteration 93960/ 173500 | consumed samples: 24053760 | consumed tokens: 49262100480 | elapsed time per iteration (s): 0.42 | learning rate: 9.958E-05 | global batch size: 256 | lm loss: 2.937701E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.789 | TFLOPs: 31.84 | +7: iteration 93970/ 173500 | consumed samples: 24056320 | consumed tokens: 49267343360 | elapsed time per iteration (s): 0.42 | learning rate: 9.957E-05 | global batch size: 256 | lm loss: 2.938982E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.277 | TFLOPs: 
31.92 | +7: iteration 93980/ 173500 | consumed samples: 24058880 | consumed tokens: 49272586240 | elapsed time per iteration (s): 0.44 | learning rate: 9.955E-05 | global batch size: 256 | lm loss: 2.935640E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.972 | TFLOPs: 30.80 | +7: iteration 93990/ 173500 | consumed samples: 24061440 | consumed tokens: 49277829120 | elapsed time per iteration (s): 0.44 | learning rate: 9.953E-05 | global batch size: 256 | lm loss: 2.936016E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.576 | TFLOPs: 30.62 | +0: [2023-03-17 10:20:30,014] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=0, lr=[9.951807001525316e-05, 9.951807001525316e-05, 9.951807001525316e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 94000/ 173500 | consumed samples: 24064000 | consumed tokens: 49283072000 | elapsed time per iteration (s): 0.43 | learning rate: 9.952E-05 | global batch size: 256 | lm loss: 2.932906E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.320 | TFLOPs: 31.50 | +0: steps: 94000 loss: 2.9177 iter time (s): 0.426 samples/sec: 600.807 +7: iteration 94010/ 173500 | consumed samples: 24066560 | consumed tokens: 49288314880 | elapsed time per iteration (s): 0.43 | learning rate: 9.950E-05 | global batch size: 256 | lm loss: 2.924196E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.409 | TFLOPs: 31.29 | +7: iteration 94020/ 173500 | consumed samples: 24069120 | consumed tokens: 49293557760 | elapsed time per iteration (s): 0.44 | learning rate: 9.949E-05 | global batch size: 256 | lm loss: 2.941941E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 587.859 | TFLOPs: 30.84 | +7: iteration 94030/ 173500 | consumed samples: 24071680 | consumed tokens: 49298800640 | elapsed time per iteration (s): 0.43 | learning rate: 9.947E-05 | global batch size: 256 | lm loss: 2.936247E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.573 | TFLOPs: 31.14 | +7: iteration 94040/ 173500 | consumed samples: 24074240 | consumed tokens: 49304043520 | elapsed time per iteration (s): 0.43 | learning rate: 9.945E-05 | global batch size: 256 | lm loss: 2.929471E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.867 | TFLOPs: 31.58 | +7: iteration 94050/ 173500 | consumed samples: 24076800 | consumed tokens: 49309286400 | elapsed time per iteration (s): 0.43 | learning rate: 9.944E-05 | global batch size: 256 | lm loss: 2.935358E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.175 | TFLOPs: 31.54 | +7: iteration 94060/ 173500 | consumed samples: 24079360 | consumed tokens: 49314529280 | elapsed time per iteration (s): 0.43 | learning rate: 9.942E-05 | global batch size: 256 | lm loss: 2.942824E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.065 | TFLOPs: 31.33 | +7: iteration 94070/ 173500 | consumed samples: 24081920 | consumed tokens: 49319772160 | elapsed time per iteration (s): 0.43 | learning rate: 9.940E-05 | global batch size: 256 | lm loss: 2.930912E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.827 | TFLOPs: 31.58 | +7: iteration 94080/ 173500 | consumed samples: 24084480 | consumed tokens: 49325015040 | elapsed time per iteration (s): 0.43 | learning rate: 9.939E-05 | global batch size: 256 | lm loss: 
2.916418E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.820 | TFLOPs: 31.42 | +7: iteration 94090/ 173500 | consumed samples: 24087040 | consumed tokens: 49330257920 | elapsed time per iteration (s): 0.43 | learning rate: 9.937E-05 | global batch size: 256 | lm loss: 2.914468E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.340 | TFLOPs: 31.45 | +7: iteration 94100/ 173500 | consumed samples: 24089600 | consumed tokens: 49335500800 | elapsed time per iteration (s): 0.42 | learning rate: 9.935E-05 | global batch size: 256 | lm loss: 2.938554E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.395 | TFLOPs: 31.76 | +7: iteration 94110/ 173500 | consumed samples: 24092160 | consumed tokens: 49340743680 | elapsed time per iteration (s): 0.43 | learning rate: 9.934E-05 | global batch size: 256 | lm loss: 2.932785E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.837 | TFLOPs: 30.95 | +7: iteration 94120/ 173500 | consumed samples: 24094720 | consumed tokens: 49345986560 | elapsed time per iteration (s): 0.45 | learning rate: 9.932E-05 | global batch size: 256 | lm loss: 2.925287E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.361 | TFLOPs: 30.14 | +7: iteration 94130/ 173500 | consumed samples: 24097280 | consumed tokens: 49351229440 | elapsed time per iteration (s): 0.44 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 2.934226E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.406 | TFLOPs: 30.30 | +7: iteration 94140/ 173500 | consumed samples: 24099840 | consumed tokens: 
49356472320 | elapsed time per iteration (s): 0.44 | learning rate: 9.929E-05 | global batch size: 256 | lm loss: 2.920689E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.268 | TFLOPs: 30.34 | +7: iteration 94150/ 173500 | consumed samples: 24102400 | consumed tokens: 49361715200 | elapsed time per iteration (s): 0.44 | learning rate: 9.927E-05 | global batch size: 256 | lm loss: 2.929906E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.812 | TFLOPs: 30.21 | +7: iteration 94160/ 173500 | consumed samples: 24104960 | consumed tokens: 49366958080 | elapsed time per iteration (s): 0.48 | learning rate: 9.926E-05 | global batch size: 256 | lm loss: 2.930707E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.751 | TFLOPs: 28.27 | +7: iteration 94170/ 173500 | consumed samples: 24107520 | consumed tokens: 49372200960 | elapsed time per iteration (s): 0.44 | learning rate: 9.924E-05 | global batch size: 256 | lm loss: 2.936921E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.464 | TFLOPs: 30.25 | +7: iteration 94180/ 173500 | consumed samples: 24110080 | consumed tokens: 49377443840 | elapsed time per iteration (s): 0.46 | learning rate: 9.922E-05 | global batch size: 256 | lm loss: 2.924184E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.797 | TFLOPs: 28.95 | +7: iteration 94190/ 173500 | consumed samples: 24112640 | consumed tokens: 49382686720 | elapsed time per iteration (s): 0.43 | learning rate: 9.921E-05 | global batch size: 256 | lm loss: 2.930613E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.135 | TFLOPs: 31.59 | +7: iteration 94200/ 173500 | consumed samples: 24115200 | consumed tokens: 49387929600 | elapsed time per iteration (s): 0.43 | learning rate: 9.919E-05 | global batch size: 256 | lm loss: 2.929336E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.956 | TFLOPs: 31.58 | +7: iteration 94210/ 173500 | consumed samples: 24117760 | consumed tokens: 49393172480 | elapsed time per iteration (s): 0.42 | learning rate: 9.917E-05 | global batch size: 256 | lm loss: 2.930083E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.006 | TFLOPs: 31.64 | +7: iteration 94220/ 173500 | consumed samples: 24120320 | consumed tokens: 49398415360 | elapsed time per iteration (s): 0.45 | learning rate: 9.916E-05 | global batch size: 256 | lm loss: 2.923165E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.345 | TFLOPs: 29.66 | +7: iteration 94230/ 173500 | consumed samples: 24122880 | consumed tokens: 49403658240 | elapsed time per iteration (s): 0.45 | learning rate: 9.914E-05 | global batch size: 256 | lm loss: 2.929636E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.427 | TFLOPs: 29.72 | +7: iteration 94240/ 173500 | consumed samples: 24125440 | consumed tokens: 49408901120 | elapsed time per iteration (s): 0.46 | learning rate: 9.913E-05 | global batch size: 256 | lm loss: 2.940373E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.421 | TFLOPs: 29.51 | +7: iteration 94250/ 173500 | consumed samples: 24128000 | consumed tokens: 49414144000 | elapsed time per iteration (s): 0.50 | learning rate: 9.911E-05 | global batch size: 256 | lm loss: 2.942569E+00 | grad 
norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 514.279 | TFLOPs: 26.98 | +7: iteration 94260/ 173500 | consumed samples: 24130560 | consumed tokens: 49419386880 | elapsed time per iteration (s): 0.46 | learning rate: 9.909E-05 | global batch size: 256 | lm loss: 2.928098E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.627 | TFLOPs: 29.15 | +7: iteration 94270/ 173500 | consumed samples: 24133120 | consumed tokens: 49424629760 | elapsed time per iteration (s): 0.48 | learning rate: 9.908E-05 | global batch size: 256 | lm loss: 2.918700E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.198 | TFLOPs: 27.98 | +7: iteration 94280/ 173500 | consumed samples: 24135680 | consumed tokens: 49429872640 | elapsed time per iteration (s): 0.47 | learning rate: 9.906E-05 | global batch size: 256 | lm loss: 2.942278E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.853 | TFLOPs: 28.43 | +7: iteration 94290/ 173500 | consumed samples: 24138240 | consumed tokens: 49435115520 | elapsed time per iteration (s): 0.43 | learning rate: 9.904E-05 | global batch size: 256 | lm loss: 2.933377E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.643 | TFLOPs: 31.04 | +7: iteration 94300/ 173500 | consumed samples: 24140800 | consumed tokens: 49440358400 | elapsed time per iteration (s): 0.46 | learning rate: 9.903E-05 | global batch size: 256 | lm loss: 2.936343E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.976 | TFLOPs: 29.12 | +7: iteration 94310/ 173500 | consumed samples: 24143360 | consumed tokens: 49445601280 | elapsed time 
per iteration (s): 0.44 | learning rate: 9.901E-05 | global batch size: 256 | lm loss: 2.940827E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.527 | TFLOPs: 30.72 | +7: iteration 94320/ 173500 | consumed samples: 24145920 | consumed tokens: 49450844160 | elapsed time per iteration (s): 0.43 | learning rate: 9.900E-05 | global batch size: 256 | lm loss: 2.912329E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.281 | TFLOPs: 31.29 | +7: iteration 94330/ 173500 | consumed samples: 24148480 | consumed tokens: 49456087040 | elapsed time per iteration (s): 0.42 | learning rate: 9.898E-05 | global batch size: 256 | lm loss: 2.929371E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.731 | TFLOPs: 31.62 | +7: iteration 94340/ 173500 | consumed samples: 24151040 | consumed tokens: 49461329920 | elapsed time per iteration (s): 0.42 | learning rate: 9.896E-05 | global batch size: 256 | lm loss: 2.928491E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.316 | TFLOPs: 31.71 | +7: iteration 94350/ 173500 | consumed samples: 24153600 | consumed tokens: 49466572800 | elapsed time per iteration (s): 0.42 | learning rate: 9.895E-05 | global batch size: 256 | lm loss: 2.937434E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.573 | TFLOPs: 31.83 | +7: iteration 94360/ 173500 | consumed samples: 24156160 | consumed tokens: 49471815680 | elapsed time per iteration (s): 0.42 | learning rate: 9.893E-05 | global batch size: 256 | lm loss: 2.933271E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.175 | TFLOPs: 
31.75 | +7: iteration 94370/ 173500 | consumed samples: 24158720 | consumed tokens: 49477058560 | elapsed time per iteration (s): 0.42 | learning rate: 9.891E-05 | global batch size: 256 | lm loss: 2.923234E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.315 | TFLOPs: 31.65 | +7: iteration 94380/ 173500 | consumed samples: 24161280 | consumed tokens: 49482301440 | elapsed time per iteration (s): 0.43 | learning rate: 9.890E-05 | global batch size: 256 | lm loss: 2.924860E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.754 | TFLOPs: 31.21 | +7: iteration 94390/ 173500 | consumed samples: 24163840 | consumed tokens: 49487544320 | elapsed time per iteration (s): 0.43 | learning rate: 9.888E-05 | global batch size: 256 | lm loss: 2.919363E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.948 | TFLOPs: 31.53 | +7: iteration 94400/ 173500 | consumed samples: 24166400 | consumed tokens: 49492787200 | elapsed time per iteration (s): 0.43 | learning rate: 9.886E-05 | global batch size: 256 | lm loss: 2.923415E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.537 | TFLOPs: 31.04 | +7: iteration 94410/ 173500 | consumed samples: 24168960 | consumed tokens: 49498030080 | elapsed time per iteration (s): 0.43 | learning rate: 9.885E-05 | global batch size: 256 | lm loss: 2.926903E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.719 | TFLOPs: 30.89 | +7: iteration 94420/ 173500 | consumed samples: 24171520 | consumed tokens: 49503272960 | elapsed time per iteration (s): 0.43 | learning rate: 9.883E-05 | global batch size: 256 | lm loss: 2.930646E+00 | grad norm: 0.336 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.916 | TFLOPs: 31.48 | +7: iteration 94430/ 173500 | consumed samples: 24174080 | consumed tokens: 49508515840 | elapsed time per iteration (s): 0.43 | learning rate: 9.882E-05 | global batch size: 256 | lm loss: 2.935527E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.235 | TFLOPs: 31.49 | +7: iteration 94440/ 173500 | consumed samples: 24176640 | consumed tokens: 49513758720 | elapsed time per iteration (s): 0.42 | learning rate: 9.880E-05 | global batch size: 256 | lm loss: 2.911362E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.073 | TFLOPs: 31.80 | +7: iteration 94450/ 173500 | consumed samples: 24179200 | consumed tokens: 49519001600 | elapsed time per iteration (s): 0.42 | learning rate: 9.878E-05 | global batch size: 256 | lm loss: 2.933272E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.580 | TFLOPs: 31.72 | +7: iteration 94460/ 173500 | consumed samples: 24181760 | consumed tokens: 49524244480 | elapsed time per iteration (s): 0.42 | learning rate: 9.877E-05 | global batch size: 256 | lm loss: 2.935636E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.761 | TFLOPs: 31.63 | +7: iteration 94470/ 173500 | consumed samples: 24184320 | consumed tokens: 49529487360 | elapsed time per iteration (s): 0.43 | learning rate: 9.875E-05 | global batch size: 256 | lm loss: 2.937045E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.125 | TFLOPs: 31.54 | +7: iteration 94480/ 173500 | consumed samples: 24186880 | consumed tokens: 49534730240 | elapsed time per iteration (s): 0.42 | 
learning rate: 9.873E-05 | global batch size: 256 | lm loss: 2.935353E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.938 | TFLOPs: 31.85 | +7: iteration 94490/ 173500 | consumed samples: 24189440 | consumed tokens: 49539973120 | elapsed time per iteration (s): 0.43 | learning rate: 9.872E-05 | global batch size: 256 | lm loss: 2.943677E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.170 | TFLOPs: 31.59 | +7: iteration 94500/ 173500 | consumed samples: 24192000 | consumed tokens: 49545216000 | elapsed time per iteration (s): 0.42 | learning rate: 9.870E-05 | global batch size: 256 | lm loss: 2.949142E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.482 | TFLOPs: 31.61 | +7: iteration 94510/ 173500 | consumed samples: 24194560 | consumed tokens: 49550458880 | elapsed time per iteration (s): 0.42 | learning rate: 9.868E-05 | global batch size: 256 | lm loss: 2.923602E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.106 | TFLOPs: 31.64 | +7: iteration 94520/ 173500 | consumed samples: 24197120 | consumed tokens: 49555701760 | elapsed time per iteration (s): 0.43 | learning rate: 9.867E-05 | global batch size: 256 | lm loss: 2.932860E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.109 | TFLOPs: 31.33 | +7: iteration 94530/ 173500 | consumed samples: 24199680 | consumed tokens: 49560944640 | elapsed time per iteration (s): 0.43 | learning rate: 9.865E-05 | global batch size: 256 | lm loss: 2.932863E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.772 | TFLOPs: 31.57 | +7: iteration 94540/ 
173500 | consumed samples: 24202240 | consumed tokens: 49566187520 | elapsed time per iteration (s): 0.42 | learning rate: 9.864E-05 | global batch size: 256 | lm loss: 2.922891E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.337 | TFLOPs: 31.66 | +7: iteration 94550/ 173500 | consumed samples: 24204800 | consumed tokens: 49571430400 | elapsed time per iteration (s): 0.42 | learning rate: 9.862E-05 | global batch size: 256 | lm loss: 2.916436E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.165 | TFLOPs: 31.65 | +7: iteration 94560/ 173500 | consumed samples: 24207360 | consumed tokens: 49576673280 | elapsed time per iteration (s): 0.42 | learning rate: 9.860E-05 | global batch size: 256 | lm loss: 2.926573E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.227 | TFLOPs: 31.86 | +7: iteration 94570/ 173500 | consumed samples: 24209920 | consumed tokens: 49581916160 | elapsed time per iteration (s): 0.43 | learning rate: 9.859E-05 | global batch size: 256 | lm loss: 2.921609E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.530 | TFLOPs: 31.35 | +7: iteration 94580/ 173500 | consumed samples: 24212480 | consumed tokens: 49587159040 | elapsed time per iteration (s): 0.42 | learning rate: 9.857E-05 | global batch size: 256 | lm loss: 2.922859E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.705 | TFLOPs: 31.62 | +7: iteration 94590/ 173500 | consumed samples: 24215040 | consumed tokens: 49592401920 | elapsed time per iteration (s): 0.43 | learning rate: 9.855E-05 | global batch size: 256 | lm loss: 2.933960E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 597.020 | TFLOPs: 31.32 | +7: iteration 94600/ 173500 | consumed samples: 24217600 | consumed tokens: 49597644800 | elapsed time per iteration (s): 0.43 | learning rate: 9.854E-05 | global batch size: 256 | lm loss: 2.926383E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.985 | TFLOPs: 31.48 | +7: iteration 94610/ 173500 | consumed samples: 24220160 | consumed tokens: 49602887680 | elapsed time per iteration (s): 0.42 | learning rate: 9.852E-05 | global batch size: 256 | lm loss: 2.930956E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.896 | TFLOPs: 31.79 | +7: iteration 94620/ 173500 | consumed samples: 24222720 | consumed tokens: 49608130560 | elapsed time per iteration (s): 0.42 | learning rate: 9.851E-05 | global batch size: 256 | lm loss: 2.920810E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.855 | TFLOPs: 31.74 | +7: iteration 94630/ 173500 | consumed samples: 24225280 | consumed tokens: 49613373440 | elapsed time per iteration (s): 0.43 | learning rate: 9.849E-05 | global batch size: 256 | lm loss: 2.933622E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.764 | TFLOPs: 31.47 | +7: iteration 94640/ 173500 | consumed samples: 24227840 | consumed tokens: 49618616320 | elapsed time per iteration (s): 0.43 | learning rate: 9.847E-05 | global batch size: 256 | lm loss: 2.937160E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.571 | TFLOPs: 31.51 | +7: iteration 94650/ 173500 | consumed samples: 24230400 | consumed tokens: 49623859200 | elapsed time per iteration (s): 0.43 | learning rate: 
9.846E-05 | global batch size: 256 | lm loss: 2.934720E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.654 | TFLOPs: 31.46 | +7: iteration 94660/ 173500 | consumed samples: 24232960 | consumed tokens: 49629102080 | elapsed time per iteration (s): 0.43 | learning rate: 9.844E-05 | global batch size: 256 | lm loss: 2.921774E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.834 | TFLOPs: 31.58 | +7: iteration 94670/ 173500 | consumed samples: 24235520 | consumed tokens: 49634344960 | elapsed time per iteration (s): 0.43 | learning rate: 9.842E-05 | global batch size: 256 | lm loss: 2.927063E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.509 | TFLOPs: 31.51 | +7: iteration 94680/ 173500 | consumed samples: 24238080 | consumed tokens: 49639587840 | elapsed time per iteration (s): 0.42 | learning rate: 9.841E-05 | global batch size: 256 | lm loss: 2.929651E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.952 | TFLOPs: 31.69 | +7: iteration 94690/ 173500 | consumed samples: 24240640 | consumed tokens: 49644830720 | elapsed time per iteration (s): 0.42 | learning rate: 9.839E-05 | global batch size: 256 | lm loss: 2.927676E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.517 | TFLOPs: 31.93 | +7: iteration 94700/ 173500 | consumed samples: 24243200 | consumed tokens: 49650073600 | elapsed time per iteration (s): 0.43 | learning rate: 9.837E-05 | global batch size: 256 | lm loss: 2.927797E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.967 | TFLOPs: 31.53 | +7: iteration 94710/ 173500 | 
consumed samples: 24245760 | consumed tokens: 49655316480 | elapsed time per iteration (s): 0.42 | learning rate: 9.836E-05 | global batch size: 256 | lm loss: 2.931509E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.459 | TFLOPs: 31.66 | +7: iteration 94720/ 173500 | consumed samples: 24248320 | consumed tokens: 49660559360 | elapsed time per iteration (s): 0.43 | learning rate: 9.834E-05 | global batch size: 256 | lm loss: 2.929271E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.471 | TFLOPs: 31.51 | +7: iteration 94730/ 173500 | consumed samples: 24250880 | consumed tokens: 49665802240 | elapsed time per iteration (s): 0.42 | learning rate: 9.833E-05 | global batch size: 256 | lm loss: 2.924669E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.540 | TFLOPs: 31.61 | +7: iteration 94740/ 173500 | consumed samples: 24253440 | consumed tokens: 49671045120 | elapsed time per iteration (s): 0.42 | learning rate: 9.831E-05 | global batch size: 256 | lm loss: 2.924945E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.834 | TFLOPs: 31.73 | +7: iteration 94750/ 173500 | consumed samples: 24256000 | consumed tokens: 49676288000 | elapsed time per iteration (s): 0.42 | learning rate: 9.829E-05 | global batch size: 256 | lm loss: 2.920856E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.253 | TFLOPs: 31.97 | +7: iteration 94760/ 173500 | consumed samples: 24258560 | consumed tokens: 49681530880 | elapsed time per iteration (s): 0.42 | learning rate: 9.828E-05 | global batch size: 256 | lm loss: 2.948494E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.136 | TFLOPs: 31.65 | +7: iteration 94770/ 173500 | consumed samples: 24261120 | consumed tokens: 49686773760 | elapsed time per iteration (s): 0.42 | learning rate: 9.826E-05 | global batch size: 256 | lm loss: 2.941134E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.857 | TFLOPs: 31.95 | +7: iteration 94780/ 173500 | consumed samples: 24263680 | consumed tokens: 49692016640 | elapsed time per iteration (s): 0.44 | learning rate: 9.824E-05 | global batch size: 256 | lm loss: 2.927165E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.621 | TFLOPs: 30.62 | +7: iteration 94790/ 173500 | consumed samples: 24266240 | consumed tokens: 49697259520 | elapsed time per iteration (s): 0.42 | learning rate: 9.823E-05 | global batch size: 256 | lm loss: 2.934363E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.519 | TFLOPs: 31.77 | +7: iteration 94800/ 173500 | consumed samples: 24268800 | consumed tokens: 49702502400 | elapsed time per iteration (s): 0.42 | learning rate: 9.821E-05 | global batch size: 256 | lm loss: 2.923072E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.944 | TFLOPs: 31.79 | +7: iteration 94810/ 173500 | consumed samples: 24271360 | consumed tokens: 49707745280 | elapsed time per iteration (s): 0.43 | learning rate: 9.820E-05 | global batch size: 256 | lm loss: 2.935835E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.930 | TFLOPs: 31.53 | +7: iteration 94820/ 173500 | consumed samples: 24273920 | consumed tokens: 49712988160 | elapsed time per iteration (s): 0.42 | learning rate: 9.818E-05 | global batch 
size: 256 | lm loss: 2.928765E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.496 | TFLOPs: 31.77 | +7: iteration 94830/ 173500 | consumed samples: 24276480 | consumed tokens: 49718231040 | elapsed time per iteration (s): 0.42 | learning rate: 9.816E-05 | global batch size: 256 | lm loss: 2.921846E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.570 | TFLOPs: 31.93 | +7: iteration 94840/ 173500 | consumed samples: 24279040 | consumed tokens: 49723473920 | elapsed time per iteration (s): 0.42 | learning rate: 9.815E-05 | global batch size: 256 | lm loss: 2.928628E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.659 | TFLOPs: 31.62 | +7: iteration 94850/ 173500 | consumed samples: 24281600 | consumed tokens: 49728716800 | elapsed time per iteration (s): 0.42 | learning rate: 9.813E-05 | global batch size: 256 | lm loss: 2.929900E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.037 | TFLOPs: 31.75 | +7: iteration 94860/ 173500 | consumed samples: 24284160 | consumed tokens: 49733959680 | elapsed time per iteration (s): 0.42 | learning rate: 9.811E-05 | global batch size: 256 | lm loss: 2.937757E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.587 | TFLOPs: 31.83 | +7: iteration 94870/ 173500 | consumed samples: 24286720 | consumed tokens: 49739202560 | elapsed time per iteration (s): 0.43 | learning rate: 9.810E-05 | global batch size: 256 | lm loss: 2.928440E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.100 | TFLOPs: 31.49 | +7: iteration 94880/ 173500 | consumed samples: 24289280 | 
consumed tokens: 49744445440 | elapsed time per iteration (s): 0.43 | learning rate: 9.808E-05 | global batch size: 256 | lm loss: 2.944416E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.696 | TFLOPs: 31.52 | +7: iteration 94890/ 173500 | consumed samples: 24291840 | consumed tokens: 49749688320 | elapsed time per iteration (s): 0.42 | learning rate: 9.806E-05 | global batch size: 256 | lm loss: 2.941705E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.395 | TFLOPs: 31.71 | +7: iteration 94900/ 173500 | consumed samples: 24294400 | consumed tokens: 49754931200 | elapsed time per iteration (s): 0.42 | learning rate: 9.805E-05 | global batch size: 256 | lm loss: 2.918932E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.234 | TFLOPs: 31.86 | +7: iteration 94910/ 173500 | consumed samples: 24296960 | consumed tokens: 49760174080 | elapsed time per iteration (s): 0.43 | learning rate: 9.803E-05 | global batch size: 256 | lm loss: 2.924023E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.236 | TFLOPs: 31.39 | +7: iteration 94920/ 173500 | consumed samples: 24299520 | consumed tokens: 49765416960 | elapsed time per iteration (s): 0.42 | learning rate: 9.802E-05 | global batch size: 256 | lm loss: 2.933202E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.207 | TFLOPs: 31.75 | +7: iteration 94930/ 173500 | consumed samples: 24302080 | consumed tokens: 49770659840 | elapsed time per iteration (s): 0.42 | learning rate: 9.800E-05 | global batch size: 256 | lm loss: 2.926249E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.446 | TFLOPs: 31.66 | +7: iteration 94940/ 173500 | consumed samples: 24304640 | consumed tokens: 49775902720 | elapsed time per iteration (s): 0.42 | learning rate: 9.798E-05 | global batch size: 256 | lm loss: 2.928452E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.594 | TFLOPs: 31.62 | +7: iteration 94950/ 173500 | consumed samples: 24307200 | consumed tokens: 49781145600 | elapsed time per iteration (s): 0.43 | learning rate: 9.797E-05 | global batch size: 256 | lm loss: 2.934994E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.649 | TFLOPs: 31.20 | +7: iteration 94960/ 173500 | consumed samples: 24309760 | consumed tokens: 49786388480 | elapsed time per iteration (s): 0.43 | learning rate: 9.795E-05 | global batch size: 256 | lm loss: 2.920439E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.080 | TFLOPs: 31.49 | +7: iteration 94970/ 173500 | consumed samples: 24312320 | consumed tokens: 49791631360 | elapsed time per iteration (s): 0.42 | learning rate: 9.793E-05 | global batch size: 256 | lm loss: 2.935632E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.514 | TFLOPs: 31.82 | +7: iteration 94980/ 173500 | consumed samples: 24314880 | consumed tokens: 49796874240 | elapsed time per iteration (s): 0.42 | learning rate: 9.792E-05 | global batch size: 256 | lm loss: 2.932993E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.437 | TFLOPs: 31.61 | +7: iteration 94990/ 173500 | consumed samples: 24317440 | consumed tokens: 49802117120 | elapsed time per iteration (s): 0.42 | learning rate: 9.790E-05 | global batch size: 256 | lm loss: 
2.918312E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.204 | TFLOPs: 31.81 | +7: iteration 95000/ 173500 | consumed samples: 24320000 | consumed tokens: 49807360000 | elapsed time per iteration (s): 0.42 | learning rate: 9.789E-05 | global batch size: 256 | lm loss: 2.905612E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.637 | TFLOPs: 31.72 | +7: iteration 95010/ 173500 | consumed samples: 24322560 | consumed tokens: 49812602880 | elapsed time per iteration (s): 0.42 | learning rate: 9.787E-05 | global batch size: 256 | lm loss: 2.935978E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.819 | TFLOPs: 31.79 | +7: iteration 95020/ 173500 | consumed samples: 24325120 | consumed tokens: 49817845760 | elapsed time per iteration (s): 0.42 | learning rate: 9.785E-05 | global batch size: 256 | lm loss: 2.931348E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.363 | TFLOPs: 31.66 | +7: iteration 95030/ 173500 | consumed samples: 24327680 | consumed tokens: 49823088640 | elapsed time per iteration (s): 0.43 | learning rate: 9.784E-05 | global batch size: 256 | lm loss: 2.925153E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.373 | TFLOPs: 31.45 | +7: iteration 95040/ 173500 | consumed samples: 24330240 | consumed tokens: 49828331520 | elapsed time per iteration (s): 0.43 | learning rate: 9.782E-05 | global batch size: 256 | lm loss: 2.925274E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.010 | TFLOPs: 31.17 | +7: iteration 95050/ 173500 | consumed samples: 24332800 | consumed tokens: 
49833574400 | elapsed time per iteration (s): 0.42 | learning rate: 9.780E-05 | global batch size: 256 | lm loss: 2.919039E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.763 | TFLOPs: 31.63 | +7: iteration 95060/ 173500 | consumed samples: 24335360 | consumed tokens: 49838817280 | elapsed time per iteration (s): 0.42 | learning rate: 9.779E-05 | global batch size: 256 | lm loss: 2.933673E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.572 | TFLOPs: 31.98 | +7: iteration 95070/ 173500 | consumed samples: 24337920 | consumed tokens: 49844060160 | elapsed time per iteration (s): 0.42 | learning rate: 9.777E-05 | global batch size: 256 | lm loss: 2.915971E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.420 | TFLOPs: 31.66 | +7: iteration 95080/ 173500 | consumed samples: 24340480 | consumed tokens: 49849303040 | elapsed time per iteration (s): 0.43 | learning rate: 9.775E-05 | global batch size: 256 | lm loss: 2.929238E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.739 | TFLOPs: 31.05 | +7: iteration 95090/ 173500 | consumed samples: 24343040 | consumed tokens: 49854545920 | elapsed time per iteration (s): 0.42 | learning rate: 9.774E-05 | global batch size: 256 | lm loss: 2.944254E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.021 | TFLOPs: 31.69 | +7: iteration 95100/ 173500 | consumed samples: 24345600 | consumed tokens: 49859788800 | elapsed time per iteration (s): 0.42 | learning rate: 9.772E-05 | global batch size: 256 | lm loss: 2.927066E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 605.393 | TFLOPs: 31.76 | +7: iteration 95110/ 173500 | consumed samples: 24348160 | consumed tokens: 49865031680 | elapsed time per iteration (s): 0.43 | learning rate: 9.771E-05 | global batch size: 256 | lm loss: 2.935122E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.792 | TFLOPs: 31.26 | +7: iteration 95120/ 173500 | consumed samples: 24350720 | consumed tokens: 49870274560 | elapsed time per iteration (s): 0.43 | learning rate: 9.769E-05 | global batch size: 256 | lm loss: 2.922588E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.623 | TFLOPs: 31.15 | +7: iteration 95130/ 173500 | consumed samples: 24353280 | consumed tokens: 49875517440 | elapsed time per iteration (s): 0.43 | learning rate: 9.767E-05 | global batch size: 256 | lm loss: 2.931562E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.903 | TFLOPs: 31.32 | +7: iteration 95140/ 173500 | consumed samples: 24355840 | consumed tokens: 49880760320 | elapsed time per iteration (s): 0.42 | learning rate: 9.766E-05 | global batch size: 256 | lm loss: 2.926434E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.676 | TFLOPs: 31.73 | +7: iteration 95150/ 173500 | consumed samples: 24358400 | consumed tokens: 49886003200 | elapsed time per iteration (s): 0.42 | learning rate: 9.764E-05 | global batch size: 256 | lm loss: 2.927743E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.694 | TFLOPs: 31.62 | +7: iteration 95160/ 173500 | consumed samples: 24360960 | consumed tokens: 49891246080 | elapsed time per iteration (s): 0.42 | learning rate: 9.762E-05 | global batch size: 256 | lm loss: 2.929753E+00 | grad 
norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.344 | TFLOPs: 31.76 | +7: iteration 95170/ 173500 | consumed samples: 24363520 | consumed tokens: 49896488960 | elapsed time per iteration (s): 0.42 | learning rate: 9.761E-05 | global batch size: 256 | lm loss: 2.918894E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.927 | TFLOPs: 32.00 | +7: iteration 95180/ 173500 | consumed samples: 24366080 | consumed tokens: 49901731840 | elapsed time per iteration (s): 0.43 | learning rate: 9.759E-05 | global batch size: 256 | lm loss: 2.914695E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.635 | TFLOPs: 31.41 | +7: iteration 95190/ 173500 | consumed samples: 24368640 | consumed tokens: 49906974720 | elapsed time per iteration (s): 0.42 | learning rate: 9.758E-05 | global batch size: 256 | lm loss: 2.931665E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.739 | TFLOPs: 31.99 | +7: iteration 95200/ 173500 | consumed samples: 24371200 | consumed tokens: 49912217600 | elapsed time per iteration (s): 0.42 | learning rate: 9.756E-05 | global batch size: 256 | lm loss: 2.930329E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.669 | TFLOPs: 31.78 | +7: iteration 95210/ 173500 | consumed samples: 24373760 | consumed tokens: 49917460480 | elapsed time per iteration (s): 0.42 | learning rate: 9.754E-05 | global batch size: 256 | lm loss: 2.926458E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.894 | TFLOPs: 31.63 | +7: iteration 95220/ 173500 | consumed samples: 24376320 | consumed tokens: 49922703360 | elapsed time 
per iteration (s): 0.42 | learning rate: 9.753E-05 | global batch size: 256 | lm loss: 2.929864E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.556 | TFLOPs: 31.98 | +7: iteration 95230/ 173500 | consumed samples: 24378880 | consumed tokens: 49927946240 | elapsed time per iteration (s): 0.42 | learning rate: 9.751E-05 | global batch size: 256 | lm loss: 2.941268E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.175 | TFLOPs: 31.96 | +7: iteration 95240/ 173500 | consumed samples: 24381440 | consumed tokens: 49933189120 | elapsed time per iteration (s): 0.43 | learning rate: 9.749E-05 | global batch size: 256 | lm loss: 2.914913E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.744 | TFLOPs: 31.52 | +7: iteration 95250/ 173500 | consumed samples: 24384000 | consumed tokens: 49938432000 | elapsed time per iteration (s): 0.42 | learning rate: 9.748E-05 | global batch size: 256 | lm loss: 2.939989E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.773 | TFLOPs: 31.99 | +7: iteration 95260/ 173500 | consumed samples: 24386560 | consumed tokens: 49943674880 | elapsed time per iteration (s): 0.43 | learning rate: 9.746E-05 | global batch size: 256 | lm loss: 2.930602E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.114 | TFLOPs: 31.22 | +7: iteration 95270/ 173500 | consumed samples: 24389120 | consumed tokens: 49948917760 | elapsed time per iteration (s): 0.42 | learning rate: 9.744E-05 | global batch size: 256 | lm loss: 2.935879E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.902 | TFLOPs: 
31.79 | +7: iteration 95280/ 173500 | consumed samples: 24391680 | consumed tokens: 49954160640 | elapsed time per iteration (s): 0.43 | learning rate: 9.743E-05 | global batch size: 256 | lm loss: 2.921036E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.496 | TFLOPs: 31.03 | +7: iteration 95290/ 173500 | consumed samples: 24394240 | consumed tokens: 49959403520 | elapsed time per iteration (s): 0.43 | learning rate: 9.741E-05 | global batch size: 256 | lm loss: 2.921878E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.186 | TFLOPs: 31.44 | +7: iteration 95300/ 173500 | consumed samples: 24396800 | consumed tokens: 49964646400 | elapsed time per iteration (s): 0.42 | learning rate: 9.740E-05 | global batch size: 256 | lm loss: 2.914222E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.799 | TFLOPs: 32.00 | +7: iteration 95310/ 173500 | consumed samples: 24399360 | consumed tokens: 49969889280 | elapsed time per iteration (s): 0.42 | learning rate: 9.738E-05 | global batch size: 256 | lm loss: 2.937427E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.213 | TFLOPs: 31.70 | +7: iteration 95320/ 173500 | consumed samples: 24401920 | consumed tokens: 49975132160 | elapsed time per iteration (s): 0.42 | learning rate: 9.736E-05 | global batch size: 256 | lm loss: 2.926568E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.638 | TFLOPs: 31.62 | +7: iteration 95330/ 173500 | consumed samples: 24404480 | consumed tokens: 49980375040 | elapsed time per iteration (s): 0.42 | learning rate: 9.735E-05 | global batch size: 256 | lm loss: 2.922493E+00 | grad norm: 0.329 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.910 | TFLOPs: 31.84 | +7: iteration 95340/ 173500 | consumed samples: 24407040 | consumed tokens: 49985617920 | elapsed time per iteration (s): 0.42 | learning rate: 9.733E-05 | global batch size: 256 | lm loss: 2.923597E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.513 | TFLOPs: 31.82 | +7: iteration 95350/ 173500 | consumed samples: 24409600 | consumed tokens: 49990860800 | elapsed time per iteration (s): 0.42 | learning rate: 9.731E-05 | global batch size: 256 | lm loss: 2.924043E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.597 | TFLOPs: 31.98 | +7: iteration 95360/ 173500 | consumed samples: 24412160 | consumed tokens: 49996103680 | elapsed time per iteration (s): 0.42 | learning rate: 9.730E-05 | global batch size: 256 | lm loss: 2.920269E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.168 | TFLOPs: 31.65 | +7: iteration 95370/ 173500 | consumed samples: 24414720 | consumed tokens: 50001346560 | elapsed time per iteration (s): 0.42 | learning rate: 9.728E-05 | global batch size: 256 | lm loss: 2.937872E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.247 | TFLOPs: 31.76 | +7: iteration 95380/ 173500 | consumed samples: 24417280 | consumed tokens: 50006589440 | elapsed time per iteration (s): 0.42 | learning rate: 9.727E-05 | global batch size: 256 | lm loss: 2.936361E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.645 | TFLOPs: 31.78 | +7: iteration 95390/ 173500 | consumed samples: 24419840 | consumed tokens: 50011832320 | elapsed time per iteration (s): 0.42 | 
learning rate: 9.725E-05 | global batch size: 256 | lm loss: 2.933134E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.663 | TFLOPs: 31.99 | +7: iteration 95400/ 173500 | consumed samples: 24422400 | consumed tokens: 50017075200 | elapsed time per iteration (s): 0.43 | learning rate: 9.723E-05 | global batch size: 256 | lm loss: 2.933393E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.140 | TFLOPs: 30.96 | +7: iteration 95410/ 173500 | consumed samples: 24424960 | consumed tokens: 50022318080 | elapsed time per iteration (s): 0.42 | learning rate: 9.722E-05 | global batch size: 256 | lm loss: 2.924818E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.670 | TFLOPs: 31.99 | +7: iteration 95420/ 173500 | consumed samples: 24427520 | consumed tokens: 50027560960 | elapsed time per iteration (s): 0.42 | learning rate: 9.720E-05 | global batch size: 256 | lm loss: 2.922163E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.462 | TFLOPs: 31.82 | +7: iteration 95430/ 173500 | consumed samples: 24430080 | consumed tokens: 50032803840 | elapsed time per iteration (s): 0.42 | learning rate: 9.718E-05 | global batch size: 256 | lm loss: 2.929958E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.771 | TFLOPs: 31.84 | +7: iteration 95440/ 173500 | consumed samples: 24432640 | consumed tokens: 50038046720 | elapsed time per iteration (s): 0.42 | learning rate: 9.717E-05 | global batch size: 256 | lm loss: 2.920177E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.943 | TFLOPs: 31.74 | +7: iteration 95450/ 
173500 | consumed samples: 24435200 | consumed tokens: 50043289600 | elapsed time per iteration (s): 0.42 | learning rate: 9.715E-05 | global batch size: 256 | lm loss: 2.934714E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.357 | TFLOPs: 31.76 | +7: iteration 95460/ 173500 | consumed samples: 24437760 | consumed tokens: 50048532480 | elapsed time per iteration (s): 0.42 | learning rate: 9.714E-05 | global batch size: 256 | lm loss: 2.935426E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.516 | TFLOPs: 31.88 | +7: iteration 95470/ 173500 | consumed samples: 24440320 | consumed tokens: 50053775360 | elapsed time per iteration (s): 0.42 | learning rate: 9.712E-05 | global batch size: 256 | lm loss: 2.926492E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.405 | TFLOPs: 31.97 | +7: iteration 95480/ 173500 | consumed samples: 24442880 | consumed tokens: 50059018240 | elapsed time per iteration (s): 0.42 | learning rate: 9.710E-05 | global batch size: 256 | lm loss: 2.927520E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.251 | TFLOPs: 31.81 | +7: iteration 95490/ 173500 | consumed samples: 24445440 | consumed tokens: 50064261120 | elapsed time per iteration (s): 0.42 | learning rate: 9.709E-05 | global batch size: 256 | lm loss: 2.938598E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.740 | TFLOPs: 31.78 | +7: iteration 95500/ 173500 | consumed samples: 24448000 | consumed tokens: 50069504000 | elapsed time per iteration (s): 0.42 | learning rate: 9.707E-05 | global batch size: 256 | lm loss: 2.931959E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.693 | TFLOPs: 31.83 | +7: iteration 95510/ 173500 | consumed samples: 24450560 | consumed tokens: 50074746880 | elapsed time per iteration (s): 0.42 | learning rate: 9.705E-05 | global batch size: 256 | lm loss: 2.938265E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.120 | TFLOPs: 31.96 | +7: iteration 95520/ 173500 | consumed samples: 24453120 | consumed tokens: 50079989760 | elapsed time per iteration (s): 0.43 | learning rate: 9.704E-05 | global batch size: 256 | lm loss: 2.928918E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.167 | TFLOPs: 31.38 | +7: iteration 95530/ 173500 | consumed samples: 24455680 | consumed tokens: 50085232640 | elapsed time per iteration (s): 0.44 | learning rate: 9.702E-05 | global batch size: 256 | lm loss: 2.912358E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.578 | TFLOPs: 30.78 | +7: iteration 95540/ 173500 | consumed samples: 24458240 | consumed tokens: 50090475520 | elapsed time per iteration (s): 0.42 | learning rate: 9.700E-05 | global batch size: 256 | lm loss: 2.923927E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.665 | TFLOPs: 31.83 | +7: iteration 95550/ 173500 | consumed samples: 24460800 | consumed tokens: 50095718400 | elapsed time per iteration (s): 0.43 | learning rate: 9.699E-05 | global batch size: 256 | lm loss: 2.939961E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.895 | TFLOPs: 31.58 | +7: iteration 95560/ 173500 | consumed samples: 24463360 | consumed tokens: 50100961280 | elapsed time per iteration (s): 0.43 | learning rate: 
9.697E-05 | global batch size: 256 | lm loss: 2.924014E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.813 | TFLOPs: 31.58 | +7: iteration 95570/ 173500 | consumed samples: 24465920 | consumed tokens: 50106204160 | elapsed time per iteration (s): 0.44 | learning rate: 9.696E-05 | global batch size: 256 | lm loss: 2.933707E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.245 | TFLOPs: 30.76 | +7: iteration 95580/ 173500 | consumed samples: 24468480 | consumed tokens: 50111447040 | elapsed time per iteration (s): 0.42 | learning rate: 9.694E-05 | global batch size: 256 | lm loss: 2.917072E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.698 | TFLOPs: 31.62 | +7: iteration 95590/ 173500 | consumed samples: 24471040 | consumed tokens: 50116689920 | elapsed time per iteration (s): 0.43 | learning rate: 9.692E-05 | global batch size: 256 | lm loss: 2.934348E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.786 | TFLOPs: 31.57 | +7: iteration 95600/ 173500 | consumed samples: 24473600 | consumed tokens: 50121932800 | elapsed time per iteration (s): 0.43 | learning rate: 9.691E-05 | global batch size: 256 | lm loss: 2.927243E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.576 | TFLOPs: 31.56 | +7: iteration 95610/ 173500 | consumed samples: 24476160 | consumed tokens: 50127175680 | elapsed time per iteration (s): 0.43 | learning rate: 9.689E-05 | global batch size: 256 | lm loss: 2.920402E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.153 | TFLOPs: 31.54 | +7: iteration 95620/ 173500 | 
consumed samples: 24478720 | consumed tokens: 50132418560 | elapsed time per iteration (s): 0.42 | learning rate: 9.687E-05 | global batch size: 256 | lm loss: 2.923073E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.424 | TFLOPs: 31.66 | +7: iteration 95630/ 173500 | consumed samples: 24481280 | consumed tokens: 50137661440 | elapsed time per iteration (s): 0.42 | learning rate: 9.686E-05 | global batch size: 256 | lm loss: 2.919592E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.571 | TFLOPs: 31.72 | +7: iteration 95640/ 173500 | consumed samples: 24483840 | consumed tokens: 50142904320 | elapsed time per iteration (s): 0.43 | learning rate: 9.684E-05 | global batch size: 256 | lm loss: 2.932892E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.104 | TFLOPs: 31.49 | +7: iteration 95650/ 173500 | consumed samples: 24486400 | consumed tokens: 50148147200 | elapsed time per iteration (s): 0.42 | learning rate: 9.683E-05 | global batch size: 256 | lm loss: 2.932635E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.399 | TFLOPs: 31.76 | +7: iteration 95660/ 173500 | consumed samples: 24488960 | consumed tokens: 50153390080 | elapsed time per iteration (s): 0.42 | learning rate: 9.681E-05 | global batch size: 256 | lm loss: 2.935428E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.933 | TFLOPs: 31.69 | +7: iteration 95670/ 173500 | consumed samples: 24491520 | consumed tokens: 50158632960 | elapsed time per iteration (s): 0.42 | learning rate: 9.679E-05 | global batch size: 256 | lm loss: 2.943069E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 605.099 | TFLOPs: 31.75 | +7: iteration 95680/ 173500 | consumed samples: 24494080 | consumed tokens: 50163875840 | elapsed time per iteration (s): 0.42 | learning rate: 9.678E-05 | global batch size: 256 | lm loss: 2.926945E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.774 | TFLOPs: 31.89 | +7: iteration 95690/ 173500 | consumed samples: 24496640 | consumed tokens: 50169118720 | elapsed time per iteration (s): 0.42 | learning rate: 9.676E-05 | global batch size: 256 | lm loss: 2.922523E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.357 | TFLOPs: 31.76 | +7: iteration 95700/ 173500 | consumed samples: 24499200 | consumed tokens: 50174361600 | elapsed time per iteration (s): 0.43 | learning rate: 9.674E-05 | global batch size: 256 | lm loss: 2.932692E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.961 | TFLOPs: 31.58 | +7: iteration 95710/ 173500 | consumed samples: 24501760 | consumed tokens: 50179604480 | elapsed time per iteration (s): 0.42 | learning rate: 9.673E-05 | global batch size: 256 | lm loss: 2.950871E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.350 | TFLOPs: 31.76 | +7: iteration 95720/ 173500 | consumed samples: 24504320 | consumed tokens: 50184847360 | elapsed time per iteration (s): 0.42 | learning rate: 9.671E-05 | global batch size: 256 | lm loss: 2.925421E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.143 | TFLOPs: 31.96 | +7: iteration 95730/ 173500 | consumed samples: 24506880 | consumed tokens: 50190090240 | elapsed time per iteration (s): 0.42 | learning rate: 9.670E-05 | global batch 
size: 256 | lm loss: 2.923586E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.046 | TFLOPs: 31.80 | +7: iteration 95740/ 173500 | consumed samples: 24509440 | consumed tokens: 50195333120 | elapsed time per iteration (s): 0.42 | learning rate: 9.668E-05 | global batch size: 256 | lm loss: 2.923926E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.263 | TFLOPs: 31.97 | +7: iteration 95750/ 173500 | consumed samples: 24512000 | consumed tokens: 50200576000 | elapsed time per iteration (s): 0.42 | learning rate: 9.666E-05 | global batch size: 256 | lm loss: 2.920785E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.977 | TFLOPs: 31.79 | +7: iteration 95760/ 173500 | consumed samples: 24514560 | consumed tokens: 50205818880 | elapsed time per iteration (s): 0.42 | learning rate: 9.665E-05 | global batch size: 256 | lm loss: 2.928796E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.622 | TFLOPs: 31.83 | +7: iteration 95770/ 173500 | consumed samples: 24517120 | consumed tokens: 50211061760 | elapsed time per iteration (s): 0.42 | learning rate: 9.663E-05 | global batch size: 256 | lm loss: 2.938106E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.086 | TFLOPs: 31.85 | +7: iteration 95780/ 173500 | consumed samples: 24519680 | consumed tokens: 50216304640 | elapsed time per iteration (s): 0.43 | learning rate: 9.661E-05 | global batch size: 256 | lm loss: 2.929118E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.041 | TFLOPs: 31.59 | +7: iteration 95790/ 173500 | consumed samples: 24522240 | 
consumed tokens: 50221547520 | elapsed time per iteration (s): 0.43 | learning rate: 9.660E-05 | global batch size: 256 | lm loss: 2.936596E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.184 | TFLOPs: 31.60 | +7: iteration 95800/ 173500 | consumed samples: 24524800 | consumed tokens: 50226790400 | elapsed time per iteration (s): 0.42 | learning rate: 9.658E-05 | global batch size: 256 | lm loss: 2.935345E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.050 | TFLOPs: 31.96 | +7: iteration 95810/ 173500 | consumed samples: 24527360 | consumed tokens: 50232033280 | elapsed time per iteration (s): 0.42 | learning rate: 9.657E-05 | global batch size: 256 | lm loss: 2.940854E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.923 | TFLOPs: 31.84 | +7: iteration 95820/ 173500 | consumed samples: 24529920 | consumed tokens: 50237276160 | elapsed time per iteration (s): 0.47 | learning rate: 9.655E-05 | global batch size: 256 | lm loss: 2.925093E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.808 | TFLOPs: 28.43 | +7: iteration 95830/ 173500 | consumed samples: 24532480 | consumed tokens: 50242519040 | elapsed time per iteration (s): 0.43 | learning rate: 9.653E-05 | global batch size: 256 | lm loss: 2.928525E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.010 | TFLOPs: 31.48 | +7: iteration 95840/ 173500 | consumed samples: 24535040 | consumed tokens: 50247761920 | elapsed time per iteration (s): 0.42 | learning rate: 9.652E-05 | global batch size: 256 | lm loss: 2.927439E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 602.665 | TFLOPs: 31.62 | +7: iteration 95850/ 173500 | consumed samples: 24537600 | consumed tokens: 50253004800 | elapsed time per iteration (s): 0.43 | learning rate: 9.650E-05 | global batch size: 256 | lm loss: 2.944028E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.136 | TFLOPs: 31.17 | +7: iteration 95860/ 173500 | consumed samples: 24540160 | consumed tokens: 50258247680 | elapsed time per iteration (s): 0.43 | learning rate: 9.648E-05 | global batch size: 256 | lm loss: 2.926927E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.758 | TFLOPs: 31.36 | +7: iteration 95870/ 173500 | consumed samples: 24542720 | consumed tokens: 50263490560 | elapsed time per iteration (s): 0.42 | learning rate: 9.647E-05 | global batch size: 256 | lm loss: 2.927456E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.086 | TFLOPs: 31.64 | +7: iteration 95880/ 173500 | consumed samples: 24545280 | consumed tokens: 50268733440 | elapsed time per iteration (s): 0.42 | learning rate: 9.645E-05 | global batch size: 256 | lm loss: 2.942443E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.633 | TFLOPs: 31.62 | +7: iteration 95890/ 173500 | consumed samples: 24547840 | consumed tokens: 50273976320 | elapsed time per iteration (s): 0.42 | learning rate: 9.643E-05 | global batch size: 256 | lm loss: 2.930095E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.967 | TFLOPs: 31.74 | +7: iteration 95900/ 173500 | consumed samples: 24550400 | consumed tokens: 50279219200 | elapsed time per iteration (s): 0.42 | learning rate: 9.642E-05 | global batch size: 256 | lm loss: 
2.933250E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.684 | TFLOPs: 31.99 | +7: iteration 95910/ 173500 | consumed samples: 24552960 | consumed tokens: 50284462080 | elapsed time per iteration (s): 0.43 | learning rate: 9.640E-05 | global batch size: 256 | lm loss: 2.925297E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.953 | TFLOPs: 31.53 | +7: iteration 95920/ 173500 | consumed samples: 24555520 | consumed tokens: 50289704960 | elapsed time per iteration (s): 0.42 | learning rate: 9.639E-05 | global batch size: 256 | lm loss: 2.935910E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.284 | TFLOPs: 31.76 | +7: iteration 95930/ 173500 | consumed samples: 24558080 | consumed tokens: 50294947840 | elapsed time per iteration (s): 0.42 | learning rate: 9.637E-05 | global batch size: 256 | lm loss: 2.932254E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.231 | TFLOPs: 31.76 | +7: iteration 95940/ 173500 | consumed samples: 24560640 | consumed tokens: 50300190720 | elapsed time per iteration (s): 0.42 | learning rate: 9.635E-05 | global batch size: 256 | lm loss: 2.927573E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.329 | TFLOPs: 31.66 | +7: iteration 95950/ 173500 | consumed samples: 24563200 | consumed tokens: 50305433600 | elapsed time per iteration (s): 0.43 | learning rate: 9.634E-05 | global batch size: 256 | lm loss: 2.916783E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.343 | TFLOPs: 31.55 | +7: iteration 95960/ 173500 | consumed samples: 24565760 | consumed tokens: 
50310676480 | elapsed time per iteration (s): 0.43 | learning rate: 9.632E-05 | global batch size: 256 | lm loss: 2.928734E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.924 | TFLOPs: 31.32 | +7: iteration 95970/ 173500 | consumed samples: 24568320 | consumed tokens: 50315919360 | elapsed time per iteration (s): 0.42 | learning rate: 9.630E-05 | global batch size: 256 | lm loss: 2.942115E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.804 | TFLOPs: 31.79 | +7: iteration 95980/ 173500 | consumed samples: 24570880 | consumed tokens: 50321162240 | elapsed time per iteration (s): 0.42 | learning rate: 9.629E-05 | global batch size: 256 | lm loss: 2.926770E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.972 | TFLOPs: 31.79 | +7: iteration 95990/ 173500 | consumed samples: 24573440 | consumed tokens: 50326405120 | elapsed time per iteration (s): 0.42 | learning rate: 9.627E-05 | global batch size: 256 | lm loss: 2.924195E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.627 | TFLOPs: 31.62 | +0: [2023-03-17 10:34:45,558] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=0, lr=[9.625601507010446e-05, 9.625601507010446e-05, 9.625601507010446e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 96000/ 173500 | consumed samples: 24576000 | consumed tokens: 50331648000 | elapsed time per iteration (s): 0.42 | learning rate: 9.626E-05 | global batch size: 256 | lm loss: 2.923021E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.030 | TFLOPs: 31.95 | +0: steps: 96000 loss: 2.9372 iter time (s): 0.425 samples/sec: 602.009 +7: iteration 96010/ 173500 | 
consumed samples: 24578560 | consumed tokens: 50336890880 | elapsed time per iteration (s): 0.42 | learning rate: 9.624E-05 | global batch size: 256 | lm loss: 2.935173E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.134 | TFLOPs: 31.70 | +7: iteration 96020/ 173500 | consumed samples: 24581120 | consumed tokens: 50342133760 | elapsed time per iteration (s): 0.43 | learning rate: 9.622E-05 | global batch size: 256 | lm loss: 2.913526E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.728 | TFLOPs: 31.47 | +7: iteration 96030/ 173500 | consumed samples: 24583680 | consumed tokens: 50347376640 | elapsed time per iteration (s): 0.42 | learning rate: 9.621E-05 | global batch size: 256 | lm loss: 2.936440E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.650 | TFLOPs: 31.78 | +7: iteration 96040/ 173500 | consumed samples: 24586240 | consumed tokens: 50352619520 | elapsed time per iteration (s): 0.42 | learning rate: 9.619E-05 | global batch size: 256 | lm loss: 2.935745E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.031 | TFLOPs: 31.75 | +7: iteration 96050/ 173500 | consumed samples: 24588800 | consumed tokens: 50357862400 | elapsed time per iteration (s): 0.42 | learning rate: 9.617E-05 | global batch size: 256 | lm loss: 2.931654E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.047 | TFLOPs: 31.85 | +7: iteration 96060/ 173500 | consumed samples: 24591360 | consumed tokens: 50363105280 | elapsed time per iteration (s): 0.43 | learning rate: 9.616E-05 | global batch size: 256 | lm loss: 2.909262E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.292 | TFLOPs: 31.34 | +7: iteration 96070/ 173500 | consumed samples: 24593920 | consumed tokens: 50368348160 | elapsed time per iteration (s): 0.43 | learning rate: 9.614E-05 | global batch size: 256 | lm loss: 2.924607E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.168 | TFLOPs: 31.18 | +7: iteration 96080/ 173500 | consumed samples: 24596480 | consumed tokens: 50373591040 | elapsed time per iteration (s): 0.42 | learning rate: 9.613E-05 | global batch size: 256 | lm loss: 2.925605E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.206 | TFLOPs: 31.96 | +7: iteration 96090/ 173500 | consumed samples: 24599040 | consumed tokens: 50378833920 | elapsed time per iteration (s): 0.43 | learning rate: 9.611E-05 | global batch size: 256 | lm loss: 2.928825E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.000 | TFLOPs: 31.48 | +7: iteration 96100/ 173500 | consumed samples: 24601600 | consumed tokens: 50384076800 | elapsed time per iteration (s): 0.42 | learning rate: 9.609E-05 | global batch size: 256 | lm loss: 2.934295E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.041 | TFLOPs: 31.75 | +7: iteration 96110/ 173500 | consumed samples: 24604160 | consumed tokens: 50389319680 | elapsed time per iteration (s): 0.43 | learning rate: 9.608E-05 | global batch size: 256 | lm loss: 2.931151E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | +7: iteration 96120/ 173500 | consumed samples: 24606720 | consumed tokens: 50394562560 | elapsed time per iteration (s): 0.43 | learning rate: 9.606E-05 | global batch 
size: 256 | lm loss: 2.923884E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.063 | TFLOPs: 31.38 | +7: iteration 96130/ 173500 | consumed samples: 24609280 | consumed tokens: 50399805440 | elapsed time per iteration (s): 0.42 | learning rate: 9.604E-05 | global batch size: 256 | lm loss: 2.930440E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.917 | TFLOPs: 31.79 | +7: iteration 96140/ 173500 | consumed samples: 24611840 | consumed tokens: 50405048320 | elapsed time per iteration (s): 0.42 | learning rate: 9.603E-05 | global batch size: 256 | lm loss: 2.920417E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.696 | TFLOPs: 31.94 | +7: iteration 96150/ 173500 | consumed samples: 24614400 | consumed tokens: 50410291200 | elapsed time per iteration (s): 0.42 | learning rate: 9.601E-05 | global batch size: 256 | lm loss: 2.923576E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.153 | TFLOPs: 31.96 | +7: iteration 96160/ 173500 | consumed samples: 24616960 | consumed tokens: 50415534080 | elapsed time per iteration (s): 0.42 | learning rate: 9.600E-05 | global batch size: 256 | lm loss: 2.918612E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.630 | TFLOPs: 31.72 | +7: iteration 96170/ 173500 | consumed samples: 24619520 | consumed tokens: 50420776960 | elapsed time per iteration (s): 0.42 | learning rate: 9.598E-05 | global batch size: 256 | lm loss: 2.925661E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.813 | TFLOPs: 31.73 | +7: iteration 96180/ 173500 | consumed samples: 24622080 | 
consumed tokens: 50426019840 | elapsed time per iteration (s): 0.42 | learning rate: 9.596E-05 | global batch size: 256 | lm loss: 2.933950E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.662 | TFLOPs: 31.62 | +7: iteration 96190/ 173500 | consumed samples: 24624640 | consumed tokens: 50431262720 | elapsed time per iteration (s): 0.42 | learning rate: 9.595E-05 | global batch size: 256 | lm loss: 2.923445E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.261 | TFLOPs: 31.70 | +7: iteration 96200/ 173500 | consumed samples: 24627200 | consumed tokens: 50436505600 | elapsed time per iteration (s): 0.42 | learning rate: 9.593E-05 | global batch size: 256 | lm loss: 2.925101E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.556 | TFLOPs: 31.72 | +7: iteration 96210/ 173500 | consumed samples: 24629760 | consumed tokens: 50441748480 | elapsed time per iteration (s): 0.44 | learning rate: 9.591E-05 | global batch size: 256 | lm loss: 2.925927E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.134 | TFLOPs: 30.81 | +7: iteration 96220/ 173500 | consumed samples: 24632320 | consumed tokens: 50446991360 | elapsed time per iteration (s): 0.42 | learning rate: 9.590E-05 | global batch size: 256 | lm loss: 2.924405E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.375 | TFLOPs: 31.76 | +7: iteration 96230/ 173500 | consumed samples: 24634880 | consumed tokens: 50452234240 | elapsed time per iteration (s): 0.42 | learning rate: 9.588E-05 | global batch size: 256 | lm loss: 2.933072E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 605.310 | TFLOPs: 31.76 | +7: iteration 96240/ 173500 | consumed samples: 24637440 | consumed tokens: 50457477120 | elapsed time per iteration (s): 0.42 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 2.927197E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.027 | TFLOPs: 31.69 | +7: iteration 96250/ 173500 | consumed samples: 24640000 | consumed tokens: 50462720000 | elapsed time per iteration (s): 0.43 | learning rate: 9.585E-05 | global batch size: 256 | lm loss: 2.922093E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.068 | TFLOPs: 31.48 | +7: iteration 96260/ 173500 | consumed samples: 24642560 | consumed tokens: 50467962880 | elapsed time per iteration (s): 0.43 | learning rate: 9.583E-05 | global batch size: 256 | lm loss: 2.936836E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.298 | TFLOPs: 31.34 | +7: iteration 96270/ 173500 | consumed samples: 24645120 | consumed tokens: 50473205760 | elapsed time per iteration (s): 0.43 | learning rate: 9.582E-05 | global batch size: 256 | lm loss: 2.926872E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.878 | TFLOPs: 31.53 | +7: iteration 96280/ 173500 | consumed samples: 24647680 | consumed tokens: 50478448640 | elapsed time per iteration (s): 0.43 | learning rate: 9.580E-05 | global batch size: 256 | lm loss: 2.921613E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.127 | TFLOPs: 31.23 | +7: iteration 96290/ 173500 | consumed samples: 24650240 | consumed tokens: 50483691520 | elapsed time per iteration (s): 0.42 | learning rate: 9.578E-05 | global batch size: 256 | lm loss: 
2.910476E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.394 | TFLOPs: 31.71 | +7: iteration 96300/ 173500 | consumed samples: 24652800 | consumed tokens: 50488934400 | elapsed time per iteration (s): 0.42 | learning rate: 9.577E-05 | global batch size: 256 | lm loss: 2.918692E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.270 | TFLOPs: 31.91 | +7: iteration 96310/ 173500 | consumed samples: 24655360 | consumed tokens: 50494177280 | elapsed time per iteration (s): 0.44 | learning rate: 9.575E-05 | global batch size: 256 | lm loss: 2.926542E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.632 | TFLOPs: 30.41 | +7: iteration 96320/ 173500 | consumed samples: 24657920 | consumed tokens: 50499420160 | elapsed time per iteration (s): 0.42 | learning rate: 9.574E-05 | global batch size: 256 | lm loss: 2.928022E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.803 | TFLOPs: 31.73 | +7: iteration 96330/ 173500 | consumed samples: 24660480 | consumed tokens: 50504663040 | elapsed time per iteration (s): 0.42 | learning rate: 9.572E-05 | global batch size: 256 | lm loss: 2.931450E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.226 | TFLOPs: 31.65 | +7: iteration 96340/ 173500 | consumed samples: 24663040 | consumed tokens: 50509905920 | elapsed time per iteration (s): 0.43 | learning rate: 9.570E-05 | global batch size: 256 | lm loss: 2.927989E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.508 | TFLOPs: 31.35 | +7: iteration 96350/ 173500 | consumed samples: 24665600 | consumed tokens: 
50515148800 | elapsed time per iteration (s): 0.42 | learning rate: 9.569E-05 | global batch size: 256 | lm loss: 2.928653E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.594 | TFLOPs: 31.62 | +7: iteration 96360/ 173500 | consumed samples: 24668160 | consumed tokens: 50520391680 | elapsed time per iteration (s): 0.42 | learning rate: 9.567E-05 | global batch size: 256 | lm loss: 2.921434E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.603 | TFLOPs: 31.62 | +7: iteration 96370/ 173500 | consumed samples: 24670720 | consumed tokens: 50525634560 | elapsed time per iteration (s): 0.43 | learning rate: 9.565E-05 | global batch size: 256 | lm loss: 2.924218E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.366 | TFLOPs: 31.34 | +7: iteration 96380/ 173500 | consumed samples: 24673280 | consumed tokens: 50530877440 | elapsed time per iteration (s): 0.43 | learning rate: 9.564E-05 | global batch size: 256 | lm loss: 2.918920E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.729 | TFLOPs: 31.57 | +7: iteration 96390/ 173500 | consumed samples: 24675840 | consumed tokens: 50536120320 | elapsed time per iteration (s): 0.42 | learning rate: 9.562E-05 | global batch size: 256 | lm loss: 2.909082E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.784 | TFLOPs: 31.63 | +7: iteration 96400/ 173500 | consumed samples: 24678400 | consumed tokens: 50541363200 | elapsed time per iteration (s): 0.42 | learning rate: 9.561E-05 | global batch size: 256 | lm loss: 2.933020E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.209 | TFLOPs: 31.70 | +7: iteration 96410/ 173500 | consumed samples: 24680960 | consumed tokens: 50546606080 | elapsed time per iteration (s): 0.43 | learning rate: 9.559E-05 | global batch size: 256 | lm loss: 2.917777E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.003 | TFLOPs: 31.53 | +7: iteration 96420/ 173500 | consumed samples: 24683520 | consumed tokens: 50551848960 | elapsed time per iteration (s): 0.42 | learning rate: 9.557E-05 | global batch size: 256 | lm loss: 2.929531E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.863 | TFLOPs: 31.74 | +7: iteration 96430/ 173500 | consumed samples: 24686080 | consumed tokens: 50557091840 | elapsed time per iteration (s): 0.43 | learning rate: 9.556E-05 | global batch size: 256 | lm loss: 2.931935E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.121 | TFLOPs: 31.59 | +7: iteration 96440/ 173500 | consumed samples: 24688640 | consumed tokens: 50562334720 | elapsed time per iteration (s): 0.42 | learning rate: 9.554E-05 | global batch size: 256 | lm loss: 2.938441E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.250 | TFLOPs: 31.97 | +7: iteration 96450/ 173500 | consumed samples: 24691200 | consumed tokens: 50567577600 | elapsed time per iteration (s): 0.42 | learning rate: 9.552E-05 | global batch size: 256 | lm loss: 2.924203E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.418 | TFLOPs: 31.82 | +7: iteration 96460/ 173500 | consumed samples: 24693760 | consumed tokens: 50572820480 | elapsed time per iteration (s): 0.43 | learning rate: 9.551E-05 | global batch size: 256 | lm loss: 2.933868E+00 | grad 
norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.138 | TFLOPs: 31.38 | +7: iteration 96470/ 173500 | consumed samples: 24696320 | consumed tokens: 50578063360 | elapsed time per iteration (s): 0.42 | learning rate: 9.549E-05 | global batch size: 256 | lm loss: 2.928437E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.932 | TFLOPs: 31.74 | +7: iteration 96480/ 173500 | consumed samples: 24698880 | consumed tokens: 50583306240 | elapsed time per iteration (s): 0.43 | learning rate: 9.548E-05 | global batch size: 256 | lm loss: 2.917221E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.114 | TFLOPs: 31.59 | +7: iteration 96490/ 173500 | consumed samples: 24701440 | consumed tokens: 50588549120 | elapsed time per iteration (s): 0.42 | learning rate: 9.546E-05 | global batch size: 256 | lm loss: 2.913547E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.583 | TFLOPs: 31.77 | +7: iteration 96500/ 173500 | consumed samples: 24704000 | consumed tokens: 50593792000 | elapsed time per iteration (s): 0.42 | learning rate: 9.544E-05 | global batch size: 256 | lm loss: 2.928782E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.991 | TFLOPs: 31.69 | +7: iteration 96510/ 173500 | consumed samples: 24706560 | consumed tokens: 50599034880 | elapsed time per iteration (s): 0.42 | learning rate: 9.543E-05 | global batch size: 256 | lm loss: 2.935117E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.231 | TFLOPs: 31.97 | +7: iteration 96520/ 173500 | consumed samples: 24709120 | consumed tokens: 50604277760 | elapsed time 
per iteration (s): 0.42 | learning rate: 9.541E-05 | global batch size: 256 | lm loss: 2.929114E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.970 | TFLOPs: 31.95 | +7: iteration 96530/ 173500 | consumed samples: 24711680 | consumed tokens: 50609520640 | elapsed time per iteration (s): 0.42 | learning rate: 9.539E-05 | global batch size: 256 | lm loss: 2.928395E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.947 | TFLOPs: 31.79 | +7: iteration 96540/ 173500 | consumed samples: 24714240 | consumed tokens: 50614763520 | elapsed time per iteration (s): 0.42 | learning rate: 9.538E-05 | global batch size: 256 | lm loss: 2.920131E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.346 | TFLOPs: 31.81 | +7: iteration 96550/ 173500 | consumed samples: 24716800 | consumed tokens: 50620006400 | elapsed time per iteration (s): 0.42 | learning rate: 9.536E-05 | global batch size: 256 | lm loss: 2.928236E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.148 | TFLOPs: 31.96 | +7: iteration 96560/ 173500 | consumed samples: 24719360 | consumed tokens: 50625249280 | elapsed time per iteration (s): 0.72 | learning rate: 9.535E-05 | global batch size: 256 | lm loss: 2.925616E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.763 | TFLOPs: 18.72 | +7: iteration 96570/ 173500 | consumed samples: 24721920 | consumed tokens: 50630492160 | elapsed time per iteration (s): 0.42 | learning rate: 9.533E-05 | global batch size: 256 | lm loss: 2.933456E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.773 | TFLOPs: 
32.26 | +7: iteration 96580/ 173500 | consumed samples: 24724480 | consumed tokens: 50635735040 | elapsed time per iteration (s): 0.42 | learning rate: 9.531E-05 | global batch size: 256 | lm loss: 2.924367E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.756 | TFLOPs: 32.20 | +7: iteration 96590/ 173500 | consumed samples: 24727040 | consumed tokens: 50640977920 | elapsed time per iteration (s): 0.42 | learning rate: 9.530E-05 | global batch size: 256 | lm loss: 2.939291E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.016 | TFLOPs: 31.95 | +7: iteration 96600/ 173500 | consumed samples: 24729600 | consumed tokens: 50646220800 | elapsed time per iteration (s): 0.43 | learning rate: 9.528E-05 | global batch size: 256 | lm loss: 2.920397E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.484 | TFLOPs: 31.30 | +7: iteration 96610/ 173500 | consumed samples: 24732160 | consumed tokens: 50651463680 | elapsed time per iteration (s): 0.42 | learning rate: 9.526E-05 | global batch size: 256 | lm loss: 2.932401E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.987 | TFLOPs: 31.85 | +7: iteration 96620/ 173500 | consumed samples: 24734720 | consumed tokens: 50656706560 | elapsed time per iteration (s): 0.42 | learning rate: 9.525E-05 | global batch size: 256 | lm loss: 2.918405E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.540 | TFLOPs: 31.77 | +7: iteration 96630/ 173500 | consumed samples: 24737280 | consumed tokens: 50661949440 | elapsed time per iteration (s): 0.76 | learning rate: 9.523E-05 | global batch size: 256 | lm loss: 2.929202E+00 | grad norm: 0.339 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 334.904 | TFLOPs: 17.57 | +7: iteration 96640/ 173500 | consumed samples: 24739840 | consumed tokens: 50667192320 | elapsed time per iteration (s): 0.42 | learning rate: 9.522E-05 | global batch size: 256 | lm loss: 2.932675E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.702 | TFLOPs: 31.73 | +7: iteration 96650/ 173500 | consumed samples: 24742400 | consumed tokens: 50672435200 | elapsed time per iteration (s): 0.43 | learning rate: 9.520E-05 | global batch size: 256 | lm loss: 2.928409E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.122 | TFLOPs: 30.96 | +7: iteration 96660/ 173500 | consumed samples: 24744960 | consumed tokens: 50677678080 | elapsed time per iteration (s): 0.42 | learning rate: 9.518E-05 | global batch size: 256 | lm loss: 2.938133E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.745 | TFLOPs: 31.94 | +7: iteration 96670/ 173500 | consumed samples: 24747520 | consumed tokens: 50682920960 | elapsed time per iteration (s): 0.43 | learning rate: 9.517E-05 | global batch size: 256 | lm loss: 2.932757E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.758 | TFLOPs: 31.52 | +7: iteration 96680/ 173500 | consumed samples: 24750080 | consumed tokens: 50688163840 | elapsed time per iteration (s): 0.42 | learning rate: 9.515E-05 | global batch size: 256 | lm loss: 2.935409E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.829 | TFLOPs: 31.89 | +7: iteration 96690/ 173500 | consumed samples: 24752640 | consumed tokens: 50693406720 | elapsed time per iteration (s): 0.42 | 
learning rate: 9.513E-05 | global batch size: 256 | lm loss: 2.925496E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.409 | TFLOPs: 31.97 | +7: iteration 96700/ 173500 | consumed samples: 24755200 | consumed tokens: 50698649600 | elapsed time per iteration (s): 0.42 | learning rate: 9.512E-05 | global batch size: 256 | lm loss: 2.937358E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.643 | TFLOPs: 32.09 | +7: iteration 96710/ 173500 | consumed samples: 24757760 | consumed tokens: 50703892480 | elapsed time per iteration (s): 0.43 | learning rate: 9.510E-05 | global batch size: 256 | lm loss: 2.918082E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.410 | TFLOPs: 31.55 | +7: iteration 96720/ 173500 | consumed samples: 24760320 | consumed tokens: 50709135360 | elapsed time per iteration (s): 0.42 | learning rate: 9.509E-05 | global batch size: 256 | lm loss: 2.930389E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.213 | TFLOPs: 31.86 | +7: iteration 96730/ 173500 | consumed samples: 24762880 | consumed tokens: 50714378240 | elapsed time per iteration (s): 0.42 | learning rate: 9.507E-05 | global batch size: 256 | lm loss: 2.917854E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.582 | TFLOPs: 31.77 | +7: iteration 96740/ 173500 | consumed samples: 24765440 | consumed tokens: 50719621120 | elapsed time per iteration (s): 0.44 | learning rate: 9.505E-05 | global batch size: 256 | lm loss: 2.947289E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.466 | TFLOPs: 30.82 | +7: iteration 96750/ 
173500 | consumed samples: 24768000 | consumed tokens: 50724864000 | elapsed time per iteration (s): 0.42 | learning rate: 9.504E-05 | global batch size: 256 | lm loss: 2.917412E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.770 | TFLOPs: 32.10 | +7: iteration 96760/ 173500 | consumed samples: 24770560 | consumed tokens: 50730106880 | elapsed time per iteration (s): 0.42 | learning rate: 9.502E-05 | global batch size: 256 | lm loss: 2.922714E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.367 | TFLOPs: 31.87 | +7: iteration 96770/ 173500 | consumed samples: 24773120 | consumed tokens: 50735349760 | elapsed time per iteration (s): 0.43 | learning rate: 9.500E-05 | global batch size: 256 | lm loss: 2.935333E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.330 | TFLOPs: 31.60 | +7: iteration 96780/ 173500 | consumed samples: 24775680 | consumed tokens: 50740592640 | elapsed time per iteration (s): 0.42 | learning rate: 9.499E-05 | global batch size: 256 | lm loss: 2.936723E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.226 | TFLOPs: 31.65 | +7: iteration 96790/ 173500 | consumed samples: 24778240 | consumed tokens: 50745835520 | elapsed time per iteration (s): 0.42 | learning rate: 9.497E-05 | global batch size: 256 | lm loss: 2.923804E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.934 | TFLOPs: 31.84 | +7: iteration 96800/ 173500 | consumed samples: 24780800 | consumed tokens: 50751078400 | elapsed time per iteration (s): 0.42 | learning rate: 9.496E-05 | global batch size: 256 | lm loss: 2.924659E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.667 | TFLOPs: 31.83 | +7: iteration 96810/ 173500 | consumed samples: 24783360 | consumed tokens: 50756321280 | elapsed time per iteration (s): 0.42 | learning rate: 9.494E-05 | global batch size: 256 | lm loss: 2.905751E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.278 | TFLOPs: 31.86 | +7: iteration 96820/ 173500 | consumed samples: 24785920 | consumed tokens: 50761564160 | elapsed time per iteration (s): 0.43 | learning rate: 9.492E-05 | global batch size: 256 | lm loss: 2.935122E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.732 | TFLOPs: 31.57 | +7: iteration 96830/ 173500 | consumed samples: 24788480 | consumed tokens: 50766807040 | elapsed time per iteration (s): 0.42 | learning rate: 9.491E-05 | global batch size: 256 | lm loss: 2.922419E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.341 | TFLOPs: 32.02 | +7: iteration 96840/ 173500 | consumed samples: 24791040 | consumed tokens: 50772049920 | elapsed time per iteration (s): 0.42 | learning rate: 9.489E-05 | global batch size: 256 | lm loss: 2.922452E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.987 | TFLOPs: 31.90 | +7: iteration 96850/ 173500 | consumed samples: 24793600 | consumed tokens: 50777292800 | elapsed time per iteration (s): 0.42 | learning rate: 9.487E-05 | global batch size: 256 | lm loss: 2.921495E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.009 | TFLOPs: 31.80 | +7: iteration 96860/ 173500 | consumed samples: 24796160 | consumed tokens: 50782535680 | elapsed time per iteration (s): 0.42 | learning rate: 
9.486E-05 | global batch size: 256 | lm loss: 2.938827E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.794 | TFLOPs: 31.84 | +7: iteration 96870/ 173500 | consumed samples: 24798720 | consumed tokens: 50787778560 | elapsed time per iteration (s): 0.43 | learning rate: 9.484E-05 | global batch size: 256 | lm loss: 2.922104E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.355 | TFLOPs: 31.34 | +7: iteration 96880/ 173500 | consumed samples: 24801280 | consumed tokens: 50793021440 | elapsed time per iteration (s): 0.42 | learning rate: 9.483E-05 | global batch size: 256 | lm loss: 2.922833E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.965 | TFLOPs: 31.74 | +7: iteration 96890/ 173500 | consumed samples: 24803840 | consumed tokens: 50798264320 | elapsed time per iteration (s): 0.43 | learning rate: 9.481E-05 | global batch size: 256 | lm loss: 2.923941E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.144 | TFLOPs: 31.49 | +7: iteration 96900/ 173500 | consumed samples: 24806400 | consumed tokens: 50803507200 | elapsed time per iteration (s): 0.43 | learning rate: 9.479E-05 | global batch size: 256 | lm loss: 2.934758E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.603 | TFLOPs: 31.57 | +7: iteration 96910/ 173500 | consumed samples: 24808960 | consumed tokens: 50808750080 | elapsed time per iteration (s): 0.42 | learning rate: 9.478E-05 | global batch size: 256 | lm loss: 2.930060E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.824 | TFLOPs: 31.63 | +7: iteration 96920/ 173500 | 
consumed samples: 24811520 | consumed tokens: 50813992960 | elapsed time per iteration (s): 0.42 | learning rate: 9.476E-05 | global batch size: 256 | lm loss: 2.946884E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.359 | TFLOPs: 31.81 | +7: iteration 96930/ 173500 | consumed samples: 24814080 | consumed tokens: 50819235840 | elapsed time per iteration (s): 0.43 | learning rate: 9.475E-05 | global batch size: 256 | lm loss: 2.926479E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.129 | TFLOPs: 31.49 | +7: iteration 96940/ 173500 | consumed samples: 24816640 | consumed tokens: 50824478720 | elapsed time per iteration (s): 0.42 | learning rate: 9.473E-05 | global batch size: 256 | lm loss: 2.928065E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.227 | TFLOPs: 31.91 | +7: iteration 96950/ 173500 | consumed samples: 24819200 | consumed tokens: 50829721600 | elapsed time per iteration (s): 0.43 | learning rate: 9.471E-05 | global batch size: 256 | lm loss: 2.945893E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.402 | TFLOPs: 31.55 | +7: iteration 96960/ 173500 | consumed samples: 24821760 | consumed tokens: 50834964480 | elapsed time per iteration (s): 0.42 | learning rate: 9.470E-05 | global batch size: 256 | lm loss: 2.931971E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.385 | TFLOPs: 31.66 | +7: iteration 96970/ 173500 | consumed samples: 24824320 | consumed tokens: 50840207360 | elapsed time per iteration (s): 0.42 | learning rate: 9.468E-05 | global batch size: 256 | lm loss: 2.913511E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.274 | TFLOPs: 31.65 | +7: iteration 96980/ 173500 | consumed samples: 24826880 | consumed tokens: 50845450240 | elapsed time per iteration (s): 0.42 | learning rate: 9.466E-05 | global batch size: 256 | lm loss: 2.936930E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.565 | TFLOPs: 31.88 | +7: iteration 96990/ 173500 | consumed samples: 24829440 | consumed tokens: 50850693120 | elapsed time per iteration (s): 0.42 | learning rate: 9.465E-05 | global batch size: 256 | lm loss: 2.929769E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.501 | TFLOPs: 31.82 | +7: iteration 97000/ 173500 | consumed samples: 24832000 | consumed tokens: 50855936000 | elapsed time per iteration (s): 0.42 | learning rate: 9.463E-05 | global batch size: 256 | lm loss: 2.930958E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.876 | TFLOPs: 31.79 | +7: iteration 97010/ 173500 | consumed samples: 24834560 | consumed tokens: 50861178880 | elapsed time per iteration (s): 0.43 | learning rate: 9.462E-05 | global batch size: 256 | lm loss: 2.921922E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.372 | TFLOPs: 31.45 | +7: iteration 97020/ 173500 | consumed samples: 24837120 | consumed tokens: 50866421760 | elapsed time per iteration (s): 0.42 | learning rate: 9.460E-05 | global batch size: 256 | lm loss: 2.944671E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.238 | TFLOPs: 31.70 | +7: iteration 97030/ 173500 | consumed samples: 24839680 | consumed tokens: 50871664640 | elapsed time per iteration (s): 0.42 | learning rate: 9.458E-05 | global batch 
size: 256 | lm loss: 2.933009E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.414 | TFLOPs: 31.71 | +7: iteration 97040/ 173500 | consumed samples: 24842240 | consumed tokens: 50876907520 | elapsed time per iteration (s): 0.42 | learning rate: 9.457E-05 | global batch size: 256 | lm loss: 2.931243E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.583 | TFLOPs: 31.83 | +7: iteration 97050/ 173500 | consumed samples: 24844800 | consumed tokens: 50882150400 | elapsed time per iteration (s): 0.42 | learning rate: 9.455E-05 | global batch size: 256 | lm loss: 2.926206E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.160 | TFLOPs: 32.01 | +7: iteration 97060/ 173500 | consumed samples: 24847360 | consumed tokens: 50887393280 | elapsed time per iteration (s): 0.42 | learning rate: 9.453E-05 | global batch size: 256 | lm loss: 2.931879E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.187 | TFLOPs: 31.70 | +7: iteration 97070/ 173500 | consumed samples: 24849920 | consumed tokens: 50892636160 | elapsed time per iteration (s): 0.43 | learning rate: 9.452E-05 | global batch size: 256 | lm loss: 2.938586E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.811 | TFLOPs: 31.42 | +7: iteration 97080/ 173500 | consumed samples: 24852480 | consumed tokens: 50897879040 | elapsed time per iteration (s): 0.42 | learning rate: 9.450E-05 | global batch size: 256 | lm loss: 2.916599E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.871 | TFLOPs: 31.95 | +7: iteration 97090/ 173500 | consumed samples: 24855040 | 
consumed tokens: 50903121920 | elapsed time per iteration (s): 0.42 | learning rate: 9.449E-05 | global batch size: 256 | lm loss: 2.942446E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.261 | TFLOPs: 31.81 | +7: iteration 97100/ 173500 | consumed samples: 24857600 | consumed tokens: 50908364800 | elapsed time per iteration (s): 0.42 | learning rate: 9.447E-05 | global batch size: 256 | lm loss: 2.936401E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.740 | TFLOPs: 31.78 | +7: iteration 97110/ 173500 | consumed samples: 24860160 | consumed tokens: 50913607680 | elapsed time per iteration (s): 0.43 | learning rate: 9.445E-05 | global batch size: 256 | lm loss: 2.909857E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.262 | TFLOPs: 31.60 | +7: iteration 97120/ 173500 | consumed samples: 24862720 | consumed tokens: 50918850560 | elapsed time per iteration (s): 0.42 | learning rate: 9.444E-05 | global batch size: 256 | lm loss: 2.915000E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.080 | TFLOPs: 31.64 | +7: iteration 97130/ 173500 | consumed samples: 24865280 | consumed tokens: 50924093440 | elapsed time per iteration (s): 0.42 | learning rate: 9.442E-05 | global batch size: 256 | lm loss: 2.925734E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.717 | TFLOPs: 31.83 | +7: iteration 97140/ 173500 | consumed samples: 24867840 | consumed tokens: 50929336320 | elapsed time per iteration (s): 0.43 | learning rate: 9.440E-05 | global batch size: 256 | lm loss: 2.928937E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 600.303 | TFLOPs: 31.50 | +7: iteration 97150/ 173500 | consumed samples: 24870400 | consumed tokens: 50934579200 | elapsed time per iteration (s): 0.42 | learning rate: 9.439E-05 | global batch size: 256 | lm loss: 2.912972E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.625 | TFLOPs: 31.62 | +7: iteration 97160/ 173500 | consumed samples: 24872960 | consumed tokens: 50939822080 | elapsed time per iteration (s): 0.42 | learning rate: 9.437E-05 | global batch size: 256 | lm loss: 2.925329E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.195 | TFLOPs: 31.65 | +7: iteration 97170/ 173500 | consumed samples: 24875520 | consumed tokens: 50945064960 | elapsed time per iteration (s): 0.42 | learning rate: 9.436E-05 | global batch size: 256 | lm loss: 2.920537E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.675 | TFLOPs: 31.83 | +7: iteration 97180/ 173500 | consumed samples: 24878080 | consumed tokens: 50950307840 | elapsed time per iteration (s): 0.42 | learning rate: 9.434E-05 | global batch size: 256 | lm loss: 2.930651E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.837 | TFLOPs: 31.79 | +7: iteration 97190/ 173500 | consumed samples: 24880640 | consumed tokens: 50955550720 | elapsed time per iteration (s): 0.42 | learning rate: 9.432E-05 | global batch size: 256 | lm loss: 2.913970E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.344 | TFLOPs: 31.92 | +7: iteration 97200/ 173500 | consumed samples: 24883200 | consumed tokens: 50960793600 | elapsed time per iteration (s): 0.43 | learning rate: 9.431E-05 | global batch size: 256 | lm loss: 
2.912947E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.692 | TFLOPs: 31.25 | +7: iteration 97210/ 173500 | consumed samples: 24885760 | consumed tokens: 50966036480 | elapsed time per iteration (s): 0.42 | learning rate: 9.429E-05 | global batch size: 256 | lm loss: 2.929017E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.020 | TFLOPs: 31.74 | +7: iteration 97220/ 173500 | consumed samples: 24888320 | consumed tokens: 50971279360 | elapsed time per iteration (s): 0.42 | learning rate: 9.427E-05 | global batch size: 256 | lm loss: 2.937650E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.123 | TFLOPs: 31.96 | +7: iteration 97230/ 173500 | consumed samples: 24890880 | consumed tokens: 50976522240 | elapsed time per iteration (s): 0.43 | learning rate: 9.426E-05 | global batch size: 256 | lm loss: 2.933538E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.227 | TFLOPs: 31.55 | +7: iteration 97240/ 173500 | consumed samples: 24893440 | consumed tokens: 50981765120 | elapsed time per iteration (s): 0.42 | learning rate: 9.424E-05 | global batch size: 256 | lm loss: 2.935749E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.327 | TFLOPs: 31.92 | +7: iteration 97250/ 173500 | consumed samples: 24896000 | consumed tokens: 50987008000 | elapsed time per iteration (s): 0.42 | learning rate: 9.423E-05 | global batch size: 256 | lm loss: 2.921569E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.895 | TFLOPs: 31.95 | +7: iteration 97260/ 173500 | consumed samples: 24898560 | consumed tokens: 
50992250880 | elapsed time per iteration (s): 0.42 | learning rate: 9.421E-05 | global batch size: 256 | lm loss: 2.936127E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.816 | TFLOPs: 31.73 | +7: iteration 97270/ 173500 | consumed samples: 24901120 | consumed tokens: 50997493760 | elapsed time per iteration (s): 0.42 | learning rate: 9.419E-05 | global batch size: 256 | lm loss: 2.933310E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.310 | TFLOPs: 31.92 | +7: iteration 97280/ 173500 | consumed samples: 24903680 | consumed tokens: 51002736640 | elapsed time per iteration (s): 0.42 | learning rate: 9.418E-05 | global batch size: 256 | lm loss: 2.924106E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.980 | TFLOPs: 31.90 | +7: iteration 97290/ 173500 | consumed samples: 24906240 | consumed tokens: 51007979520 | elapsed time per iteration (s): 0.42 | learning rate: 9.416E-05 | global batch size: 256 | lm loss: 2.914246E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.403 | TFLOPs: 31.92 | +7: iteration 97300/ 173500 | consumed samples: 24908800 | consumed tokens: 51013222400 | elapsed time per iteration (s): 0.42 | learning rate: 9.415E-05 | global batch size: 256 | lm loss: 2.931809E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.262 | TFLOPs: 31.70 | +7: iteration 97310/ 173500 | consumed samples: 24911360 | consumed tokens: 51018465280 | elapsed time per iteration (s): 0.42 | learning rate: 9.413E-05 | global batch size: 256 | lm loss: 2.918877E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.117 | TFLOPs: 31.85 | +7: iteration 97320/ 173500 | consumed samples: 24913920 | consumed tokens: 51023708160 | elapsed time per iteration (s): 0.43 | learning rate: 9.411E-05 | global batch size: 256 | lm loss: 2.922173E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.834 | TFLOPs: 31.58 | +7: iteration 97330/ 173500 | consumed samples: 24916480 | consumed tokens: 51028951040 | elapsed time per iteration (s): 0.42 | learning rate: 9.410E-05 | global batch size: 256 | lm loss: 2.932947E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.289 | TFLOPs: 31.76 | +7: iteration 97340/ 173500 | consumed samples: 24919040 | consumed tokens: 51034193920 | elapsed time per iteration (s): 0.42 | learning rate: 9.408E-05 | global batch size: 256 | lm loss: 2.928378E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.436 | TFLOPs: 31.71 | +7: iteration 97350/ 173500 | consumed samples: 24921600 | consumed tokens: 51039436800 | elapsed time per iteration (s): 0.42 | learning rate: 9.406E-05 | global batch size: 256 | lm loss: 2.932510E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.692 | TFLOPs: 31.78 | +7: iteration 97360/ 173500 | consumed samples: 24924160 | consumed tokens: 51044679680 | elapsed time per iteration (s): 0.42 | learning rate: 9.405E-05 | global batch size: 256 | lm loss: 2.932224E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.833 | TFLOPs: 31.63 | +7: iteration 97370/ 173500 | consumed samples: 24926720 | consumed tokens: 51049922560 | elapsed time per iteration (s): 0.42 | learning rate: 9.403E-05 | global batch size: 256 | lm loss: 2.932960E+00 | grad 
norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.454 | TFLOPs: 31.71 | +7: iteration 97380/ 173500 | consumed samples: 24929280 | consumed tokens: 51055165440 | elapsed time per iteration (s): 0.43 | learning rate: 9.402E-05 | global batch size: 256 | lm loss: 2.914288E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.675 | TFLOPs: 31.57 | +7: iteration 97390/ 173500 | consumed samples: 24931840 | consumed tokens: 51060408320 | elapsed time per iteration (s): 0.43 | learning rate: 9.400E-05 | global batch size: 256 | lm loss: 2.917278E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.263 | TFLOPs: 31.49 | +7: iteration 97400/ 173500 | consumed samples: 24934400 | consumed tokens: 51065651200 | elapsed time per iteration (s): 0.42 | learning rate: 9.398E-05 | global batch size: 256 | lm loss: 2.912406E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.976 | TFLOPs: 31.79 | +7: iteration 97410/ 173500 | consumed samples: 24936960 | consumed tokens: 51070894080 | elapsed time per iteration (s): 0.42 | learning rate: 9.397E-05 | global batch size: 256 | lm loss: 2.926706E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.169 | TFLOPs: 31.70 | +7: iteration 97420/ 173500 | consumed samples: 24939520 | consumed tokens: 51076136960 | elapsed time per iteration (s): 0.42 | learning rate: 9.395E-05 | global batch size: 256 | lm loss: 2.934980E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.636 | TFLOPs: 31.88 | +7: iteration 97430/ 173500 | consumed samples: 24942080 | consumed tokens: 51081379840 | elapsed time 
per iteration (s): 0.42 | learning rate: 9.393E-05 | global batch size: 256 | lm loss: 2.932251E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.916 | TFLOPs: 31.74 | +7: iteration 97440/ 173500 | consumed samples: 24944640 | consumed tokens: 51086622720 | elapsed time per iteration (s): 0.42 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 2.915026E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.748 | TFLOPs: 31.89 | +7: iteration 97450/ 173500 | consumed samples: 24947200 | consumed tokens: 51091865600 | elapsed time per iteration (s): 0.42 | learning rate: 9.390E-05 | global batch size: 256 | lm loss: 2.924581E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.379 | TFLOPs: 31.76 | +7: iteration 97460/ 173500 | consumed samples: 24949760 | consumed tokens: 51097108480 | elapsed time per iteration (s): 0.43 | learning rate: 9.389E-05 | global batch size: 256 | lm loss: 2.920601E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.529 | TFLOPs: 31.46 | +7: iteration 97470/ 173500 | consumed samples: 24952320 | consumed tokens: 51102351360 | elapsed time per iteration (s): 0.42 | learning rate: 9.387E-05 | global batch size: 256 | lm loss: 2.929865E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.411 | TFLOPs: 31.76 | +7: iteration 97480/ 173500 | consumed samples: 24954880 | consumed tokens: 51107594240 | elapsed time per iteration (s): 0.43 | learning rate: 9.385E-05 | global batch size: 256 | lm loss: 2.928428E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.039 | TFLOPs: 
31.06 | +7: iteration 97490/ 173500 | consumed samples: 24957440 | consumed tokens: 51112837120 | elapsed time per iteration (s): 0.42 | learning rate: 9.384E-05 | global batch size: 256 | lm loss: 2.924210E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.170 | TFLOPs: 31.91 | +7: iteration 97500/ 173500 | consumed samples: 24960000 | consumed tokens: 51118080000 | elapsed time per iteration (s): 0.42 | learning rate: 9.382E-05 | global batch size: 256 | lm loss: 2.927897E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.593 | TFLOPs: 31.93 | +7: iteration 97510/ 173500 | consumed samples: 24962560 | consumed tokens: 51123322880 | elapsed time per iteration (s): 0.42 | learning rate: 9.381E-05 | global batch size: 256 | lm loss: 2.917898E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.216 | TFLOPs: 31.91 | +7: iteration 97520/ 173500 | consumed samples: 24965120 | consumed tokens: 51128565760 | elapsed time per iteration (s): 0.42 | learning rate: 9.379E-05 | global batch size: 256 | lm loss: 2.919390E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.830 | TFLOPs: 31.89 | +7: iteration 97530/ 173500 | consumed samples: 24967680 | consumed tokens: 51133808640 | elapsed time per iteration (s): 0.43 | learning rate: 9.377E-05 | global batch size: 256 | lm loss: 2.918358E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.297 | TFLOPs: 30.97 | +7: iteration 97540/ 173500 | consumed samples: 24970240 | consumed tokens: 51139051520 | elapsed time per iteration (s): 0.43 | learning rate: 9.376E-05 | global batch size: 256 | lm loss: 2.916848E+00 | grad norm: 0.342 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.670 | TFLOPs: 31.52 | +7: iteration 97550/ 173500 | consumed samples: 24972800 | consumed tokens: 51144294400 | elapsed time per iteration (s): 0.42 | learning rate: 9.374E-05 | global batch size: 256 | lm loss: 2.922533E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.670 | TFLOPs: 31.94 | +7: iteration 97560/ 173500 | consumed samples: 24975360 | consumed tokens: 51149537280 | elapsed time per iteration (s): 0.42 | learning rate: 9.372E-05 | global batch size: 256 | lm loss: 2.920436E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.022 | TFLOPs: 31.90 | +7: iteration 97570/ 173500 | consumed samples: 24977920 | consumed tokens: 51154780160 | elapsed time per iteration (s): 0.42 | learning rate: 9.371E-05 | global batch size: 256 | lm loss: 2.918338E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.048 | TFLOPs: 31.90 | +7: iteration 97580/ 173500 | consumed samples: 24980480 | consumed tokens: 51160023040 | elapsed time per iteration (s): 0.43 | learning rate: 9.369E-05 | global batch size: 256 | lm loss: 2.922521E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.523 | TFLOPs: 31.46 | +7: iteration 97590/ 173500 | consumed samples: 24983040 | consumed tokens: 51165265920 | elapsed time per iteration (s): 0.45 | learning rate: 9.368E-05 | global batch size: 256 | lm loss: 2.921809E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.785 | TFLOPs: 30.05 | +7: iteration 97600/ 173500 | consumed samples: 24985600 | consumed tokens: 51170508800 | elapsed time per iteration (s): 0.43 | 
learning rate: 9.366E-05 | global batch size: 256 | lm loss: 2.923470E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.398 | TFLOPs: 31.50 | +7: iteration 97610/ 173500 | consumed samples: 24988160 | consumed tokens: 51175751680 | elapsed time per iteration (s): 0.42 | learning rate: 9.364E-05 | global batch size: 256 | lm loss: 2.928592E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.135 | TFLOPs: 31.70 | +7: iteration 97620/ 173500 | consumed samples: 24990720 | consumed tokens: 51180994560 | elapsed time per iteration (s): 0.42 | learning rate: 9.363E-05 | global batch size: 256 | lm loss: 2.922506E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | +7: iteration 97630/ 173500 | consumed samples: 24993280 | consumed tokens: 51186237440 | elapsed time per iteration (s): 0.42 | learning rate: 9.361E-05 | global batch size: 256 | lm loss: 2.917284E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.228 | TFLOPs: 31.70 | +7: iteration 97640/ 173500 | consumed samples: 24995840 | consumed tokens: 51191480320 | elapsed time per iteration (s): 0.42 | learning rate: 9.359E-05 | global batch size: 256 | lm loss: 2.921392E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.035 | TFLOPs: 31.64 | +7: iteration 97650/ 173500 | consumed samples: 24998400 | consumed tokens: 51196723200 | elapsed time per iteration (s): 0.42 | learning rate: 9.358E-05 | global batch size: 256 | lm loss: 2.929065E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.714 | TFLOPs: 31.89 | +7: iteration 97660/ 
173500 | consumed samples: 25000960 | consumed tokens: 51201966080 | elapsed time per iteration (s): 0.42 | learning rate: 9.356E-05 | global batch size: 256 | lm loss: 2.919816E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.606 | TFLOPs: 31.88 | +7: iteration 97670/ 173500 | consumed samples: 25003520 | consumed tokens: 51207208960 | elapsed time per iteration (s): 0.42 | learning rate: 9.355E-05 | global batch size: 256 | lm loss: 2.940476E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.587 | TFLOPs: 31.62 | +7: iteration 97680/ 173500 | consumed samples: 25006080 | consumed tokens: 51212451840 | elapsed time per iteration (s): 0.42 | learning rate: 9.353E-05 | global batch size: 256 | lm loss: 2.930352E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.865 | TFLOPs: 31.68 | +7: iteration 97690/ 173500 | consumed samples: 25008640 | consumed tokens: 51217694720 | elapsed time per iteration (s): 0.43 | learning rate: 9.351E-05 | global batch size: 256 | lm loss: 2.911058E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.111 | TFLOPs: 31.59 | +7: iteration 97700/ 173500 | consumed samples: 25011200 | consumed tokens: 51222937600 | elapsed time per iteration (s): 0.42 | learning rate: 9.350E-05 | global batch size: 256 | lm loss: 2.922848E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.257 | TFLOPs: 31.91 | +7: iteration 97710/ 173500 | consumed samples: 25013760 | consumed tokens: 51228180480 | elapsed time per iteration (s): 0.42 | learning rate: 9.348E-05 | global batch size: 256 | lm loss: 2.921242E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.939 | TFLOPs: 31.90 | +7: iteration 97720/ 173500 | consumed samples: 25016320 | consumed tokens: 51233423360 | elapsed time per iteration (s): 0.43 | learning rate: 9.347E-05 | global batch size: 256 | lm loss: 2.919595E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.239 | TFLOPs: 30.97 | +7: iteration 97730/ 173500 | consumed samples: 25018880 | consumed tokens: 51238666240 | elapsed time per iteration (s): 0.43 | learning rate: 9.345E-05 | global batch size: 256 | lm loss: 2.930225E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.590 | TFLOPs: 31.51 | +7: iteration 97740/ 173500 | consumed samples: 25021440 | consumed tokens: 51243909120 | elapsed time per iteration (s): 0.43 | learning rate: 9.343E-05 | global batch size: 256 | lm loss: 2.928435E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.363 | TFLOPs: 31.03 | +7: iteration 97750/ 173500 | consumed samples: 25024000 | consumed tokens: 51249152000 | elapsed time per iteration (s): 0.42 | learning rate: 9.342E-05 | global batch size: 256 | lm loss: 2.926498E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.876 | TFLOPs: 31.95 | +7: iteration 97760/ 173500 | consumed samples: 25026560 | consumed tokens: 51254394880 | elapsed time per iteration (s): 0.43 | learning rate: 9.340E-05 | global batch size: 256 | lm loss: 2.917003E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.685 | TFLOPs: 31.04 | +7: iteration 97770/ 173500 | consumed samples: 25029120 | consumed tokens: 51259637760 | elapsed time per iteration (s): 0.42 | learning rate: 
9.338E-05 | global batch size: 256 | lm loss: 2.924610E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.837 | TFLOPs: 31.63 | +7: iteration 97780/ 173500 | consumed samples: 25031680 | consumed tokens: 51264880640 | elapsed time per iteration (s): 0.42 | learning rate: 9.337E-05 | global batch size: 256 | lm loss: 2.928788E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.125 | TFLOPs: 31.96 | +7: iteration 97790/ 173500 | consumed samples: 25034240 | consumed tokens: 51270123520 | elapsed time per iteration (s): 0.42 | learning rate: 9.335E-05 | global batch size: 256 | lm loss: 2.913615E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.512 | TFLOPs: 31.72 | +7: iteration 97800/ 173500 | consumed samples: 25036800 | consumed tokens: 51275366400 | elapsed time per iteration (s): 0.42 | learning rate: 9.334E-05 | global batch size: 256 | lm loss: 2.921667E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.122 | TFLOPs: 31.91 | +7: iteration 97810/ 173500 | consumed samples: 25039360 | consumed tokens: 51280609280 | elapsed time per iteration (s): 0.42 | learning rate: 9.332E-05 | global batch size: 256 | lm loss: 2.928632E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.396 | TFLOPs: 31.92 | +7: iteration 97820/ 173500 | consumed samples: 25041920 | consumed tokens: 51285852160 | elapsed time per iteration (s): 0.42 | learning rate: 9.330E-05 | global batch size: 256 | lm loss: 2.924384E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.435 | TFLOPs: 31.66 | +7: iteration 97830/ 173500 | 
consumed samples: 25044480 | consumed tokens: 51291095040 | elapsed time per iteration (s): 0.42 | learning rate: 9.329E-05 | global batch size: 256 | lm loss: 2.929367E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.519 | TFLOPs: 31.93 | +7: iteration 97840/ 173500 | consumed samples: 25047040 | consumed tokens: 51296337920 | elapsed time per iteration (s): 0.42 | learning rate: 9.327E-05 | global batch size: 256 | lm loss: 2.925123E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.201 | TFLOPs: 31.65 | +7: iteration 97850/ 173500 | consumed samples: 25049600 | consumed tokens: 51301580800 | elapsed time per iteration (s): 0.42 | learning rate: 9.325E-05 | global batch size: 256 | lm loss: 2.929314E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.429 | TFLOPs: 31.77 | +7: iteration 97860/ 173500 | consumed samples: 25052160 | consumed tokens: 51306823680 | elapsed time per iteration (s): 0.42 | learning rate: 9.324E-05 | global batch size: 256 | lm loss: 2.913265E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.287 | TFLOPs: 31.97 | +7: iteration 97870/ 173500 | consumed samples: 25054720 | consumed tokens: 51312066560 | elapsed time per iteration (s): 0.42 | learning rate: 9.322E-05 | global batch size: 256 | lm loss: 2.934178E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.805 | TFLOPs: 31.63 | +7: iteration 97880/ 173500 | consumed samples: 25057280 | consumed tokens: 51317309440 | elapsed time per iteration (s): 0.43 | learning rate: 9.321E-05 | global batch size: 256 | lm loss: 2.912644E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 593.148 | TFLOPs: 31.12 | +7: iteration 97890/ 173500 | consumed samples: 25059840 | consumed tokens: 51322552320 | elapsed time per iteration (s): 0.46 | learning rate: 9.319E-05 | global batch size: 256 | lm loss: 2.935794E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.878 | TFLOPs: 29.17 | +7: iteration 97900/ 173500 | consumed samples: 25062400 | consumed tokens: 51327795200 | elapsed time per iteration (s): 0.43 | learning rate: 9.317E-05 | global batch size: 256 | lm loss: 2.932972E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.349 | TFLOPs: 31.18 | +7: iteration 97910/ 173500 | consumed samples: 25064960 | consumed tokens: 51333038080 | elapsed time per iteration (s): 0.42 | learning rate: 9.316E-05 | global batch size: 256 | lm loss: 2.937012E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.056 | TFLOPs: 31.85 | +7: iteration 97920/ 173500 | consumed samples: 25067520 | consumed tokens: 51338280960 | elapsed time per iteration (s): 0.43 | learning rate: 9.314E-05 | global batch size: 256 | lm loss: 2.919849E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.374 | TFLOPs: 31.13 | +7: iteration 97930/ 173500 | consumed samples: 25070080 | consumed tokens: 51343523840 | elapsed time per iteration (s): 0.44 | learning rate: 9.313E-05 | global batch size: 256 | lm loss: 2.930608E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.833 | TFLOPs: 30.32 | +7: iteration 97940/ 173500 | consumed samples: 25072640 | consumed tokens: 51348766720 | elapsed time per iteration (s): 0.44 | learning rate: 9.311E-05 | global batch 
size: 256 | lm loss: 2.920090E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.054 | TFLOPs: 30.28 | +7: iteration 97950/ 173500 | consumed samples: 25075200 | consumed tokens: 51354009600 | elapsed time per iteration (s): 0.47 | learning rate: 9.309E-05 | global batch size: 256 | lm loss: 2.923466E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 547.667 | TFLOPs: 28.74 | +7: iteration 97960/ 173500 | consumed samples: 25077760 | consumed tokens: 51359252480 | elapsed time per iteration (s): 0.43 | learning rate: 9.308E-05 | global batch size: 256 | lm loss: 2.919667E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.946 | TFLOPs: 31.22 | +7: iteration 97970/ 173500 | consumed samples: 25080320 | consumed tokens: 51364495360 | elapsed time per iteration (s): 0.44 | learning rate: 9.306E-05 | global batch size: 256 | lm loss: 2.918537E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.246 | TFLOPs: 30.39 | +7: iteration 97980/ 173500 | consumed samples: 25082880 | consumed tokens: 51369738240 | elapsed time per iteration (s): 0.46 | learning rate: 9.304E-05 | global batch size: 256 | lm loss: 2.915992E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.027 | TFLOPs: 29.33 | +7: iteration 97990/ 173500 | consumed samples: 25085440 | consumed tokens: 51374981120 | elapsed time per iteration (s): 0.42 | learning rate: 9.303E-05 | global batch size: 256 | lm loss: 2.913201E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.441 | TFLOPs: 32.13 | +0: [2023-03-17 10:49:02,148] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=98000, skipped=0, lr=[9.301234885879047e-05, 9.301234885879047e-05, 9.301234885879047e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 98000/ 173500 | consumed samples: 25088000 | consumed tokens: 51380224000 | elapsed time per iteration (s): 0.46 | learning rate: 9.301E-05 | global batch size: 256 | lm loss: 2.927502E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.027 | TFLOPs: 29.07 | +0: steps: 98000 loss: 2.9073 iter time (s): 0.426 samples/sec: 600.929 +7: iteration 98010/ 173500 | consumed samples: 25090560 | consumed tokens: 51385466880 | elapsed time per iteration (s): 0.44 | learning rate: 9.300E-05 | global batch size: 256 | lm loss: 2.919745E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.583 | TFLOPs: 30.36 | +7: iteration 98020/ 173500 | consumed samples: 25093120 | consumed tokens: 51390709760 | elapsed time per iteration (s): 0.44 | learning rate: 9.298E-05 | global batch size: 256 | lm loss: 2.930387E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.975 | TFLOPs: 30.27 | +7: iteration 98030/ 173500 | consumed samples: 25095680 | consumed tokens: 51395952640 | elapsed time per iteration (s): 0.46 | learning rate: 9.296E-05 | global batch size: 256 | lm loss: 2.928629E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.720 | TFLOPs: 29.37 | +7: iteration 98040/ 173500 | consumed samples: 25098240 | consumed tokens: 51401195520 | elapsed time per iteration (s): 0.45 | learning rate: 9.295E-05 | global batch size: 256 | lm loss: 2.917337E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.134 | TFLOPs: 30.07 | +7: iteration 
98050/ 173500 | consumed samples: 25100800 | consumed tokens: 51406438400 | elapsed time per iteration (s): 0.49 | learning rate: 9.293E-05 | global batch size: 256 | lm loss: 2.936897E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 526.240 | TFLOPs: 27.61 | +7: iteration 98060/ 173500 | consumed samples: 25103360 | consumed tokens: 51411681280 | elapsed time per iteration (s): 0.47 | learning rate: 9.292E-05 | global batch size: 256 | lm loss: 2.924302E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.782 | TFLOPs: 28.53 | +7: iteration 98070/ 173500 | consumed samples: 25105920 | consumed tokens: 51416924160 | elapsed time per iteration (s): 0.47 | learning rate: 9.290E-05 | global batch size: 256 | lm loss: 2.920800E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 548.935 | TFLOPs: 28.80 | +7: iteration 98080/ 173500 | consumed samples: 25108480 | consumed tokens: 51422167040 | elapsed time per iteration (s): 0.43 | learning rate: 9.288E-05 | global batch size: 256 | lm loss: 2.924772E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.996 | TFLOPs: 31.38 | +7: iteration 98090/ 173500 | consumed samples: 25111040 | consumed tokens: 51427409920 | elapsed time per iteration (s): 0.45 | learning rate: 9.287E-05 | global batch size: 256 | lm loss: 2.930832E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.195 | TFLOPs: 30.18 | +7: iteration 98100/ 173500 | consumed samples: 25113600 | consumed tokens: 51432652800 | elapsed time per iteration (s): 0.43 | learning rate: 9.285E-05 | global batch size: 256 | lm loss: 2.921273E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 592.700 | TFLOPs: 31.10 | +7: iteration 98110/ 173500 | consumed samples: 25116160 | consumed tokens: 51437895680 | elapsed time per iteration (s): 0.43 | learning rate: 9.283E-05 | global batch size: 256 | lm loss: 2.912519E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.160 | TFLOPs: 31.17 | +7: iteration 98120/ 173500 | consumed samples: 25118720 | consumed tokens: 51443138560 | elapsed time per iteration (s): 0.42 | learning rate: 9.282E-05 | global batch size: 256 | lm loss: 2.940826E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.616 | TFLOPs: 32.14 | +7: iteration 98130/ 173500 | consumed samples: 25121280 | consumed tokens: 51448381440 | elapsed time per iteration (s): 0.42 | learning rate: 9.280E-05 | global batch size: 256 | lm loss: 2.915881E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.496 | TFLOPs: 32.14 | +7: iteration 98140/ 173500 | consumed samples: 25123840 | consumed tokens: 51453624320 | elapsed time per iteration (s): 0.42 | learning rate: 9.279E-05 | global batch size: 256 | lm loss: 2.930639E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.077 | TFLOPs: 32.11 | +7: iteration 98150/ 173500 | consumed samples: 25126400 | consumed tokens: 51458867200 | elapsed time per iteration (s): 0.42 | learning rate: 9.277E-05 | global batch size: 256 | lm loss: 2.929805E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.833 | TFLOPs: 32.10 | +7: iteration 98160/ 173500 | consumed samples: 25128960 | consumed tokens: 51464110080 | elapsed time per iteration (s): 0.43 | learning rate: 
9.275E-05 | global batch size: 256 | lm loss: 2.929099E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.997 | TFLOPs: 31.27 | +7: iteration 98170/ 173500 | consumed samples: 25131520 | consumed tokens: 51469352960 | elapsed time per iteration (s): 0.42 | learning rate: 9.274E-05 | global batch size: 256 | lm loss: 2.918437E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.583 | TFLOPs: 32.09 | +7: iteration 98180/ 173500 | consumed samples: 25134080 | consumed tokens: 51474595840 | elapsed time per iteration (s): 0.42 | learning rate: 9.272E-05 | global batch size: 256 | lm loss: 2.915192E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.867 | TFLOPs: 31.84 | +7: iteration 98190/ 173500 | consumed samples: 25136640 | consumed tokens: 51479838720 | elapsed time per iteration (s): 0.42 | learning rate: 9.271E-05 | global batch size: 256 | lm loss: 2.918659E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.596 | TFLOPs: 31.83 | +7: iteration 98200/ 173500 | consumed samples: 25139200 | consumed tokens: 51485081600 | elapsed time per iteration (s): 0.43 | learning rate: 9.269E-05 | global batch size: 256 | lm loss: 2.929933E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.721 | TFLOPs: 30.94 | +7: iteration 98210/ 173500 | consumed samples: 25141760 | consumed tokens: 51490324480 | elapsed time per iteration (s): 0.42 | learning rate: 9.267E-05 | global batch size: 256 | lm loss: 2.923124E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.207 | TFLOPs: 32.12 | +7: iteration 98220/ 173500 | 
consumed samples: 25144320 | consumed tokens: 51495567360 | elapsed time per iteration (s): 0.42 | learning rate: 9.266E-05 | global batch size: 256 | lm loss: 2.929562E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.521 | TFLOPs: 32.09 | +7: iteration 98230/ 173500 | consumed samples: 25146880 | consumed tokens: 51500810240 | elapsed time per iteration (s): 0.42 | learning rate: 9.264E-05 | global batch size: 256 | lm loss: 2.913253E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.683 | TFLOPs: 32.09 | +7: iteration 98240/ 173500 | consumed samples: 25149440 | consumed tokens: 51506053120 | elapsed time per iteration (s): 0.42 | learning rate: 9.262E-05 | global batch size: 256 | lm loss: 2.907998E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.045 | TFLOPs: 32.06 | +7: iteration 98250/ 173500 | consumed samples: 25152000 | consumed tokens: 51511296000 | elapsed time per iteration (s): 0.42 | learning rate: 9.261E-05 | global batch size: 256 | lm loss: 2.921676E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.909 | TFLOPs: 32.05 | +7: iteration 98260/ 173500 | consumed samples: 25154560 | consumed tokens: 51516538880 | elapsed time per iteration (s): 0.42 | learning rate: 9.259E-05 | global batch size: 256 | lm loss: 2.923609E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.031 | TFLOPs: 31.90 | +7: iteration 98270/ 173500 | consumed samples: 25157120 | consumed tokens: 51521781760 | elapsed time per iteration (s): 0.42 | learning rate: 9.258E-05 | global batch size: 256 | lm loss: 2.918344E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 611.012 | TFLOPs: 32.06 | +7: iteration 98280/ 173500 | consumed samples: 25159680 | consumed tokens: 51527024640 | elapsed time per iteration (s): 0.42 | learning rate: 9.256E-05 | global batch size: 256 | lm loss: 2.919316E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.319 | TFLOPs: 32.02 | +7: iteration 98290/ 173500 | consumed samples: 25162240 | consumed tokens: 51532267520 | elapsed time per iteration (s): 0.42 | learning rate: 9.254E-05 | global batch size: 256 | lm loss: 2.917026E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.535 | TFLOPs: 31.72 | +7: iteration 98300/ 173500 | consumed samples: 25164800 | consumed tokens: 51537510400 | elapsed time per iteration (s): 0.42 | learning rate: 9.253E-05 | global batch size: 256 | lm loss: 2.919632E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.930 | TFLOPs: 31.90 | +7: iteration 98310/ 173500 | consumed samples: 25167360 | consumed tokens: 51542753280 | elapsed time per iteration (s): 0.42 | learning rate: 9.251E-05 | global batch size: 256 | lm loss: 2.925322E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.502 | TFLOPs: 31.77 | +7: iteration 98320/ 173500 | consumed samples: 25169920 | consumed tokens: 51547996160 | elapsed time per iteration (s): 0.42 | learning rate: 9.250E-05 | global batch size: 256 | lm loss: 2.910106E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.768 | TFLOPs: 31.78 | +7: iteration 98330/ 173500 | consumed samples: 25172480 | consumed tokens: 51553239040 | elapsed time per iteration (s): 0.42 | learning rate: 9.248E-05 | global batch 
size: 256 | lm loss: 2.935821E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.331 | TFLOPs: 31.87 | +7: iteration 98340/ 173500 | consumed samples: 25175040 | consumed tokens: 51558481920 | elapsed time per iteration (s): 0.43 | learning rate: 9.246E-05 | global batch size: 256 | lm loss: 2.918942E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.953 | TFLOPs: 31.48 | +7: iteration 98350/ 173500 | consumed samples: 25177600 | consumed tokens: 51563724800 | elapsed time per iteration (s): 0.42 | learning rate: 9.245E-05 | global batch size: 256 | lm loss: 2.921556E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.724 | TFLOPs: 31.89 | +7: iteration 98360/ 173500 | consumed samples: 25180160 | consumed tokens: 51568967680 | elapsed time per iteration (s): 0.44 | learning rate: 9.243E-05 | global batch size: 256 | lm loss: 2.933422E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.402 | TFLOPs: 30.82 | +7: iteration 98370/ 173500 | consumed samples: 25182720 | consumed tokens: 51574210560 | elapsed time per iteration (s): 0.42 | learning rate: 9.241E-05 | global batch size: 256 | lm loss: 2.923155E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.574 | TFLOPs: 31.67 | +7: iteration 98380/ 173500 | consumed samples: 25185280 | consumed tokens: 51579453440 | elapsed time per iteration (s): 0.42 | learning rate: 9.240E-05 | global batch size: 256 | lm loss: 2.923509E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.639 | TFLOPs: 31.88 | +7: iteration 98390/ 173500 | consumed samples: 25187840 | 
consumed tokens: 51584696320 | elapsed time per iteration (s): 0.42 | learning rate: 9.238E-05 | global batch size: 256 | lm loss: 2.911474E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.016 | TFLOPs: 32.06 | +7: iteration 98400/ 173500 | consumed samples: 25190400 | consumed tokens: 51589939200 | elapsed time per iteration (s): 0.43 | learning rate: 9.237E-05 | global batch size: 256 | lm loss: 2.932558E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.102 | TFLOPs: 31.59 | +7: iteration 98410/ 173500 | consumed samples: 25192960 | consumed tokens: 51595182080 | elapsed time per iteration (s): 0.42 | learning rate: 9.235E-05 | global batch size: 256 | lm loss: 2.926646E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.917 | TFLOPs: 32.05 | +7: iteration 98420/ 173500 | consumed samples: 25195520 | consumed tokens: 51600424960 | elapsed time per iteration (s): 0.42 | learning rate: 9.233E-05 | global batch size: 256 | lm loss: 2.912775E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.760 | TFLOPs: 32.05 | +7: iteration 98430/ 173500 | consumed samples: 25198080 | consumed tokens: 51605667840 | elapsed time per iteration (s): 0.42 | learning rate: 9.232E-05 | global batch size: 256 | lm loss: 2.922260E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.211 | TFLOPs: 32.02 | +7: iteration 98440/ 173500 | consumed samples: 25200640 | consumed tokens: 51610910720 | elapsed time per iteration (s): 0.42 | learning rate: 9.230E-05 | global batch size: 256 | lm loss: 2.921036E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 610.680 | TFLOPs: 32.04 | +7: iteration 98450/ 173500 | consumed samples: 25203200 | consumed tokens: 51616153600 | elapsed time per iteration (s): 0.42 | learning rate: 9.229E-05 | global batch size: 256 | lm loss: 2.905605E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.392 | TFLOPs: 32.03 | +7: iteration 98460/ 173500 | consumed samples: 25205760 | consumed tokens: 51621396480 | elapsed time per iteration (s): 0.42 | learning rate: 9.227E-05 | global batch size: 256 | lm loss: 2.906063E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.876 | TFLOPs: 31.74 | +7: iteration 98470/ 173500 | consumed samples: 25208320 | consumed tokens: 51626639360 | elapsed time per iteration (s): 0.45 | learning rate: 9.225E-05 | global batch size: 256 | lm loss: 2.921049E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.684 | TFLOPs: 29.58 | +7: iteration 98480/ 173500 | consumed samples: 25210880 | consumed tokens: 51631882240 | elapsed time per iteration (s): 0.42 | learning rate: 9.224E-05 | global batch size: 256 | lm loss: 2.899400E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.518 | TFLOPs: 32.09 | +7: iteration 98490/ 173500 | consumed samples: 25213440 | consumed tokens: 51637125120 | elapsed time per iteration (s): 0.42 | learning rate: 9.222E-05 | global batch size: 256 | lm loss: 2.930143E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.367 | TFLOPs: 32.08 | +7: iteration 98500/ 173500 | consumed samples: 25216000 | consumed tokens: 51642368000 | elapsed time per iteration (s): 0.42 | learning rate: 9.220E-05 | global batch size: 256 | lm loss: 
2.928596E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.752 | TFLOPs: 31.89 | +7: iteration 98510/ 173500 | consumed samples: 25218560 | consumed tokens: 51647610880 | elapsed time per iteration (s): 0.42 | learning rate: 9.219E-05 | global batch size: 256 | lm loss: 2.929461E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.344 | TFLOPs: 31.76 | +7: iteration 98520/ 173500 | consumed samples: 25221120 | consumed tokens: 51652853760 | elapsed time per iteration (s): 0.42 | learning rate: 9.217E-05 | global batch size: 256 | lm loss: 2.932563E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.507 | TFLOPs: 32.03 | +7: iteration 98530/ 173500 | consumed samples: 25223680 | consumed tokens: 51658096640 | elapsed time per iteration (s): 0.42 | learning rate: 9.216E-05 | global batch size: 256 | lm loss: 2.918090E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.308 | TFLOPs: 32.02 | +7: iteration 98540/ 173500 | consumed samples: 25226240 | consumed tokens: 51663339520 | elapsed time per iteration (s): 0.42 | learning rate: 9.214E-05 | global batch size: 256 | lm loss: 2.932499E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.372 | TFLOPs: 32.03 | +7: iteration 98550/ 173500 | consumed samples: 25228800 | consumed tokens: 51668582400 | elapsed time per iteration (s): 0.42 | learning rate: 9.212E-05 | global batch size: 256 | lm loss: 2.916512E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.074 | TFLOPs: 32.01 | +7: iteration 98560/ 173500 | consumed samples: 25231360 | consumed tokens: 
51673825280 | elapsed time per iteration (s): 0.42 | learning rate: 9.211E-05 | global batch size: 256 | lm loss: 2.912017E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.327 | TFLOPs: 32.02 | +7: iteration 98570/ 173500 | consumed samples: 25233920 | consumed tokens: 51679068160 | elapsed time per iteration (s): 0.42 | learning rate: 9.209E-05 | global batch size: 256 | lm loss: 2.931897E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.584 | TFLOPs: 31.98 | +7: iteration 98580/ 173500 | consumed samples: 25236480 | consumed tokens: 51684311040 | elapsed time per iteration (s): 0.42 | learning rate: 9.208E-05 | global batch size: 256 | lm loss: 2.920387E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.689 | TFLOPs: 31.99 | +7: iteration 98590/ 173500 | consumed samples: 25239040 | consumed tokens: 51689553920 | elapsed time per iteration (s): 0.42 | learning rate: 9.206E-05 | global batch size: 256 | lm loss: 2.932772E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.422 | TFLOPs: 31.71 | +7: iteration 98600/ 173500 | consumed samples: 25241600 | consumed tokens: 51694796800 | elapsed time per iteration (s): 0.42 | learning rate: 9.204E-05 | global batch size: 256 | lm loss: 2.940437E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.623 | TFLOPs: 31.99 | +7: iteration 98610/ 173500 | consumed samples: 25244160 | consumed tokens: 51700039680 | elapsed time per iteration (s): 0.42 | learning rate: 9.203E-05 | global batch size: 256 | lm loss: 2.932883E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.405 | TFLOPs: 31.97 | +7: iteration 98620/ 173500 | consumed samples: 25246720 | consumed tokens: 51705282560 | elapsed time per iteration (s): 0.42 | learning rate: 9.201E-05 | global batch size: 256 | lm loss: 2.923935E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.520 | TFLOPs: 31.61 | +7: iteration 98630/ 173500 | consumed samples: 25249280 | consumed tokens: 51710525440 | elapsed time per iteration (s): 0.42 | learning rate: 9.200E-05 | global batch size: 256 | lm loss: 2.926259E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.064 | TFLOPs: 32.01 | +7: iteration 98640/ 173500 | consumed samples: 25251840 | consumed tokens: 51715768320 | elapsed time per iteration (s): 0.42 | learning rate: 9.198E-05 | global batch size: 256 | lm loss: 2.916867E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.694 | TFLOPs: 31.99 | +7: iteration 98650/ 173500 | consumed samples: 25254400 | consumed tokens: 51721011200 | elapsed time per iteration (s): 0.42 | learning rate: 9.196E-05 | global batch size: 256 | lm loss: 2.928547E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.372 | TFLOPs: 31.97 | +7: iteration 98660/ 173500 | consumed samples: 25256960 | consumed tokens: 51726254080 | elapsed time per iteration (s): 0.42 | learning rate: 9.195E-05 | global batch size: 256 | lm loss: 2.924560E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.187 | TFLOPs: 31.96 | +7: iteration 98670/ 173500 | consumed samples: 25259520 | consumed tokens: 51731496960 | elapsed time per iteration (s): 0.42 | learning rate: 9.193E-05 | global batch size: 256 | lm loss: 2.923209E+00 | grad 
norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.900 | TFLOPs: 31.95 | +7: iteration 98680/ 173500 | consumed samples: 25262080 | consumed tokens: 51736739840 | elapsed time per iteration (s): 0.42 | learning rate: 9.191E-05 | global batch size: 256 | lm loss: 2.925701E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.841 | TFLOPs: 31.94 | +7: iteration 98690/ 173500 | consumed samples: 25264640 | consumed tokens: 51741982720 | elapsed time per iteration (s): 0.42 | learning rate: 9.190E-05 | global batch size: 256 | lm loss: 2.910943E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.758 | TFLOPs: 31.94 | +7: iteration 98700/ 173500 | consumed samples: 25267200 | consumed tokens: 51747225600 | elapsed time per iteration (s): 0.42 | learning rate: 9.188E-05 | global batch size: 256 | lm loss: 2.930728E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.013 | TFLOPs: 31.74 | +7: iteration 98710/ 173500 | consumed samples: 25269760 | consumed tokens: 51752468480 | elapsed time per iteration (s): 0.42 | learning rate: 9.187E-05 | global batch size: 256 | lm loss: 2.914697E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.142 | TFLOPs: 31.80 | +7: iteration 98720/ 173500 | consumed samples: 25272320 | consumed tokens: 51757711360 | elapsed time per iteration (s): 0.42 | learning rate: 9.185E-05 | global batch size: 256 | lm loss: 2.920313E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.960 | TFLOPs: 31.69 | +7: iteration 98730/ 173500 | consumed samples: 25274880 | consumed tokens: 51762954240 | elapsed time 
per iteration (s): 0.42 | learning rate: 9.183E-05 | global batch size: 256 | lm loss: 2.935985E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.261 | TFLOPs: 31.97 | +7: iteration 98740/ 173500 | consumed samples: 25277440 | consumed tokens: 51768197120 | elapsed time per iteration (s): 0.42 | learning rate: 9.182E-05 | global batch size: 256 | lm loss: 2.907866E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.117 | TFLOPs: 31.80 | +7: iteration 98750/ 173500 | consumed samples: 25280000 | consumed tokens: 51773440000 | elapsed time per iteration (s): 0.42 | learning rate: 9.180E-05 | global batch size: 256 | lm loss: 2.934190E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.911 | TFLOPs: 32.00 | +7: iteration 98760/ 173500 | consumed samples: 25282560 | consumed tokens: 51778682880 | elapsed time per iteration (s): 0.42 | learning rate: 9.179E-05 | global batch size: 256 | lm loss: 2.936648E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.596 | TFLOPs: 31.98 | +7: iteration 98770/ 173500 | consumed samples: 25285120 | consumed tokens: 51783925760 | elapsed time per iteration (s): 0.42 | learning rate: 9.177E-05 | global batch size: 256 | lm loss: 2.917605E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.305 | TFLOPs: 31.97 | +7: iteration 98780/ 173500 | consumed samples: 25287680 | consumed tokens: 51789168640 | elapsed time per iteration (s): 0.42 | learning rate: 9.175E-05 | global batch size: 256 | lm loss: 2.929558E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.573 | TFLOPs: 
31.72 | +7: iteration 98790/ 173500 | consumed samples: 25290240 | consumed tokens: 51794411520 | elapsed time per iteration (s): 0.42 | learning rate: 9.174E-05 | global batch size: 256 | lm loss: 2.923095E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.799 | TFLOPs: 31.94 | +7: iteration 98800/ 173500 | consumed samples: 25292800 | consumed tokens: 51799654400 | elapsed time per iteration (s): 0.42 | learning rate: 9.172E-05 | global batch size: 256 | lm loss: 2.907551E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.971 | TFLOPs: 31.95 | +7: iteration 98810/ 173500 | consumed samples: 25295360 | consumed tokens: 51804897280 | elapsed time per iteration (s): 0.42 | learning rate: 9.170E-05 | global batch size: 256 | lm loss: 2.922634E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.924 | TFLOPs: 31.95 | +7: iteration 98820/ 173500 | consumed samples: 25297920 | consumed tokens: 51810140160 | elapsed time per iteration (s): 0.42 | learning rate: 9.169E-05 | global batch size: 256 | lm loss: 2.931982E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.458 | TFLOPs: 31.71 | +7: iteration 98830/ 173500 | consumed samples: 25300480 | consumed tokens: 51815383040 | elapsed time per iteration (s): 0.42 | learning rate: 9.167E-05 | global batch size: 256 | lm loss: 2.919071E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.656 | TFLOPs: 31.88 | +7: iteration 98840/ 173500 | consumed samples: 25303040 | consumed tokens: 51820625920 | elapsed time per iteration (s): 0.42 | learning rate: 9.166E-05 | global batch size: 256 | lm loss: 2.928871E+00 | grad norm: 0.375 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.507 | TFLOPs: 31.98 | +7: iteration 98850/ 173500 | consumed samples: 25305600 | consumed tokens: 51825868800 | elapsed time per iteration (s): 0.42 | learning rate: 9.164E-05 | global batch size: 256 | lm loss: 2.916876E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.216 | TFLOPs: 31.96 | +7: iteration 98860/ 173500 | consumed samples: 25308160 | consumed tokens: 51831111680 | elapsed time per iteration (s): 0.42 | learning rate: 9.162E-05 | global batch size: 256 | lm loss: 2.934819E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.926 | TFLOPs: 31.63 | +7: iteration 98870/ 173500 | consumed samples: 25310720 | consumed tokens: 51836354560 | elapsed time per iteration (s): 0.42 | learning rate: 9.161E-05 | global batch size: 256 | lm loss: 2.924112E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.623 | TFLOPs: 31.99 | +7: iteration 98880/ 173500 | consumed samples: 25313280 | consumed tokens: 51841597440 | elapsed time per iteration (s): 0.43 | learning rate: 9.159E-05 | global batch size: 256 | lm loss: 2.918717E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.467 | TFLOPs: 31.19 | +7: iteration 98890/ 173500 | consumed samples: 25315840 | consumed tokens: 51846840320 | elapsed time per iteration (s): 0.42 | learning rate: 9.158E-05 | global batch size: 256 | lm loss: 2.919981E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.219 | TFLOPs: 32.02 | +7: iteration 98900/ 173500 | consumed samples: 25318400 | consumed tokens: 51852083200 | elapsed time per iteration (s): 0.42 | 
learning rate: 9.156E-05 | global batch size: 256 | lm loss: 2.925548E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.907 | TFLOPs: 31.84 | +7: iteration 98910/ 173500 | consumed samples: 25320960 | consumed tokens: 51857326080 | elapsed time per iteration (s): 0.42 | learning rate: 9.154E-05 | global batch size: 256 | lm loss: 2.910097E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.065 | TFLOPs: 32.01 | +7: iteration 98920/ 173500 | consumed samples: 25323520 | consumed tokens: 51862568960 | elapsed time per iteration (s): 0.42 | learning rate: 9.153E-05 | global batch size: 256 | lm loss: 2.930104E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.482 | TFLOPs: 31.77 | +7: iteration 98930/ 173500 | consumed samples: 25326080 | consumed tokens: 51867811840 | elapsed time per iteration (s): 0.42 | learning rate: 9.151E-05 | global batch size: 256 | lm loss: 2.914722E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.707 | TFLOPs: 31.73 | +7: iteration 98940/ 173500 | consumed samples: 25328640 | consumed tokens: 51873054720 | elapsed time per iteration (s): 0.42 | learning rate: 9.150E-05 | global batch size: 256 | lm loss: 2.934510E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.735 | TFLOPs: 31.83 | +7: iteration 98950/ 173500 | consumed samples: 25331200 | consumed tokens: 51878297600 | elapsed time per iteration (s): 0.42 | learning rate: 9.148E-05 | global batch size: 256 | lm loss: 2.917569E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.057 | TFLOPs: 31.96 | +7: iteration 98960/ 
173500 | consumed samples: 25333760 | consumed tokens: 51883540480 | elapsed time per iteration (s): 0.42 | learning rate: 9.146E-05 | global batch size: 256 | lm loss: 2.930359E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.059 | TFLOPs: 31.96 | +7: iteration 98970/ 173500 | consumed samples: 25336320 | consumed tokens: 51888783360 | elapsed time per iteration (s): 0.42 | learning rate: 9.145E-05 | global batch size: 256 | lm loss: 2.907252E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.998 | TFLOPs: 31.80 | +7: iteration 98980/ 173500 | consumed samples: 25338880 | consumed tokens: 51894026240 | elapsed time per iteration (s): 0.42 | learning rate: 9.143E-05 | global batch size: 256 | lm loss: 2.910865E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.195 | TFLOPs: 31.96 | +7: iteration 98990/ 173500 | consumed samples: 25341440 | consumed tokens: 51899269120 | elapsed time per iteration (s): 0.42 | learning rate: 9.141E-05 | global batch size: 256 | lm loss: 2.918067E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.193 | TFLOPs: 31.96 | +7: iteration 99000/ 173500 | consumed samples: 25344000 | consumed tokens: 51904512000 | elapsed time per iteration (s): 0.42 | learning rate: 9.140E-05 | global batch size: 256 | lm loss: 2.913466E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.965 | TFLOPs: 31.95 | +7: iteration 99010/ 173500 | consumed samples: 25346560 | consumed tokens: 51909754880 | elapsed time per iteration (s): 0.42 | learning rate: 9.138E-05 | global batch size: 256 | lm loss: 2.926916E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.915 | TFLOPs: 31.74 | +7: iteration 99020/ 173500 | consumed samples: 25349120 | consumed tokens: 51914997760 | elapsed time per iteration (s): 0.42 | learning rate: 9.137E-05 | global batch size: 256 | lm loss: 2.935341E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.366 | TFLOPs: 31.97 | +7: iteration 99030/ 173500 | consumed samples: 25351680 | consumed tokens: 51920240640 | elapsed time per iteration (s): 0.42 | learning rate: 9.135E-05 | global batch size: 256 | lm loss: 2.918548E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.836 | TFLOPs: 31.94 | +7: iteration 99040/ 173500 | consumed samples: 25354240 | consumed tokens: 51925483520 | elapsed time per iteration (s): 0.42 | learning rate: 9.133E-05 | global batch size: 256 | lm loss: 2.917682E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.808 | TFLOPs: 31.79 | +7: iteration 99050/ 173500 | consumed samples: 25356800 | consumed tokens: 51930726400 | elapsed time per iteration (s): 0.42 | learning rate: 9.132E-05 | global batch size: 256 | lm loss: 2.932532E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.534 | TFLOPs: 31.88 | +7: iteration 99060/ 173500 | consumed samples: 25359360 | consumed tokens: 51935969280 | elapsed time per iteration (s): 0.42 | learning rate: 9.130E-05 | global batch size: 256 | lm loss: 2.929309E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.508 | TFLOPs: 31.93 | +7: iteration 99070/ 173500 | consumed samples: 25361920 | consumed tokens: 51941212160 | elapsed time per iteration (s): 0.42 | learning rate: 
9.129E-05 | global batch size: 256 | lm loss: 2.928274E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.435 | TFLOPs: 31.92 | +7: iteration 99080/ 173500 | consumed samples: 25364480 | consumed tokens: 51946455040 | elapsed time per iteration (s): 0.42 | learning rate: 9.127E-05 | global batch size: 256 | lm loss: 2.929361E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.304 | TFLOPs: 31.92 | +7: iteration 99090/ 173500 | consumed samples: 25367040 | consumed tokens: 51951697920 | elapsed time per iteration (s): 0.42 | learning rate: 9.125E-05 | global batch size: 256 | lm loss: 2.921138E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.533 | TFLOPs: 31.93 | +7: iteration 99100/ 173500 | consumed samples: 25369600 | consumed tokens: 51956940800 | elapsed time per iteration (s): 0.42 | learning rate: 9.124E-05 | global batch size: 256 | lm loss: 2.928604E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.973 | TFLOPs: 31.95 | +7: iteration 99110/ 173500 | consumed samples: 25372160 | consumed tokens: 51962183680 | elapsed time per iteration (s): 0.43 | learning rate: 9.122E-05 | global batch size: 256 | lm loss: 2.923997E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.671 | TFLOPs: 30.94 | +7: iteration 99120/ 173500 | consumed samples: 25374720 | consumed tokens: 51967426560 | elapsed time per iteration (s): 0.43 | learning rate: 9.121E-05 | global batch size: 256 | lm loss: 2.924833E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.189 | TFLOPs: 31.02 | +7: iteration 99130/ 173500 | 
consumed samples: 25377280 | consumed tokens: 51972669440 | elapsed time per iteration (s): 0.44 | learning rate: 9.119E-05 | global batch size: 256 | lm loss: 2.919734E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.589 | TFLOPs: 30.67 | +7: iteration 99140/ 173500 | consumed samples: 25379840 | consumed tokens: 51977912320 | elapsed time per iteration (s): 0.42 | learning rate: 9.117E-05 | global batch size: 256 | lm loss: 2.921675E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.942 | TFLOPs: 32.00 | +7: iteration 99150/ 173500 | consumed samples: 25382400 | consumed tokens: 51983155200 | elapsed time per iteration (s): 0.42 | learning rate: 9.116E-05 | global batch size: 256 | lm loss: 2.917649E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.822 | TFLOPs: 31.73 | +7: iteration 99160/ 173500 | consumed samples: 25384960 | consumed tokens: 51988398080 | elapsed time per iteration (s): 0.42 | learning rate: 9.114E-05 | global batch size: 256 | lm loss: 2.930234E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.766 | TFLOPs: 31.99 | +7: iteration 99170/ 173500 | consumed samples: 25387520 | consumed tokens: 51993640960 | elapsed time per iteration (s): 0.42 | learning rate: 9.113E-05 | global batch size: 256 | lm loss: 2.926030E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.193 | TFLOPs: 31.96 | +7: iteration 99180/ 173500 | consumed samples: 25390080 | consumed tokens: 51998883840 | elapsed time per iteration (s): 0.42 | learning rate: 9.111E-05 | global batch size: 256 | lm loss: 2.924602E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 609.630 | TFLOPs: 31.99 | +7: iteration 99190/ 173500 | consumed samples: 25392640 | consumed tokens: 52004126720 | elapsed time per iteration (s): 0.43 | learning rate: 9.109E-05 | global batch size: 256 | lm loss: 2.924322E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.354 | TFLOPs: 31.55 | +7: iteration 99200/ 173500 | consumed samples: 25395200 | consumed tokens: 52009369600 | elapsed time per iteration (s): 0.43 | learning rate: 9.108E-05 | global batch size: 256 | lm loss: 2.927085E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.285 | TFLOPs: 31.18 | +7: iteration 99210/ 173500 | consumed samples: 25397760 | consumed tokens: 52014612480 | elapsed time per iteration (s): 0.42 | learning rate: 9.106E-05 | global batch size: 256 | lm loss: 2.928653E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.293 | TFLOPs: 32.02 | +7: iteration 99220/ 173500 | consumed samples: 25400320 | consumed tokens: 52019855360 | elapsed time per iteration (s): 0.42 | learning rate: 9.104E-05 | global batch size: 256 | lm loss: 2.926553E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.639 | TFLOPs: 31.99 | +7: iteration 99230/ 173500 | consumed samples: 25402880 | consumed tokens: 52025098240 | elapsed time per iteration (s): 0.42 | learning rate: 9.103E-05 | global batch size: 256 | lm loss: 2.924460E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.908 | TFLOPs: 32.00 | +7: iteration 99240/ 173500 | consumed samples: 25405440 | consumed tokens: 52030341120 | elapsed time per iteration (s): 0.42 | learning rate: 9.101E-05 | global batch 
size: 256 | lm loss: 2.921429E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.338 | TFLOPs: 31.97 | +7: iteration 99250/ 173500 | consumed samples: 25408000 | consumed tokens: 52035584000 | elapsed time per iteration (s): 0.42 | learning rate: 9.100E-05 | global batch size: 256 | lm loss: 2.922804E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.003 | TFLOPs: 31.80 | +7: iteration 99260/ 173500 | consumed samples: 25410560 | consumed tokens: 52040826880 | elapsed time per iteration (s): 0.42 | learning rate: 9.098E-05 | global batch size: 256 | lm loss: 2.928345E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.360 | TFLOPs: 31.97 | +7: iteration 99270/ 173500 | consumed samples: 25413120 | consumed tokens: 52046069760 | elapsed time per iteration (s): 0.42 | learning rate: 9.096E-05 | global batch size: 256 | lm loss: 2.918038E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.333 | TFLOPs: 31.97 | +7: iteration 99280/ 173500 | consumed samples: 25415680 | consumed tokens: 52051312640 | elapsed time per iteration (s): 0.42 | learning rate: 9.095E-05 | global batch size: 256 | lm loss: 2.934900E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.919 | TFLOPs: 31.84 | +7: iteration 99290/ 173500 | consumed samples: 25418240 | consumed tokens: 52056555520 | elapsed time per iteration (s): 0.42 | learning rate: 9.093E-05 | global batch size: 256 | lm loss: 2.911737E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.825 | TFLOPs: 32.00 | +7: iteration 99300/ 173500 | consumed samples: 25420800 | 
consumed tokens: 52061798400 | elapsed time per iteration (s): 0.42 | learning rate: 9.092E-05 | global batch size: 256 | lm loss: 2.914348E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.915 | TFLOPs: 31.79 | +7: iteration 99310/ 173500 | consumed samples: 25423360 | consumed tokens: 52067041280 | elapsed time per iteration (s): 0.42 | learning rate: 9.090E-05 | global batch size: 256 | lm loss: 2.913521E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.745 | TFLOPs: 31.99 | +7: iteration 99320/ 173500 | consumed samples: 25425920 | consumed tokens: 52072284160 | elapsed time per iteration (s): 0.42 | learning rate: 9.088E-05 | global batch size: 256 | lm loss: 2.941055E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.406 | TFLOPs: 31.97 | +7: iteration 99330/ 173500 | consumed samples: 25428480 | consumed tokens: 52077527040 | elapsed time per iteration (s): 0.42 | learning rate: 9.087E-05 | global batch size: 256 | lm loss: 2.939150E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.833 | TFLOPs: 32.00 | +7: iteration 99340/ 173500 | consumed samples: 25431040 | consumed tokens: 52082769920 | elapsed time per iteration (s): 0.42 | learning rate: 9.085E-05 | global batch size: 256 | lm loss: 2.926687E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.648 | TFLOPs: 31.99 | +7: iteration 99350/ 173500 | consumed samples: 25433600 | consumed tokens: 52088012800 | elapsed time per iteration (s): 0.42 | learning rate: 9.084E-05 | global batch size: 256 | lm loss: 2.915582E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 607.143 | TFLOPs: 31.86 | +7: iteration 99360/ 173500 | consumed samples: 25436160 | consumed tokens: 52093255680 | elapsed time per iteration (s): 0.42 | learning rate: 9.082E-05 | global batch size: 256 | lm loss: 2.925783E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.626 | TFLOPs: 31.62 | +7: iteration 99370/ 173500 | consumed samples: 25438720 | consumed tokens: 52098498560 | elapsed time per iteration (s): 0.42 | learning rate: 9.080E-05 | global batch size: 256 | lm loss: 2.911850E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.241 | TFLOPs: 31.97 | +7: iteration 99380/ 173500 | consumed samples: 25441280 | consumed tokens: 52103741440 | elapsed time per iteration (s): 0.42 | learning rate: 9.079E-05 | global batch size: 256 | lm loss: 2.935150E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.907 | TFLOPs: 31.90 | +7: iteration 99390/ 173500 | consumed samples: 25443840 | consumed tokens: 52108984320 | elapsed time per iteration (s): 0.42 | learning rate: 9.077E-05 | global batch size: 256 | lm loss: 2.928639E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.841 | TFLOPs: 31.84 | +7: iteration 99400/ 173500 | consumed samples: 25446400 | consumed tokens: 52114227200 | elapsed time per iteration (s): 0.42 | learning rate: 9.076E-05 | global batch size: 256 | lm loss: 2.917352E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.893 | TFLOPs: 32.00 | +7: iteration 99410/ 173500 | consumed samples: 25448960 | consumed tokens: 52119470080 | elapsed time per iteration (s): 0.43 | learning rate: 9.074E-05 | global batch size: 256 | lm loss: 
2.930627E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.312 | TFLOPs: 31.60 | +7: iteration 99420/ 173500 | consumed samples: 25451520 | consumed tokens: 52124712960 | elapsed time per iteration (s): 0.42 | learning rate: 9.072E-05 | global batch size: 256 | lm loss: 2.932793E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.247 | TFLOPs: 32.02 | +7: iteration 99430/ 173500 | consumed samples: 25454080 | consumed tokens: 52129955840 | elapsed time per iteration (s): 0.42 | learning rate: 9.071E-05 | global batch size: 256 | lm loss: 2.922822E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.309 | TFLOPs: 31.76 | +7: iteration 99440/ 173500 | consumed samples: 25456640 | consumed tokens: 52135198720 | elapsed time per iteration (s): 0.42 | learning rate: 9.069E-05 | global batch size: 256 | lm loss: 2.911143E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.869 | TFLOPs: 31.74 | +7: iteration 99450/ 173500 | consumed samples: 25459200 | consumed tokens: 52140441600 | elapsed time per iteration (s): 0.42 | learning rate: 9.067E-05 | global batch size: 256 | lm loss: 2.922955E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.089 | TFLOPs: 31.80 | +7: iteration 99460/ 173500 | consumed samples: 25461760 | consumed tokens: 52145684480 | elapsed time per iteration (s): 0.42 | learning rate: 9.066E-05 | global batch size: 256 | lm loss: 2.926484E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.101 | TFLOPs: 32.01 | +7: iteration 99470/ 173500 | consumed samples: 25464320 | consumed tokens: 
52150927360 | elapsed time per iteration (s): 0.42 | learning rate: 9.064E-05 | global batch size: 256 | lm loss: 2.926423E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.190 | TFLOPs: 32.02 | +7: iteration 99480/ 173500 | consumed samples: 25466880 | consumed tokens: 52156170240 | elapsed time per iteration (s): 0.42 | learning rate: 9.063E-05 | global batch size: 256 | lm loss: 2.916564E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.004 | TFLOPs: 32.01 | +7: iteration 99490/ 173500 | consumed samples: 25469440 | consumed tokens: 52161413120 | elapsed time per iteration (s): 0.42 | learning rate: 9.061E-05 | global batch size: 256 | lm loss: 2.926336E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.145 | TFLOPs: 32.01 | +7: iteration 99500/ 173500 | consumed samples: 25472000 | consumed tokens: 52166656000 | elapsed time per iteration (s): 0.42 | learning rate: 9.059E-05 | global batch size: 256 | lm loss: 2.927580E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.871 | TFLOPs: 32.00 | +7: iteration 99510/ 173500 | consumed samples: 25474560 | consumed tokens: 52171898880 | elapsed time per iteration (s): 0.42 | learning rate: 9.058E-05 | global batch size: 256 | lm loss: 2.920907E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.753 | TFLOPs: 31.99 | +7: iteration 99520/ 173500 | consumed samples: 25477120 | consumed tokens: 52177141760 | elapsed time per iteration (s): 0.42 | learning rate: 9.056E-05 | global batch size: 256 | lm loss: 2.919853E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.920 | TFLOPs: 32.00 | +7: iteration 99530/ 173500 | consumed samples: 25479680 | consumed tokens: 52182384640 | elapsed time per iteration (s): 0.42 | learning rate: 9.055E-05 | global batch size: 256 | lm loss: 2.907842E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.769 | TFLOPs: 31.99 | +7: iteration 99540/ 173500 | consumed samples: 25482240 | consumed tokens: 52187627520 | elapsed time per iteration (s): 0.42 | learning rate: 9.053E-05 | global batch size: 256 | lm loss: 2.918769E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.902 | TFLOPs: 31.90 | +7: iteration 99550/ 173500 | consumed samples: 25484800 | consumed tokens: 52192870400 | elapsed time per iteration (s): 0.42 | learning rate: 9.051E-05 | global batch size: 256 | lm loss: 2.906420E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.714 | TFLOPs: 31.89 | +7: iteration 99560/ 173500 | consumed samples: 25487360 | consumed tokens: 52198113280 | elapsed time per iteration (s): 0.43 | learning rate: 9.050E-05 | global batch size: 256 | lm loss: 2.926049E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.357 | TFLOPs: 30.98 | +7: iteration 99570/ 173500 | consumed samples: 25489920 | consumed tokens: 52203356160 | elapsed time per iteration (s): 0.42 | learning rate: 9.048E-05 | global batch size: 256 | lm loss: 2.909580E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.597 | TFLOPs: 32.04 | +7: iteration 99580/ 173500 | consumed samples: 25492480 | consumed tokens: 52208599040 | elapsed time per iteration (s): 0.43 | learning rate: 9.047E-05 | global batch size: 256 | lm loss: 2.936384E+00 | grad 
norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.100 | TFLOPs: 31.33 | +7: iteration 99590/ 173500 | consumed samples: 25495040 | consumed tokens: 52213841920 | elapsed time per iteration (s): 0.42 | learning rate: 9.045E-05 | global batch size: 256 | lm loss: 2.933557E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.135 | TFLOPs: 31.86 | +7: iteration 99600/ 173500 | consumed samples: 25497600 | consumed tokens: 52219084800 | elapsed time per iteration (s): 0.42 | learning rate: 9.043E-05 | global batch size: 256 | lm loss: 2.915154E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.333 | TFLOPs: 31.81 | +7: iteration 99610/ 173500 | consumed samples: 25500160 | consumed tokens: 52224327680 | elapsed time per iteration (s): 0.42 | learning rate: 9.042E-05 | global batch size: 256 | lm loss: 2.919332E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.504 | TFLOPs: 32.03 | +7: iteration 99620/ 173500 | consumed samples: 25502720 | consumed tokens: 52229570560 | elapsed time per iteration (s): 0.42 | learning rate: 9.040E-05 | global batch size: 256 | lm loss: 2.921795E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.081 | TFLOPs: 31.80 | +7: iteration 99630/ 173500 | consumed samples: 25505280 | consumed tokens: 52234813440 | elapsed time per iteration (s): 0.42 | learning rate: 9.039E-05 | global batch size: 256 | lm loss: 2.935019E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.088 | TFLOPs: 32.01 | +7: iteration 99640/ 173500 | consumed samples: 25507840 | consumed tokens: 52240056320 | elapsed time 
per iteration (s): 0.42 | learning rate: 9.037E-05 | global batch size: 256 | lm loss: 2.933430E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.432 | TFLOPs: 31.77 | +7: iteration 99650/ 173500 | consumed samples: 25510400 | consumed tokens: 52245299200 | elapsed time per iteration (s): 0.42 | learning rate: 9.035E-05 | global batch size: 256 | lm loss: 2.921847E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.219 | TFLOPs: 32.02 | +7: iteration 99660/ 173500 | consumed samples: 25512960 | consumed tokens: 52250542080 | elapsed time per iteration (s): 0.42 | learning rate: 9.034E-05 | global batch size: 256 | lm loss: 2.928925E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.119 | TFLOPs: 32.01 | +7: iteration 99670/ 173500 | consumed samples: 25515520 | consumed tokens: 52255784960 | elapsed time per iteration (s): 0.42 | learning rate: 9.032E-05 | global batch size: 256 | lm loss: 2.940360E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.178 | TFLOPs: 32.02 | +7: iteration 99680/ 173500 | consumed samples: 25518080 | consumed tokens: 52261027840 | elapsed time per iteration (s): 0.42 | learning rate: 9.031E-05 | global batch size: 256 | lm loss: 2.921293E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.302 | TFLOPs: 32.02 | +7: iteration 99690/ 173500 | consumed samples: 25520640 | consumed tokens: 52266270720 | elapsed time per iteration (s): 0.42 | learning rate: 9.029E-05 | global batch size: 256 | lm loss: 2.923449E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.793 | TFLOPs: 
31.99 | +7: iteration 99700/ 173500 | consumed samples: 25523200 | consumed tokens: 52271513600 | elapsed time per iteration (s): 0.42 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 2.915438E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.673 | TFLOPs: 31.99 | +7: iteration 99710/ 173500 | consumed samples: 25525760 | consumed tokens: 52276756480 | elapsed time per iteration (s): 0.42 | learning rate: 9.026E-05 | global batch size: 256 | lm loss: 2.918147E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.898 | TFLOPs: 32.00 | +7: iteration 99720/ 173500 | consumed samples: 25528320 | consumed tokens: 52281999360 | elapsed time per iteration (s): 0.42 | learning rate: 9.024E-05 | global batch size: 256 | lm loss: 2.924895E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.436 | TFLOPs: 32.03 | +7: iteration 99730/ 173500 | consumed samples: 25530880 | consumed tokens: 52287242240 | elapsed time per iteration (s): 0.42 | learning rate: 9.022E-05 | global batch size: 256 | lm loss: 2.913358E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.426 | TFLOPs: 31.66 | +7: iteration 99740/ 173500 | consumed samples: 25533440 | consumed tokens: 52292485120 | elapsed time per iteration (s): 0.42 | learning rate: 9.021E-05 | global batch size: 256 | lm loss: 2.933542E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.930 | TFLOPs: 32.00 | +7: iteration 99750/ 173500 | consumed samples: 25536000 | consumed tokens: 52297728000 | elapsed time per iteration (s): 0.42 | learning rate: 9.019E-05 | global batch size: 256 | lm loss: 2.921769E+00 | grad norm: 0.345 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.805 | TFLOPs: 32.00 | +7: iteration 99760/ 173500 | consumed samples: 25538560 | consumed tokens: 52302970880 | elapsed time per iteration (s): 0.42 | learning rate: 9.018E-05 | global batch size: 256 | lm loss: 2.936035E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.387 | TFLOPs: 31.87 | +7: iteration 99770/ 173500 | consumed samples: 25541120 | consumed tokens: 52308213760 | elapsed time per iteration (s): 0.42 | learning rate: 9.016E-05 | global batch size: 256 | lm loss: 2.918094E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.217 | TFLOPs: 32.02 | +7: iteration 99780/ 173500 | consumed samples: 25543680 | consumed tokens: 52313456640 | elapsed time per iteration (s): 0.42 | learning rate: 9.014E-05 | global batch size: 256 | lm loss: 2.917079E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.871 | TFLOPs: 32.00 | +7: iteration 99790/ 173500 | consumed samples: 25546240 | consumed tokens: 52318699520 | elapsed time per iteration (s): 0.42 | learning rate: 9.013E-05 | global batch size: 256 | lm loss: 2.920504E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.172 | TFLOPs: 32.01 | +7: iteration 99800/ 173500 | consumed samples: 25548800 | consumed tokens: 52323942400 | elapsed time per iteration (s): 0.42 | learning rate: 9.011E-05 | global batch size: 256 | lm loss: 2.918364E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.753 | TFLOPs: 31.99 | +7: iteration 99810/ 173500 | consumed samples: 25551360 | consumed tokens: 52329185280 | elapsed time per iteration (s): 0.42 | 
learning rate: 9.010E-05 | global batch size: 256 | lm loss: 2.929024E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.878 | TFLOPs: 31.68 | +7: iteration 99820/ 173500 | consumed samples: 25553920 | consumed tokens: 52334428160 | elapsed time per iteration (s): 0.42 | learning rate: 9.008E-05 | global batch size: 256 | lm loss: 2.926345E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.888 | TFLOPs: 32.00 | +7: iteration 99830/ 173500 | consumed samples: 25556480 | consumed tokens: 52339671040 | elapsed time per iteration (s): 0.42 | learning rate: 9.006E-05 | global batch size: 256 | lm loss: 2.929350E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.121 | TFLOPs: 32.01 | +7: iteration 99840/ 173500 | consumed samples: 25559040 | consumed tokens: 52344913920 | elapsed time per iteration (s): 0.42 | learning rate: 9.005E-05 | global batch size: 256 | lm loss: 2.930899E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.390 | TFLOPs: 31.97 | +7: iteration 99850/ 173500 | consumed samples: 25561600 | consumed tokens: 52350156800 | elapsed time per iteration (s): 0.42 | learning rate: 9.003E-05 | global batch size: 256 | lm loss: 2.914121E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.532 | TFLOPs: 31.98 | +7: iteration 99860/ 173500 | consumed samples: 25564160 | consumed tokens: 52355399680 | elapsed time per iteration (s): 0.42 | learning rate: 9.002E-05 | global batch size: 256 | lm loss: 2.921257E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.972 | TFLOPs: 32.00 | +7: iteration 99870/ 
173500 | consumed samples: 25566720 | consumed tokens: 52360642560 | elapsed time per iteration (s): 0.42 | learning rate: 9.000E-05 | global batch size: 256 | lm loss: 2.944667E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.556 | TFLOPs: 31.98 | +7: iteration 99880/ 173500 | consumed samples: 25569280 | consumed tokens: 52365885440 | elapsed time per iteration (s): 0.42 | learning rate: 8.998E-05 | global batch size: 256 | lm loss: 2.928543E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.677 | TFLOPs: 31.99 | +7: iteration 99890/ 173500 | consumed samples: 25571840 | consumed tokens: 52371128320 | elapsed time per iteration (s): 0.42 | learning rate: 8.997E-05 | global batch size: 256 | lm loss: 2.919786E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.916 | TFLOPs: 32.00 | +7: iteration 99900/ 173500 | consumed samples: 25574400 | consumed tokens: 52376371200 | elapsed time per iteration (s): 0.42 | learning rate: 8.995E-05 | global batch size: 256 | lm loss: 2.918406E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.842 | TFLOPs: 32.00 | +7: iteration 99910/ 173500 | consumed samples: 25576960 | consumed tokens: 52381614080 | elapsed time per iteration (s): 0.42 | learning rate: 8.994E-05 | global batch size: 256 | lm loss: 2.907648E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.751 | TFLOPs: 31.99 | +7: iteration 99920/ 173500 | consumed samples: 25579520 | consumed tokens: 52386856960 | elapsed time per iteration (s): 0.42 | learning rate: 8.992E-05 | global batch size: 256 | lm loss: 2.921557E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.427 | TFLOPs: 31.98 | +7: iteration 99930/ 173500 | consumed samples: 25582080 | consumed tokens: 52392099840 | elapsed time per iteration (s): 0.42 | learning rate: 8.990E-05 | global batch size: 256 | lm loss: 2.918829E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.925 | TFLOPs: 32.00 | +7: iteration 99940/ 173500 | consumed samples: 25584640 | consumed tokens: 52397342720 | elapsed time per iteration (s): 0.42 | learning rate: 8.989E-05 | global batch size: 256 | lm loss: 2.924158E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.658 | TFLOPs: 31.99 | +7: iteration 99950/ 173500 | consumed samples: 25587200 | consumed tokens: 52402585600 | elapsed time per iteration (s): 0.42 | learning rate: 8.987E-05 | global batch size: 256 | lm loss: 2.914882E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.718 | TFLOPs: 31.99 | +7: iteration 99960/ 173500 | consumed samples: 25589760 | consumed tokens: 52407828480 | elapsed time per iteration (s): 0.42 | learning rate: 8.986E-05 | global batch size: 256 | lm loss: 2.920063E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.441 | TFLOPs: 32.03 | +7: iteration 99970/ 173500 | consumed samples: 25592320 | consumed tokens: 52413071360 | elapsed time per iteration (s): 0.42 | learning rate: 8.984E-05 | global batch size: 256 | lm loss: 2.926584E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.036 | TFLOPs: 32.01 | +7: iteration 99980/ 173500 | consumed samples: 25594880 | consumed tokens: 52418314240 | elapsed time per iteration (s): 0.42 | learning rate: 
8.982E-05 | global batch size: 256 | lm loss: 2.927337E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.156 | TFLOPs: 32.01 | +7: iteration 99990/ 173500 | consumed samples: 25597440 | consumed tokens: 52423557120 | elapsed time per iteration (s): 0.42 | learning rate: 8.981E-05 | global batch size: 256 | lm loss: 2.939693E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.902 | TFLOPs: 32.00 | +0: [2023-03-17 11:03:08,290] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=0, lr=[8.979141123724914e-05, 8.979141123724914e-05, 8.979141123724914e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 100000/ 173500 | consumed samples: 25600000 | consumed tokens: 52428800000 | elapsed time per iteration (s): 0.42 | learning rate: 8.979E-05 | global batch size: 256 | lm loss: 2.917713E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.969 | TFLOPs: 32.00 | +0: steps: 100000 loss: 2.9206 iter time (s): 0.421 samples/sec: 608.713 +7: ------------------------------------------------------------------------------------------------- +7: validation loss at iteration 100000 | lm loss value: 3.201147E+00 | lm loss PPL: 2.456068E+01 | +7: ------------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 100000 to checkpoints_221m91b400m +0: [2023-03-17 11:03:08,481] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step100000 is begin to save! +0: [2023-03-17 11:03:08,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_01-model_00-model_states.pt... 
+0: [2023-03-17 11:03:08,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_01-model_00-model_states.pt. +0: [2023-03-17 11:03:08,862] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_03-model_00-model_states.pt... +0: [2023-03-17 11:03:08,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_03-model_00-model_states.pt. +0: [2023-03-17 11:03:08,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_04-model_00-model_states.pt... +0: [2023-03-17 11:03:08,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_04-model_00-model_states.pt. +0: [2023-03-17 11:03:08,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_05-model_00-model_states.pt... +0: [2023-03-17 11:03:08,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_05-model_00-model_states.pt. +0: [2023-03-17 11:03:08,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_06-model_00-model_states.pt... +0: [2023-03-17 11:03:08,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_06-model_00-model_states.pt. +0: [2023-03-17 11:03:08,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_07-model_00-model_states.pt... +0: [2023-03-17 11:03:08,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_07-model_00-model_states.pt. +0: [2023-03-17 11:03:08,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_08-model_00-model_states.pt... 
+0: [2023-03-17 11:03:09,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_08-model_00-model_states.pt. +0: [2023-03-17 11:03:09,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_09-model_00-model_states.pt... +0: [2023-03-17 11:03:09,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_09-model_00-model_states.pt. +0: [2023-03-17 11:03:09,038] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_10-model_00-model_states.pt... +0: [2023-03-17 11:03:09,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_10-model_00-model_states.pt. +0: [2023-03-17 11:03:09,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_11-model_00-model_states.pt... +0: [2023-03-17 11:03:09,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_11-model_00-model_states.pt. +0: [2023-03-17 11:03:09,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_12-model_00-model_states.pt... +0: [2023-03-17 11:03:09,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_12-model_00-model_states.pt. +0: [2023-03-17 11:03:09,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_13-model_00-model_states.pt... +0: [2023-03-17 11:03:09,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_13-model_00-model_states.pt. +0: [2023-03-17 11:03:09,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_14-model_00-model_states.pt... 
+0: [2023-03-17 11:03:09,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_14-model_00-model_states.pt. +0: [2023-03-17 11:03:09,164] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_15-model_00-model_states.pt... +0: [2023-03-17 11:03:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_15-model_00-model_states.pt. +0: [2023-03-17 11:03:09,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_16-model_00-model_states.pt... +0: [2023-03-17 11:03:09,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_16-model_00-model_states.pt. +0: [2023-03-17 11:03:09,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_17-model_00-model_states.pt... +0: [2023-03-17 11:03:09,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_17-model_00-model_states.pt. +0: [2023-03-17 11:03:09,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_18-model_00-model_states.pt... +0: [2023-03-17 11:03:09,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_18-model_00-model_states.pt. +0: [2023-03-17 11:03:09,266] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_19-model_00-model_states.pt... +0: [2023-03-17 11:03:09,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_19-model_00-model_states.pt. +0: [2023-03-17 11:03:09,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_20-model_00-model_states.pt... 
+0: [2023-03-17 11:03:09,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_20-model_00-model_states.pt. +0: [2023-03-17 11:03:09,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/layer_22-model_00-model_states.pt... +0: [2023-03-17 11:03:09,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/layer_22-model_00-model_states.pt. +0: [2023-03-17 11:03:09,323] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step100000/mp_rank_00_model_states.pt +0: [2023-03-17 11:03:09,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/mp_rank_00_model_states.pt... +0: [2023-03-17 11:03:09,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/mp_rank_00_model_states.pt. +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
+2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 11:03:09,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 11:03:09,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 11:03:09,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+0: [2023-03-17 11:03:09,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-17 11:03:09,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +1: [2023-03-17 11:03:09,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 11:03:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 11:03:09,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+3: [2023-03-17 11:03:09,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 11:03:09,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +3: [2023-03-17 11:03:09,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 11:03:09,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 11:03:09,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 11:03:09,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 11:03:09,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-17 11:03:09,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 11:03:09,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +7: [2023-03-17 11:03:09,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 11:03:09,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 11:03:09,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 11:03:09,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 11:03:09,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +4: [2023-03-17 11:03:09,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +0: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +6: [2023-03-17 11:03:09,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +5: [2023-03-17 11:03:09,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! +2: [2023-03-17 11:03:09,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
+0: successfully saved checkpoint at iteration 100000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 1063.44 +7: iteration 100010/ 173500 | consumed samples: 25602560 | consumed tokens: 52434042880 | elapsed time per iteration (s): 0.54 | learning rate: 8.978E-05 | global batch size: 256 | lm loss: 2.923940E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 474.415 | TFLOPs: 24.89 | +7: iteration 100020/ 173500 | consumed samples: 25605120 | consumed tokens: 52439285760 | elapsed time per iteration (s): 0.42 | learning rate: 8.976E-05 | global batch size: 256 | lm loss: 2.917898E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.892 | TFLOPs: 32.00 | +7: iteration 100030/ 173500 | consumed samples: 25607680 | consumed tokens: 52444528640 | elapsed time per iteration (s): 0.43 | learning rate: 8.974E-05 | global batch size: 256 | lm loss: 2.917600E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.913 | TFLOPs: 31.53 | +7: iteration 100040/ 173500 | consumed samples: 25610240 | consumed tokens: 52449771520 | elapsed time per iteration (s): 0.42 | learning rate: 8.973E-05 | global batch size: 256 | lm loss: 2.923831E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.282 | TFLOPs: 31.92 | +7: iteration 100050/ 173500 | consumed samples: 25612800 | consumed tokens: 52455014400 | elapsed time per iteration (s): 0.42 | learning rate: 8.971E-05 | global batch size: 256 | lm loss: 2.930213E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.345 | TFLOPs: 31.76 | +7: iteration 100060/ 173500 | consumed samples: 25615360 | consumed tokens: 52460257280 | elapsed time per iteration 
(s): 0.42 | learning rate: 8.970E-05 | global batch size: 256 | lm loss: 2.911623E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.652 | TFLOPs: 31.73 | +7: iteration 100070/ 173500 | consumed samples: 25617920 | consumed tokens: 52465500160 | elapsed time per iteration (s): 0.43 | learning rate: 8.968E-05 | global batch size: 256 | lm loss: 2.923357E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.144 | TFLOPs: 31.38 | +7: iteration 100080/ 173500 | consumed samples: 25620480 | consumed tokens: 52470743040 | elapsed time per iteration (s): 0.42 | learning rate: 8.966E-05 | global batch size: 256 | lm loss: 2.923257E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.511 | TFLOPs: 31.72 | +7: iteration 100090/ 173500 | consumed samples: 25623040 | consumed tokens: 52475985920 | elapsed time per iteration (s): 0.42 | learning rate: 8.965E-05 | global batch size: 256 | lm loss: 2.927769E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.748 | TFLOPs: 32.10 | +7: iteration 100100/ 173500 | consumed samples: 25625600 | consumed tokens: 52481228800 | elapsed time per iteration (s): 0.42 | learning rate: 8.963E-05 | global batch size: 256 | lm loss: 2.917138E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.819 | TFLOPs: 31.73 | +7: iteration 100110/ 173500 | consumed samples: 25628160 | consumed tokens: 52486471680 | elapsed time per iteration (s): 0.44 | learning rate: 8.962E-05 | global batch size: 256 | lm loss: 2.928370E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.556 | TFLOPs: 30.46 | +7: 
iteration 100120/ 173500 | consumed samples: 25630720 | consumed tokens: 52491714560 | elapsed time per iteration (s): 0.45 | learning rate: 8.960E-05 | global batch size: 256 | lm loss: 2.920701E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.803 | TFLOPs: 29.84 | +7: iteration 100130/ 173500 | consumed samples: 25633280 | consumed tokens: 52496957440 | elapsed time per iteration (s): 0.43 | learning rate: 8.958E-05 | global batch size: 256 | lm loss: 2.912681E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.379 | TFLOPs: 31.50 | +7: iteration 100140/ 173500 | consumed samples: 25635840 | consumed tokens: 52502200320 | elapsed time per iteration (s): 0.42 | learning rate: 8.957E-05 | global batch size: 256 | lm loss: 2.913954E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.952 | TFLOPs: 32.11 | +7: iteration 100150/ 173500 | consumed samples: 25638400 | consumed tokens: 52507443200 | elapsed time per iteration (s): 0.42 | learning rate: 8.955E-05 | global batch size: 256 | lm loss: 2.915559E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.588 | TFLOPs: 32.09 | +7: iteration 100160/ 173500 | consumed samples: 25640960 | consumed tokens: 52512686080 | elapsed time per iteration (s): 0.42 | learning rate: 8.953E-05 | global batch size: 256 | lm loss: 2.913751E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.769 | TFLOPs: 31.63 | +7: iteration 100170/ 173500 | consumed samples: 25643520 | consumed tokens: 52517928960 | elapsed time per iteration (s): 0.43 | learning rate: 8.952E-05 | global batch size: 256 | lm loss: 2.916330E+00 | grad norm: 0.318 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.010 | TFLOPs: 31.43 | +7: iteration 100180/ 173500 | consumed samples: 25646080 | consumed tokens: 52523171840 | elapsed time per iteration (s): 0.42 | learning rate: 8.950E-05 | global batch size: 256 | lm loss: 2.925000E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.438 | TFLOPs: 31.66 | +7: iteration 100190/ 173500 | consumed samples: 25648640 | consumed tokens: 52528414720 | elapsed time per iteration (s): 0.42 | learning rate: 8.949E-05 | global batch size: 256 | lm loss: 2.927157E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.424 | TFLOPs: 31.61 | +7: iteration 100200/ 173500 | consumed samples: 25651200 | consumed tokens: 52533657600 | elapsed time per iteration (s): 0.42 | learning rate: 8.947E-05 | global batch size: 256 | lm loss: 2.914592E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.090 | TFLOPs: 31.80 | +7: iteration 100210/ 173500 | consumed samples: 25653760 | consumed tokens: 52538900480 | elapsed time per iteration (s): 0.42 | learning rate: 8.945E-05 | global batch size: 256 | lm loss: 2.913954E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.358 | TFLOPs: 31.87 | +7: iteration 100220/ 173500 | consumed samples: 25656320 | consumed tokens: 52544143360 | elapsed time per iteration (s): 0.42 | learning rate: 8.944E-05 | global batch size: 256 | lm loss: 2.921066E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.268 | TFLOPs: 32.07 | +7: iteration 100230/ 173500 | consumed samples: 25658880 | consumed tokens: 52549386240 | elapsed time per iteration (s): 0.42 | 
learning rate: 8.942E-05 | global batch size: 256 | lm loss: 2.919867E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.103 | TFLOPs: 32.06 | +7: iteration 100240/ 173500 | consumed samples: 25661440 | consumed tokens: 52554629120 | elapsed time per iteration (s): 0.43 | learning rate: 8.941E-05 | global batch size: 256 | lm loss: 2.917824E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.455 | TFLOPs: 31.50 | +7: iteration 100250/ 173500 | consumed samples: 25664000 | consumed tokens: 52559872000 | elapsed time per iteration (s): 0.43 | learning rate: 8.939E-05 | global batch size: 256 | lm loss: 2.924456E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.653 | TFLOPs: 31.57 | +7: iteration 100260/ 173500 | consumed samples: 25666560 | consumed tokens: 52565114880 | elapsed time per iteration (s): 0.42 | learning rate: 8.937E-05 | global batch size: 256 | lm loss: 2.905553E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.188 | TFLOPs: 31.86 | +7: iteration 100270/ 173500 | consumed samples: 25669120 | consumed tokens: 52570357760 | elapsed time per iteration (s): 0.43 | learning rate: 8.936E-05 | global batch size: 256 | lm loss: 2.909832E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.681 | TFLOPs: 31.41 | +7: iteration 100280/ 173500 | consumed samples: 25671680 | consumed tokens: 52575600640 | elapsed time per iteration (s): 0.42 | learning rate: 8.934E-05 | global batch size: 256 | lm loss: 2.921999E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.672 | TFLOPs: 31.94 | +7: iteration 
100290/ 173500 | consumed samples: 25674240 | consumed tokens: 52580843520 | elapsed time per iteration (s): 0.42 | learning rate: 8.933E-05 | global batch size: 256 | lm loss: 2.914563E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.132 | TFLOPs: 31.75 | +7: iteration 100300/ 173500 | consumed samples: 25676800 | consumed tokens: 52586086400 | elapsed time per iteration (s): 0.42 | learning rate: 8.931E-05 | global batch size: 256 | lm loss: 2.929876E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.734 | TFLOPs: 31.73 | +7: iteration 100310/ 173500 | consumed samples: 25679360 | consumed tokens: 52591329280 | elapsed time per iteration (s): 0.42 | learning rate: 8.929E-05 | global batch size: 256 | lm loss: 2.924914E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.647 | TFLOPs: 31.83 | +7: iteration 100320/ 173500 | consumed samples: 25681920 | consumed tokens: 52596572160 | elapsed time per iteration (s): 0.42 | learning rate: 8.928E-05 | global batch size: 256 | lm loss: 2.927146E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.973 | TFLOPs: 31.85 | +7: iteration 100330/ 173500 | consumed samples: 25684480 | consumed tokens: 52601815040 | elapsed time per iteration (s): 0.42 | learning rate: 8.926E-05 | global batch size: 256 | lm loss: 2.920058E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.494 | TFLOPs: 31.72 | +7: iteration 100340/ 173500 | consumed samples: 25687040 | consumed tokens: 52607057920 | elapsed time per iteration (s): 0.42 | learning rate: 8.925E-05 | global batch size: 256 | lm loss: 2.911721E+00 | grad norm: 0.339 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.531 | TFLOPs: 31.82 | +7: iteration 100350/ 173500 | consumed samples: 25689600 | consumed tokens: 52612300800 | elapsed time per iteration (s): 0.43 | learning rate: 8.923E-05 | global batch size: 256 | lm loss: 2.914244E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.817 | TFLOPs: 31.52 | +7: iteration 100360/ 173500 | consumed samples: 25692160 | consumed tokens: 52617543680 | elapsed time per iteration (s): 0.43 | learning rate: 8.921E-05 | global batch size: 256 | lm loss: 2.920973E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.303 | TFLOPs: 31.44 | +7: iteration 100370/ 173500 | consumed samples: 25694720 | consumed tokens: 52622786560 | elapsed time per iteration (s): 0.43 | learning rate: 8.920E-05 | global batch size: 256 | lm loss: 2.937351E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.909 | TFLOPs: 31.53 | +7: iteration 100380/ 173500 | consumed samples: 25697280 | consumed tokens: 52628029440 | elapsed time per iteration (s): 0.43 | learning rate: 8.918E-05 | global batch size: 256 | lm loss: 2.919799E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.517 | TFLOPs: 31.14 | +7: iteration 100390/ 173500 | consumed samples: 25699840 | consumed tokens: 52633272320 | elapsed time per iteration (s): 0.44 | learning rate: 8.917E-05 | global batch size: 256 | lm loss: 2.917348E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.928 | TFLOPs: 30.38 | +7: iteration 100400/ 173500 | consumed samples: 25702400 | consumed tokens: 52638515200 | elapsed time per iteration (s): 0.44 | learning 
rate: 8.915E-05 | global batch size: 256 | lm loss: 2.909015E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.091 | TFLOPs: 30.59 | +7: iteration 100410/ 173500 | consumed samples: 25704960 | consumed tokens: 52643758080 | elapsed time per iteration (s): 0.43 | learning rate: 8.913E-05 | global batch size: 256 | lm loss: 2.923277E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.570 | TFLOPs: 31.04 | +7: iteration 100420/ 173500 | consumed samples: 25707520 | consumed tokens: 52649000960 | elapsed time per iteration (s): 0.43 | learning rate: 8.912E-05 | global batch size: 256 | lm loss: 2.915591E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.604 | TFLOPs: 31.20 | +7: iteration 100430/ 173500 | consumed samples: 25710080 | consumed tokens: 52654243840 | elapsed time per iteration (s): 0.44 | learning rate: 8.910E-05 | global batch size: 256 | lm loss: 2.915825E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.559 | TFLOPs: 30.41 | +7: iteration 100440/ 173500 | consumed samples: 25712640 | consumed tokens: 52659486720 | elapsed time per iteration (s): 0.43 | learning rate: 8.909E-05 | global batch size: 256 | lm loss: 2.925297E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.720 | TFLOPs: 30.99 | +7: iteration 100450/ 173500 | consumed samples: 25715200 | consumed tokens: 52664729600 | elapsed time per iteration (s): 0.43 | learning rate: 8.907E-05 | global batch size: 256 | lm loss: 2.923006E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.645 | TFLOPs: 31.46 | +7: iteration 100460/ 
173500 | consumed samples: 25717760 | consumed tokens: 52669972480 | elapsed time per iteration (s): 0.42 | learning rate: 8.905E-05 | global batch size: 256 | lm loss: 2.926161E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.628 | TFLOPs: 31.67 | +7: iteration 100470/ 173500 | consumed samples: 25720320 | consumed tokens: 52675215360 | elapsed time per iteration (s): 0.43 | learning rate: 8.904E-05 | global batch size: 256 | lm loss: 2.929597E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.625 | TFLOPs: 31.25 | +7: iteration 100480/ 173500 | consumed samples: 25722880 | consumed tokens: 52680458240 | elapsed time per iteration (s): 0.43 | learning rate: 8.902E-05 | global batch size: 256 | lm loss: 2.922406E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.086 | TFLOPs: 31.28 | +7: iteration 100490/ 173500 | consumed samples: 25725440 | consumed tokens: 52685701120 | elapsed time per iteration (s): 0.42 | learning rate: 8.901E-05 | global batch size: 256 | lm loss: 2.914119E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.015 | TFLOPs: 31.69 | +7: iteration 100500/ 173500 | consumed samples: 25728000 | consumed tokens: 52690944000 | elapsed time per iteration (s): 0.44 | learning rate: 8.899E-05 | global batch size: 256 | lm loss: 2.921798E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.522 | TFLOPs: 30.30 | +7: iteration 100510/ 173500 | consumed samples: 25730560 | consumed tokens: 52696186880 | elapsed time per iteration (s): 0.45 | learning rate: 8.897E-05 | global batch size: 256 | lm loss: 2.935121E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 574.258 | TFLOPs: 30.13 | +7: iteration 100520/ 173500 | consumed samples: 25733120 | consumed tokens: 52701429760 | elapsed time per iteration (s): 0.43 | learning rate: 8.896E-05 | global batch size: 256 | lm loss: 2.931412E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.836 | TFLOPs: 31.37 | +7: iteration 100530/ 173500 | consumed samples: 25735680 | consumed tokens: 52706672640 | elapsed time per iteration (s): 0.45 | learning rate: 8.894E-05 | global batch size: 256 | lm loss: 2.917354E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.755 | TFLOPs: 30.10 | +7: iteration 100540/ 173500 | consumed samples: 25738240 | consumed tokens: 52711915520 | elapsed time per iteration (s): 0.43 | learning rate: 8.893E-05 | global batch size: 256 | lm loss: 2.916121E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.378 | TFLOPs: 30.92 | +7: iteration 100550/ 173500 | consumed samples: 25740800 | consumed tokens: 52717158400 | elapsed time per iteration (s): 0.43 | learning rate: 8.891E-05 | global batch size: 256 | lm loss: 2.916132E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.265 | TFLOPs: 31.23 | +7: iteration 100560/ 173500 | consumed samples: 25743360 | consumed tokens: 52722401280 | elapsed time per iteration (s): 0.43 | learning rate: 8.889E-05 | global batch size: 256 | lm loss: 2.932632E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.028 | TFLOPs: 31.17 | +7: iteration 100570/ 173500 | consumed samples: 25745920 | consumed tokens: 52727644160 | elapsed time per iteration (s): 0.44 | learning rate: 
8.888E-05 | global batch size: 256 | lm loss: 2.922162E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.477 | TFLOPs: 30.30 | +7: iteration 100580/ 173500 | consumed samples: 25748480 | consumed tokens: 52732887040 | elapsed time per iteration (s): 0.43 | learning rate: 8.886E-05 | global batch size: 256 | lm loss: 2.918768E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.658 | TFLOPs: 30.89 | +7: iteration 100590/ 173500 | consumed samples: 25751040 | consumed tokens: 52738129920 | elapsed time per iteration (s): 0.43 | learning rate: 8.885E-05 | global batch size: 256 | lm loss: 2.913720E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.814 | TFLOPs: 31.10 | +7: iteration 100600/ 173500 | consumed samples: 25753600 | consumed tokens: 52743372800 | elapsed time per iteration (s): 0.42 | learning rate: 8.883E-05 | global batch size: 256 | lm loss: 2.908926E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.506 | TFLOPs: 31.82 | +7: iteration 100610/ 173500 | consumed samples: 25756160 | consumed tokens: 52748615680 | elapsed time per iteration (s): 0.43 | learning rate: 8.881E-05 | global batch size: 256 | lm loss: 2.911443E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.799 | TFLOPs: 31.42 | +7: iteration 100620/ 173500 | consumed samples: 25758720 | consumed tokens: 52753858560 | elapsed time per iteration (s): 0.45 | learning rate: 8.880E-05 | global batch size: 256 | lm loss: 2.926720E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.012 | TFLOPs: 30.01 | +7: iteration 100630/ 173500 | 
consumed samples: 25761280 | consumed tokens: 52759101440 | elapsed time per iteration (s): 0.43 | learning rate: 8.878E-05 | global batch size: 256 | lm loss: 2.910276E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.860 | TFLOPs: 31.42 | +7: iteration 100640/ 173500 | consumed samples: 25763840 | consumed tokens: 52764344320 | elapsed time per iteration (s): 0.42 | learning rate: 8.877E-05 | global batch size: 256 | lm loss: 2.941213E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.435 | TFLOPs: 31.66 | +7: iteration 100650/ 173500 | consumed samples: 25766400 | consumed tokens: 52769587200 | elapsed time per iteration (s): 0.43 | learning rate: 8.875E-05 | global batch size: 256 | lm loss: 2.912506E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.495 | TFLOPs: 31.56 | +7: iteration 100660/ 173500 | consumed samples: 25768960 | consumed tokens: 52774830080 | elapsed time per iteration (s): 0.42 | learning rate: 8.873E-05 | global batch size: 256 | lm loss: 2.897624E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.399 | TFLOPs: 31.66 | +7: iteration 100670/ 173500 | consumed samples: 25771520 | consumed tokens: 52780072960 | elapsed time per iteration (s): 0.43 | learning rate: 8.872E-05 | global batch size: 256 | lm loss: 2.913357E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.355 | TFLOPs: 31.08 | +7: iteration 100680/ 173500 | consumed samples: 25774080 | consumed tokens: 52785315840 | elapsed time per iteration (s): 0.43 | learning rate: 8.870E-05 | global batch size: 256 | lm loss: 2.915969E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 592.380 | TFLOPs: 31.08 | +7: iteration 100690/ 173500 | consumed samples: 25776640 | consumed tokens: 52790558720 | elapsed time per iteration (s): 0.43 | learning rate: 8.869E-05 | global batch size: 256 | lm loss: 2.914434E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.173 | TFLOPs: 31.44 | +7: iteration 100700/ 173500 | consumed samples: 25779200 | consumed tokens: 52795801600 | elapsed time per iteration (s): 0.43 | learning rate: 8.867E-05 | global batch size: 256 | lm loss: 2.926914E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.072 | TFLOPs: 31.12 | +7: iteration 100710/ 173500 | consumed samples: 25781760 | consumed tokens: 52801044480 | elapsed time per iteration (s): 0.43 | learning rate: 8.865E-05 | global batch size: 256 | lm loss: 2.920401E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.557 | TFLOPs: 31.09 | +7: iteration 100720/ 173500 | consumed samples: 25784320 | consumed tokens: 52806287360 | elapsed time per iteration (s): 0.43 | learning rate: 8.864E-05 | global batch size: 256 | lm loss: 2.912472E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.532 | TFLOPs: 31.40 | +7: iteration 100730/ 173500 | consumed samples: 25786880 | consumed tokens: 52811530240 | elapsed time per iteration (s): 0.43 | learning rate: 8.862E-05 | global batch size: 256 | lm loss: 2.922359E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.314 | TFLOPs: 31.24 | +7: iteration 100740/ 173500 | consumed samples: 25789440 | consumed tokens: 52816773120 | elapsed time per iteration (s): 0.43 | learning rate: 
8.861E-05 | global batch size: 256 | lm loss: 2.922060E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.666 | TFLOPs: 31.31 | +7: iteration 100750/ 173500 | consumed samples: 25792000 | consumed tokens: 52822016000 | elapsed time per iteration (s): 0.43 | learning rate: 8.859E-05 | global batch size: 256 | lm loss: 2.918364E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.202 | TFLOPs: 31.02 | +7: iteration 100760/ 173500 | consumed samples: 25794560 | consumed tokens: 52827258880 | elapsed time per iteration (s): 0.43 | learning rate: 8.857E-05 | global batch size: 256 | lm loss: 2.918244E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.655 | TFLOPs: 31.36 | +7: iteration 100770/ 173500 | consumed samples: 25797120 | consumed tokens: 52832501760 | elapsed time per iteration (s): 0.43 | learning rate: 8.856E-05 | global batch size: 256 | lm loss: 2.924067E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.363 | TFLOPs: 31.08 | +7: iteration 100780/ 173500 | consumed samples: 25799680 | consumed tokens: 52837744640 | elapsed time per iteration (s): 0.43 | learning rate: 8.854E-05 | global batch size: 256 | lm loss: 2.918708E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.568 | TFLOPs: 31.04 | +7: iteration 100790/ 173500 | consumed samples: 25802240 | consumed tokens: 52842987520 | elapsed time per iteration (s): 0.44 | learning rate: 8.853E-05 | global batch size: 256 | lm loss: 2.925361E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.046 | TFLOPs: 30.49 | +7: iteration 100800/ 173500 | 
consumed samples: 25804800 | consumed tokens: 52848230400 | elapsed time per iteration (s): 0.43 | learning rate: 8.851E-05 | global batch size: 256 | lm loss: 2.914014E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.870 | TFLOPs: 31.26 | +7: iteration 100810/ 173500 | consumed samples: 25807360 | consumed tokens: 52853473280 | elapsed time per iteration (s): 0.43 | learning rate: 8.849E-05 | global batch size: 256 | lm loss: 2.921022E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.930 | TFLOPs: 31.01 | +7: iteration 100820/ 173500 | consumed samples: 25809920 | consumed tokens: 52858716160 | elapsed time per iteration (s): 0.43 | learning rate: 8.848E-05 | global batch size: 256 | lm loss: 2.919302E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.515 | TFLOPs: 31.04 | +7: iteration 100830/ 173500 | consumed samples: 25812480 | consumed tokens: 52863959040 | elapsed time per iteration (s): 0.43 | learning rate: 8.846E-05 | global batch size: 256 | lm loss: 2.920008E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.696 | TFLOPs: 31.47 | +7: iteration 100840/ 173500 | consumed samples: 25815040 | consumed tokens: 52869201920 | elapsed time per iteration (s): 0.43 | learning rate: 8.845E-05 | global batch size: 256 | lm loss: 2.925491E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.901 | TFLOPs: 31.06 | +7: iteration 100850/ 173500 | consumed samples: 25817600 | consumed tokens: 52874444800 | elapsed time per iteration (s): 0.43 | learning rate: 8.843E-05 | global batch size: 256 | lm loss: 2.927141E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 589.763 | TFLOPs: 30.94 | +7: iteration 100860/ 173500 | consumed samples: 25820160 | consumed tokens: 52879687680 | elapsed time per iteration (s): 0.44 | learning rate: 8.841E-05 | global batch size: 256 | lm loss: 2.913243E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.719 | TFLOPs: 30.73 | +7: iteration 100870/ 173500 | consumed samples: 25822720 | consumed tokens: 52884930560 | elapsed time per iteration (s): 0.44 | learning rate: 8.840E-05 | global batch size: 256 | lm loss: 2.916776E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.444 | TFLOPs: 30.77 | +7: iteration 100880/ 173500 | consumed samples: 25825280 | consumed tokens: 52890173440 | elapsed time per iteration (s): 0.42 | learning rate: 8.838E-05 | global batch size: 256 | lm loss: 2.925134E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.808 | TFLOPs: 31.84 | +7: iteration 100890/ 173500 | consumed samples: 25827840 | consumed tokens: 52895416320 | elapsed time per iteration (s): 0.43 | learning rate: 8.837E-05 | global batch size: 256 | lm loss: 2.912890E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.171 | TFLOPs: 31.28 | +7: iteration 100900/ 173500 | consumed samples: 25830400 | consumed tokens: 52900659200 | elapsed time per iteration (s): 0.43 | learning rate: 8.835E-05 | global batch size: 256 | lm loss: 2.919348E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.091 | TFLOPs: 31.28 | +7: iteration 100910/ 173500 | consumed samples: 25832960 | consumed tokens: 52905902080 | elapsed time per iteration (s): 0.43 | learning rate: 
8.833E-05 | global batch size: 256 | lm loss: 2.917232E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.290 | TFLOPs: 31.18 | +7: iteration 100920/ 173500 | consumed samples: 25835520 | consumed tokens: 52911144960 | elapsed time per iteration (s): 0.43 | learning rate: 8.832E-05 | global batch size: 256 | lm loss: 2.926849E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.683 | TFLOPs: 31.57 | +7: iteration 100930/ 173500 | consumed samples: 25838080 | consumed tokens: 52916387840 | elapsed time per iteration (s): 0.43 | learning rate: 8.830E-05 | global batch size: 256 | lm loss: 2.917771E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.586 | TFLOPs: 31.46 | +7: iteration 100940/ 173500 | consumed samples: 25840640 | consumed tokens: 52921630720 | elapsed time per iteration (s): 0.43 | learning rate: 8.829E-05 | global batch size: 256 | lm loss: 2.917381E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.711 | TFLOPs: 31.31 | +7: iteration 100950/ 173500 | consumed samples: 25843200 | consumed tokens: 52926873600 | elapsed time per iteration (s): 0.44 | learning rate: 8.827E-05 | global batch size: 256 | lm loss: 2.928806E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.552 | TFLOPs: 30.57 | +7: iteration 100960/ 173500 | consumed samples: 25845760 | consumed tokens: 52932116480 | elapsed time per iteration (s): 0.45 | learning rate: 8.825E-05 | global batch size: 256 | lm loss: 2.925779E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.412 | TFLOPs: 29.72 | +7: iteration 100970/ 173500 | 
consumed samples: 25848320 | consumed tokens: 52937359360 | elapsed time per iteration (s): 0.42 | learning rate: 8.824E-05 | global batch size: 256 | lm loss: 2.920118E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.690 | TFLOPs: 31.88 | +7: iteration 100980/ 173500 | consumed samples: 25850880 | consumed tokens: 52942602240 | elapsed time per iteration (s): 0.43 | learning rate: 8.822E-05 | global batch size: 256 | lm loss: 2.912406E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.882 | TFLOPs: 31.06 | +7: iteration 100990/ 173500 | consumed samples: 25853440 | consumed tokens: 52947845120 | elapsed time per iteration (s): 0.43 | learning rate: 8.821E-05 | global batch size: 256 | lm loss: 2.928269E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.996 | TFLOPs: 30.90 | +7: iteration 101000/ 173500 | consumed samples: 25856000 | consumed tokens: 52953088000 | elapsed time per iteration (s): 0.43 | learning rate: 8.819E-05 | global batch size: 256 | lm loss: 2.911867E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.487 | TFLOPs: 31.09 | +7: iteration 101010/ 173500 | consumed samples: 25858560 | consumed tokens: 52958330880 | elapsed time per iteration (s): 0.43 | learning rate: 8.817E-05 | global batch size: 256 | lm loss: 2.916444E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.257 | TFLOPs: 31.23 | +7: iteration 101020/ 173500 | consumed samples: 25861120 | consumed tokens: 52963573760 | elapsed time per iteration (s): 0.43 | learning rate: 8.816E-05 | global batch size: 256 | lm loss: 2.918350E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 598.301 | TFLOPs: 31.39 | +7: iteration 101030/ 173500 | consumed samples: 25863680 | consumed tokens: 52968816640 | elapsed time per iteration (s): 0.43 | learning rate: 8.814E-05 | global batch size: 256 | lm loss: 2.923268E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.792 | TFLOPs: 31.21 | +7: iteration 101040/ 173500 | consumed samples: 25866240 | consumed tokens: 52974059520 | elapsed time per iteration (s): 0.43 | learning rate: 8.813E-05 | global batch size: 256 | lm loss: 2.929180E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.157 | TFLOPs: 31.38 | +7: iteration 101050/ 173500 | consumed samples: 25868800 | consumed tokens: 52979302400 | elapsed time per iteration (s): 0.44 | learning rate: 8.811E-05 | global batch size: 256 | lm loss: 2.922721E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.192 | TFLOPs: 30.81 | +7: iteration 101060/ 173500 | consumed samples: 25871360 | consumed tokens: 52984545280 | elapsed time per iteration (s): 0.45 | learning rate: 8.810E-05 | global batch size: 256 | lm loss: 2.919013E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.152 | TFLOPs: 29.81 | +7: iteration 101070/ 173500 | consumed samples: 25873920 | consumed tokens: 52989788160 | elapsed time per iteration (s): 0.43 | learning rate: 8.808E-05 | global batch size: 256 | lm loss: 2.901114E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.297 | TFLOPs: 31.13 | +7: iteration 101080/ 173500 | consumed samples: 25876480 | consumed tokens: 52995031040 | elapsed time per iteration (s): 0.43 | learning rate: 
8.806E-05 | global batch size: 256 | lm loss: 2.927603E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.536 | TFLOPs: 31.30 | +7: iteration 101090/ 173500 | consumed samples: 25879040 | consumed tokens: 53000273920 | elapsed time per iteration (s): 0.44 | learning rate: 8.805E-05 | global batch size: 256 | lm loss: 2.912622E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.975 | TFLOPs: 30.80 | +7: iteration 101100/ 173500 | consumed samples: 25881600 | consumed tokens: 53005516800 | elapsed time per iteration (s): 0.43 | learning rate: 8.803E-05 | global batch size: 256 | lm loss: 2.914808E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.903 | TFLOPs: 31.37 | +7: iteration 101110/ 173500 | consumed samples: 25884160 | consumed tokens: 53010759680 | elapsed time per iteration (s): 0.43 | learning rate: 8.802E-05 | global batch size: 256 | lm loss: 2.919876E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.201 | TFLOPs: 31.44 | +7: iteration 101120/ 173500 | consumed samples: 25886720 | consumed tokens: 53016002560 | elapsed time per iteration (s): 0.43 | learning rate: 8.800E-05 | global batch size: 256 | lm loss: 2.916516E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.160 | TFLOPs: 31.23 | +7: iteration 101130/ 173500 | consumed samples: 25889280 | consumed tokens: 53021245440 | elapsed time per iteration (s): 0.43 | learning rate: 8.798E-05 | global batch size: 256 | lm loss: 2.918634E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.979 | TFLOPs: 31.27 | +7: iteration 101140/ 173500 | 
consumed samples: 25891840 | consumed tokens: 53026488320 | elapsed time per iteration (s): 0.43 | learning rate: 8.797E-05 | global batch size: 256 | lm loss: 2.925185E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.323 | TFLOPs: 30.92 | +7: iteration 101150/ 173500 | consumed samples: 25894400 | consumed tokens: 53031731200 | elapsed time per iteration (s): 0.43 | learning rate: 8.795E-05 | global batch size: 256 | lm loss: 2.920141E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.839 | TFLOPs: 31.53 | +7: iteration 101160/ 173500 | consumed samples: 25896960 | consumed tokens: 53036974080 | elapsed time per iteration (s): 0.43 | learning rate: 8.794E-05 | global batch size: 256 | lm loss: 2.922003E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.938 | TFLOPs: 31.48 | +7: iteration 101170/ 173500 | consumed samples: 25899520 | consumed tokens: 53042216960 | elapsed time per iteration (s): 0.43 | learning rate: 8.792E-05 | global batch size: 256 | lm loss: 2.920914E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.925 | TFLOPs: 30.95 | +7: iteration 101180/ 173500 | consumed samples: 25902080 | consumed tokens: 53047459840 | elapsed time per iteration (s): 0.42 | learning rate: 8.790E-05 | global batch size: 256 | lm loss: 2.932991E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.645 | TFLOPs: 32.09 | +7: iteration 101190/ 173500 | consumed samples: 25904640 | consumed tokens: 53052702720 | elapsed time per iteration (s): 0.44 | learning rate: 8.789E-05 | global batch size: 256 | lm loss: 2.902938E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 578.257 | TFLOPs: 30.34 | +7: iteration 101200/ 173500 | consumed samples: 25907200 | consumed tokens: 53057945600 | elapsed time per iteration (s): 0.43 | learning rate: 8.787E-05 | global batch size: 256 | lm loss: 2.926863E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.180 | TFLOPs: 31.23 | +7: iteration 101210/ 173500 | consumed samples: 25909760 | consumed tokens: 53063188480 | elapsed time per iteration (s): 0.43 | learning rate: 8.786E-05 | global batch size: 256 | lm loss: 2.922871E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.519 | TFLOPs: 31.56 | +7: iteration 101220/ 173500 | consumed samples: 25912320 | consumed tokens: 53068431360 | elapsed time per iteration (s): 0.44 | learning rate: 8.784E-05 | global batch size: 256 | lm loss: 2.925516E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.995 | TFLOPs: 30.59 | +7: iteration 101230/ 173500 | consumed samples: 25914880 | consumed tokens: 53073674240 | elapsed time per iteration (s): 0.43 | learning rate: 8.782E-05 | global batch size: 256 | lm loss: 2.922118E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.125 | TFLOPs: 30.96 | +7: iteration 101240/ 173500 | consumed samples: 25917440 | consumed tokens: 53078917120 | elapsed time per iteration (s): 0.44 | learning rate: 8.781E-05 | global batch size: 256 | lm loss: 2.915415E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.301 | TFLOPs: 30.60 | +7: iteration 101250/ 173500 | consumed samples: 25920000 | consumed tokens: 53084160000 | elapsed time per iteration (s): 0.44 | learning rate: 
8.779E-05 | global batch size: 256 | lm loss: 2.923671E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.295 | TFLOPs: 30.87 | +7: iteration 101260/ 173500 | consumed samples: 25922560 | consumed tokens: 53089402880 | elapsed time per iteration (s): 0.42 | learning rate: 8.778E-05 | global batch size: 256 | lm loss: 2.919851E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.490 | TFLOPs: 31.72 | +7: iteration 101270/ 173500 | consumed samples: 25925120 | consumed tokens: 53094645760 | elapsed time per iteration (s): 0.43 | learning rate: 8.776E-05 | global batch size: 256 | lm loss: 2.918524E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.795 | TFLOPs: 31.52 | +7: iteration 101280/ 173500 | consumed samples: 25927680 | consumed tokens: 53099888640 | elapsed time per iteration (s): 0.44 | learning rate: 8.774E-05 | global batch size: 256 | lm loss: 2.930528E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.276 | TFLOPs: 30.71 | +7: iteration 101290/ 173500 | consumed samples: 25930240 | consumed tokens: 53105131520 | elapsed time per iteration (s): 0.43 | learning rate: 8.773E-05 | global batch size: 256 | lm loss: 2.910348E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.443 | TFLOPs: 31.45 | +7: iteration 101300/ 173500 | consumed samples: 25932800 | consumed tokens: 53110374400 | elapsed time per iteration (s): 0.43 | learning rate: 8.771E-05 | global batch size: 256 | lm loss: 2.919179E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.924 | TFLOPs: 31.42 | +7: iteration 101310/ 173500 | 
consumed samples: 25935360 | consumed tokens: 53115617280 | elapsed time per iteration (s): 0.43 | learning rate: 8.770E-05 | global batch size: 256 | lm loss: 2.921794E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.988 | TFLOPs: 31.38 | +7: iteration 101320/ 173500 | consumed samples: 25937920 | consumed tokens: 53120860160 | elapsed time per iteration (s): 0.42 | learning rate: 8.768E-05 | global batch size: 256 | lm loss: 2.933629E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.958 | TFLOPs: 31.69 | +7: iteration 101330/ 173500 | consumed samples: 25940480 | consumed tokens: 53126103040 | elapsed time per iteration (s): 0.44 | learning rate: 8.766E-05 | global batch size: 256 | lm loss: 2.916180E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.578 | TFLOPs: 30.57 | +7: iteration 101340/ 173500 | consumed samples: 25943040 | consumed tokens: 53131345920 | elapsed time per iteration (s): 0.42 | learning rate: 8.765E-05 | global batch size: 256 | lm loss: 2.920002E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.663 | TFLOPs: 32.09 | +7: iteration 101350/ 173500 | consumed samples: 25945600 | consumed tokens: 53136588800 | elapsed time per iteration (s): 0.43 | learning rate: 8.763E-05 | global batch size: 256 | lm loss: 2.922454E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.295 | TFLOPs: 31.13 | +7: iteration 101360/ 173500 | consumed samples: 25948160 | consumed tokens: 53141831680 | elapsed time per iteration (s): 0.42 | learning rate: 8.762E-05 | global batch size: 256 | lm loss: 2.902845E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.759 | TFLOPs: 31.84 | +7: iteration 101370/ 173500 | consumed samples: 25950720 | consumed tokens: 53147074560 | elapsed time per iteration (s): 0.43 | learning rate: 8.760E-05 | global batch size: 256 | lm loss: 2.919281E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.180 | TFLOPs: 31.28 | +7: iteration 101380/ 173500 | consumed samples: 25953280 | consumed tokens: 53152317440 | elapsed time per iteration (s): 0.43 | learning rate: 8.758E-05 | global batch size: 256 | lm loss: 2.927055E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.422 | TFLOPs: 31.19 | +7: iteration 101390/ 173500 | consumed samples: 25955840 | consumed tokens: 53157560320 | elapsed time per iteration (s): 0.43 | learning rate: 8.757E-05 | global batch size: 256 | lm loss: 2.909416E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.227 | TFLOPs: 31.13 | +7: iteration 101400/ 173500 | consumed samples: 25958400 | consumed tokens: 53162803200 | elapsed time per iteration (s): 0.43 | learning rate: 8.755E-05 | global batch size: 256 | lm loss: 2.917080E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.813 | TFLOPs: 31.47 | +7: iteration 101410/ 173500 | consumed samples: 25960960 | consumed tokens: 53168046080 | elapsed time per iteration (s): 0.43 | learning rate: 8.754E-05 | global batch size: 256 | lm loss: 2.930121E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.048 | TFLOPs: 31.06 | +7: iteration 101420/ 173500 | consumed samples: 25963520 | consumed tokens: 53173288960 | elapsed time per iteration (s): 0.42 | learning rate: 
8.752E-05 | global batch size: 256 | lm loss: 2.914948E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.211 | TFLOPs: 31.65 | +7: iteration 101430/ 173500 | consumed samples: 25966080 | consumed tokens: 53178531840 | elapsed time per iteration (s): 0.43 | learning rate: 8.750E-05 | global batch size: 256 | lm loss: 2.912124E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.843 | TFLOPs: 31.21 | +7: iteration 101440/ 173500 | consumed samples: 25968640 | consumed tokens: 53183774720 | elapsed time per iteration (s): 0.43 | learning rate: 8.749E-05 | global batch size: 256 | lm loss: 2.925857E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.292 | TFLOPs: 30.92 | +7: iteration 101450/ 173500 | consumed samples: 25971200 | consumed tokens: 53189017600 | elapsed time per iteration (s): 0.42 | learning rate: 8.747E-05 | global batch size: 256 | lm loss: 2.912499E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.964 | TFLOPs: 31.69 | +7: iteration 101460/ 173500 | consumed samples: 25973760 | consumed tokens: 53194260480 | elapsed time per iteration (s): 0.43 | learning rate: 8.746E-05 | global batch size: 256 | lm loss: 2.913045E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.390 | TFLOPs: 31.03 | +7: iteration 101470/ 173500 | consumed samples: 25976320 | consumed tokens: 53199503360 | elapsed time per iteration (s): 0.43 | learning rate: 8.744E-05 | global batch size: 256 | lm loss: 2.909498E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.247 | TFLOPs: 31.34 | +7: iteration 101480/ 173500 | 
consumed samples: 25978880 | consumed tokens: 53204746240 | elapsed time per iteration (s): 0.42 | learning rate: 8.743E-05 | global batch size: 256 | lm loss: 2.927869E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.380 | TFLOPs: 31.61 | +7: iteration 101490/ 173500 | consumed samples: 25981440 | consumed tokens: 53209989120 | elapsed time per iteration (s): 0.42 | learning rate: 8.741E-05 | global batch size: 256 | lm loss: 2.915676E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.208 | TFLOPs: 31.81 | +7: iteration 101500/ 173500 | consumed samples: 25984000 | consumed tokens: 53215232000 | elapsed time per iteration (s): 0.44 | learning rate: 8.739E-05 | global batch size: 256 | lm loss: 2.912340E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.317 | TFLOPs: 30.76 | +7: iteration 101510/ 173500 | consumed samples: 25986560 | consumed tokens: 53220474880 | elapsed time per iteration (s): 0.44 | learning rate: 8.738E-05 | global batch size: 256 | lm loss: 2.928597E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.296 | TFLOPs: 30.87 | +7: iteration 101520/ 173500 | consumed samples: 25989120 | consumed tokens: 53225717760 | elapsed time per iteration (s): 0.43 | learning rate: 8.736E-05 | global batch size: 256 | lm loss: 2.927380E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.089 | TFLOPs: 31.43 | +7: iteration 101530/ 173500 | consumed samples: 25991680 | consumed tokens: 53230960640 | elapsed time per iteration (s): 0.44 | learning rate: 8.735E-05 | global batch size: 256 | lm loss: 2.925770E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 580.430 | TFLOPs: 30.45 | +7: iteration 101540/ 173500 | consumed samples: 25994240 | consumed tokens: 53236203520 | elapsed time per iteration (s): 0.43 | learning rate: 8.733E-05 | global batch size: 256 | lm loss: 2.921439E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.029 | TFLOPs: 31.38 | +7: iteration 101550/ 173500 | consumed samples: 25996800 | consumed tokens: 53241446400 | elapsed time per iteration (s): 0.43 | learning rate: 8.731E-05 | global batch size: 256 | lm loss: 2.911558E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.082 | TFLOPs: 31.43 | +7: iteration 101560/ 173500 | consumed samples: 25999360 | consumed tokens: 53246689280 | elapsed time per iteration (s): 0.43 | learning rate: 8.730E-05 | global batch size: 256 | lm loss: 2.927994E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.501 | TFLOPs: 31.24 | +7: iteration 101570/ 173500 | consumed samples: 26001920 | consumed tokens: 53251932160 | elapsed time per iteration (s): 0.43 | learning rate: 8.728E-05 | global batch size: 256 | lm loss: 2.923427E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.162 | TFLOPs: 31.33 | +7: iteration 101580/ 173500 | consumed samples: 26004480 | consumed tokens: 53257175040 | elapsed time per iteration (s): 0.42 | learning rate: 8.727E-05 | global batch size: 256 | lm loss: 2.921186E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.539 | TFLOPs: 31.67 | +7: iteration 101590/ 173500 | consumed samples: 26007040 | consumed tokens: 53262417920 | elapsed time per iteration (s): 0.43 | learning rate: 
8.725E-05 | global batch size: 256 | lm loss: 2.932728E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.940 | TFLOPs: 31.37 | +7: iteration 101600/ 173500 | consumed samples: 26009600 | consumed tokens: 53267660800 | elapsed time per iteration (s): 0.48 | learning rate: 8.723E-05 | global batch size: 256 | lm loss: 2.905454E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.207 | TFLOPs: 28.24 | +7: iteration 101610/ 173500 | consumed samples: 26012160 | consumed tokens: 53272903680 | elapsed time per iteration (s): 0.43 | learning rate: 8.722E-05 | global batch size: 256 | lm loss: 2.911515E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.412 | TFLOPs: 30.98 | +7: iteration 101620/ 173500 | consumed samples: 26014720 | consumed tokens: 53278146560 | elapsed time per iteration (s): 0.43 | learning rate: 8.720E-05 | global batch size: 256 | lm loss: 2.924727E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.003 | TFLOPs: 31.59 | +7: iteration 101630/ 173500 | consumed samples: 26017280 | consumed tokens: 53283389440 | elapsed time per iteration (s): 0.42 | learning rate: 8.719E-05 | global batch size: 256 | lm loss: 2.922982E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.420 | TFLOPs: 31.92 | +7: iteration 101640/ 173500 | consumed samples: 26019840 | consumed tokens: 53288632320 | elapsed time per iteration (s): 0.42 | learning rate: 8.717E-05 | global batch size: 256 | lm loss: 2.937720E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.238 | TFLOPs: 31.70 | +7: iteration 101650/ 173500 | 
consumed samples: 26022400 | consumed tokens: 53293875200 | elapsed time per iteration (s): 0.44 | learning rate: 8.715E-05 | global batch size: 256 | lm loss: 2.915079E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.390 | TFLOPs: 30.87 | +7: iteration 101660/ 173500 | consumed samples: 26024960 | consumed tokens: 53299118080 | elapsed time per iteration (s): 0.43 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 2.911693E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.284 | TFLOPs: 31.18 | +7: iteration 101670/ 173500 | consumed samples: 26027520 | consumed tokens: 53304360960 | elapsed time per iteration (s): 0.43 | learning rate: 8.712E-05 | global batch size: 256 | lm loss: 2.927596E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.058 | TFLOPs: 31.54 | +7: iteration 101680/ 173500 | consumed samples: 26030080 | consumed tokens: 53309603840 | elapsed time per iteration (s): 0.43 | learning rate: 8.711E-05 | global batch size: 256 | lm loss: 2.919161E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.694 | TFLOPs: 31.41 | +7: iteration 101690/ 173500 | consumed samples: 26032640 | consumed tokens: 53314846720 | elapsed time per iteration (s): 0.42 | learning rate: 8.709E-05 | global batch size: 256 | lm loss: 2.901450E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.574 | TFLOPs: 31.88 | +7: iteration 101700/ 173500 | consumed samples: 26035200 | consumed tokens: 53320089600 | elapsed time per iteration (s): 0.42 | learning rate: 8.707E-05 | global batch size: 256 | lm loss: 2.909881E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.618 | TFLOPs: 31.67 | +7: iteration 101710/ 173500 | consumed samples: 26037760 | consumed tokens: 53325332480 | elapsed time per iteration (s): 0.44 | learning rate: 8.706E-05 | global batch size: 256 | lm loss: 2.912038E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.336 | TFLOPs: 30.82 | +7: iteration 101720/ 173500 | consumed samples: 26040320 | consumed tokens: 53330575360 | elapsed time per iteration (s): 0.42 | learning rate: 8.704E-05 | global batch size: 256 | lm loss: 2.938552E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.896 | TFLOPs: 31.69 | +7: iteration 101730/ 173500 | consumed samples: 26042880 | consumed tokens: 53335818240 | elapsed time per iteration (s): 0.44 | learning rate: 8.703E-05 | global batch size: 256 | lm loss: 2.920471E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.375 | TFLOPs: 30.82 | +7: iteration 101740/ 173500 | consumed samples: 26045440 | consumed tokens: 53341061120 | elapsed time per iteration (s): 0.44 | learning rate: 8.701E-05 | global batch size: 256 | lm loss: 2.918454E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.595 | TFLOPs: 30.83 | +7: iteration 101750/ 173500 | consumed samples: 26048000 | consumed tokens: 53346304000 | elapsed time per iteration (s): 0.48 | learning rate: 8.700E-05 | global batch size: 256 | lm loss: 2.921886E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.412 | TFLOPs: 28.14 | +7: iteration 101760/ 173500 | consumed samples: 26050560 | consumed tokens: 53351546880 | elapsed time per iteration (s): 0.47 | learning rate: 
8.698E-05 | global batch size: 256 | lm loss: 2.918853E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.161 | TFLOPs: 28.50 | +7: iteration 101770/ 173500 | consumed samples: 26053120 | consumed tokens: 53356789760 | elapsed time per iteration (s): 0.49 | learning rate: 8.696E-05 | global batch size: 256 | lm loss: 2.917197E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.069 | TFLOPs: 27.29 | +7: iteration 101780/ 173500 | consumed samples: 26055680 | consumed tokens: 53362032640 | elapsed time per iteration (s): 0.45 | learning rate: 8.695E-05 | global batch size: 256 | lm loss: 2.920094E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.381 | TFLOPs: 30.14 | +7: iteration 101790/ 173500 | consumed samples: 26058240 | consumed tokens: 53367275520 | elapsed time per iteration (s): 0.47 | learning rate: 8.693E-05 | global batch size: 256 | lm loss: 2.911981E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 544.310 | TFLOPs: 28.56 | +7: iteration 101800/ 173500 | consumed samples: 26060800 | consumed tokens: 53372518400 | elapsed time per iteration (s): 0.46 | learning rate: 8.692E-05 | global batch size: 256 | lm loss: 2.925603E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.504 | TFLOPs: 29.15 | +7: iteration 101810/ 173500 | consumed samples: 26063360 | consumed tokens: 53377761280 | elapsed time per iteration (s): 0.45 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 2.909841E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.294 | TFLOPs: 29.61 | +7: iteration 101820/ 173500 | 
consumed samples: 26065920 | consumed tokens: 53383004160 | elapsed time per iteration (s): 0.42 | learning rate: 8.688E-05 | global batch size: 256 | lm loss: 2.929733E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.299 | TFLOPs: 31.71 | +7: iteration 101830/ 173500 | consumed samples: 26068480 | consumed tokens: 53388247040 | elapsed time per iteration (s): 0.47 | learning rate: 8.687E-05 | global batch size: 256 | lm loss: 2.927968E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.675 | TFLOPs: 28.63 | +7: iteration 101840/ 173500 | consumed samples: 26071040 | consumed tokens: 53393489920 | elapsed time per iteration (s): 0.43 | learning rate: 8.685E-05 | global batch size: 256 | lm loss: 2.920285E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.657 | TFLOPs: 30.94 | +7: iteration 101850/ 173500 | consumed samples: 26073600 | consumed tokens: 53398732800 | elapsed time per iteration (s): 0.42 | learning rate: 8.684E-05 | global batch size: 256 | lm loss: 2.917834E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.010 | TFLOPs: 31.69 | +7: iteration 101860/ 173500 | consumed samples: 26076160 | consumed tokens: 53403975680 | elapsed time per iteration (s): 0.46 | learning rate: 8.682E-05 | global batch size: 256 | lm loss: 2.918915E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.236 | TFLOPs: 29.29 | +7: iteration 101870/ 173500 | consumed samples: 26078720 | consumed tokens: 53409218560 | elapsed time per iteration (s): 0.47 | learning rate: 8.680E-05 | global batch size: 256 | lm loss: 2.903136E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 539.427 | TFLOPs: 28.30 | +7: iteration 101880/ 173500 | consumed samples: 26081280 | consumed tokens: 53414461440 | elapsed time per iteration (s): 0.48 | learning rate: 8.679E-05 | global batch size: 256 | lm loss: 2.920580E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.622 | TFLOPs: 27.74 | +7: iteration 101890/ 173500 | consumed samples: 26083840 | consumed tokens: 53419704320 | elapsed time per iteration (s): 0.51 | learning rate: 8.677E-05 | global batch size: 256 | lm loss: 2.927745E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 503.847 | TFLOPs: 26.44 | +7: iteration 101900/ 173500 | consumed samples: 26086400 | consumed tokens: 53424947200 | elapsed time per iteration (s): 0.46 | learning rate: 8.676E-05 | global batch size: 256 | lm loss: 2.909259E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.916 | TFLOPs: 29.43 | +7: iteration 101910/ 173500 | consumed samples: 26088960 | consumed tokens: 53430190080 | elapsed time per iteration (s): 0.45 | learning rate: 8.674E-05 | global batch size: 256 | lm loss: 2.923731E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.194 | TFLOPs: 29.65 | +7: iteration 101920/ 173500 | consumed samples: 26091520 | consumed tokens: 53435432960 | elapsed time per iteration (s): 0.45 | learning rate: 8.672E-05 | global batch size: 256 | lm loss: 2.910206E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.337 | TFLOPs: 29.92 | +7: iteration 101930/ 173500 | consumed samples: 26094080 | consumed tokens: 53440675840 | elapsed time per iteration (s): 0.45 | learning rate: 
8.671E-05 | global batch size: 256 | lm loss: 2.923256E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.881 | TFLOPs: 30.06 | +7: iteration 101940/ 173500 | consumed samples: 26096640 | consumed tokens: 53445918720 | elapsed time per iteration (s): 0.45 | learning rate: 8.669E-05 | global batch size: 256 | lm loss: 2.916090E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.656 | TFLOPs: 29.73 | +7: iteration 101950/ 173500 | consumed samples: 26099200 | consumed tokens: 53451161600 | elapsed time per iteration (s): 0.43 | learning rate: 8.668E-05 | global batch size: 256 | lm loss: 2.928440E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.124 | TFLOPs: 31.33 | +7: iteration 101960/ 173500 | consumed samples: 26101760 | consumed tokens: 53456404480 | elapsed time per iteration (s): 0.43 | learning rate: 8.666E-05 | global batch size: 256 | lm loss: 2.934917E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.910 | TFLOPs: 31.21 | +7: iteration 101970/ 173500 | consumed samples: 26104320 | consumed tokens: 53461647360 | elapsed time per iteration (s): 0.42 | learning rate: 8.665E-05 | global batch size: 256 | lm loss: 2.929532E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.480 | TFLOPs: 31.93 | +7: iteration 101980/ 173500 | consumed samples: 26106880 | consumed tokens: 53466890240 | elapsed time per iteration (s): 0.43 | learning rate: 8.663E-05 | global batch size: 256 | lm loss: 2.922868E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.740 | TFLOPs: 31.31 | +7: iteration 101990/ 173500 | 
consumed samples: 26109440 | consumed tokens: 53472133120 | elapsed time per iteration (s): 0.43 | learning rate: 8.661E-05 | global batch size: 256 | lm loss: 2.922999E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.510 | TFLOPs: 31.19 | +0: [2023-03-17 11:17:35,125] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=0, lr=[8.659751165175261e-05, 8.659751165175261e-05, 8.659751165175261e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 102000/ 173500 | consumed samples: 26112000 | consumed tokens: 53477376000 | elapsed time per iteration (s): 0.43 | learning rate: 8.660E-05 | global batch size: 256 | lm loss: 2.922551E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.115 | TFLOPs: 31.07 | +0: steps: 102000 loss: 2.9406 iter time (s): 0.431 samples/sec: 593.905 +7: iteration 102010/ 173500 | consumed samples: 26114560 | consumed tokens: 53482618880 | elapsed time per iteration (s): 0.42 | learning rate: 8.658E-05 | global batch size: 256 | lm loss: 2.923052E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.296 | TFLOPs: 31.86 | +7: iteration 102020/ 173500 | consumed samples: 26117120 | consumed tokens: 53487861760 | elapsed time per iteration (s): 0.42 | learning rate: 8.657E-05 | global batch size: 256 | lm loss: 2.924941E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.988 | TFLOPs: 31.95 | +7: iteration 102030/ 173500 | consumed samples: 26119680 | consumed tokens: 53493104640 | elapsed time per iteration (s): 0.44 | learning rate: 8.655E-05 | global batch size: 256 | lm loss: 2.914611E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.294 
| TFLOPs: 30.60 | +7: iteration 102040/ 173500 | consumed samples: 26122240 | consumed tokens: 53498347520 | elapsed time per iteration (s): 0.43 | learning rate: 8.653E-05 | global batch size: 256 | lm loss: 2.919156E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.028 | TFLOPs: 31.33 | +7: iteration 102050/ 173500 | consumed samples: 26124800 | consumed tokens: 53503590400 | elapsed time per iteration (s): 0.42 | learning rate: 8.652E-05 | global batch size: 256 | lm loss: 2.920456E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.756 | TFLOPs: 31.73 | +7: iteration 102060/ 173500 | consumed samples: 26127360 | consumed tokens: 53508833280 | elapsed time per iteration (s): 0.43 | learning rate: 8.650E-05 | global batch size: 256 | lm loss: 2.923677E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.824 | TFLOPs: 31.47 | +7: iteration 102070/ 173500 | consumed samples: 26129920 | consumed tokens: 53514076160 | elapsed time per iteration (s): 0.42 | learning rate: 8.649E-05 | global batch size: 256 | lm loss: 2.920556E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.164 | TFLOPs: 31.91 | +7: iteration 102080/ 173500 | consumed samples: 26132480 | consumed tokens: 53519319040 | elapsed time per iteration (s): 0.43 | learning rate: 8.647E-05 | global batch size: 256 | lm loss: 2.921901E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.651 | TFLOPs: 31.46 | +7: iteration 102090/ 173500 | consumed samples: 26135040 | consumed tokens: 53524561920 | elapsed time per iteration (s): 0.42 | learning rate: 8.645E-05 | global batch size: 256 | lm loss: 2.932716E+00 | grad norm: 
0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.708 | TFLOPs: 31.68 | +7: iteration 102100/ 173500 | consumed samples: 26137600 | consumed tokens: 53529804800 | elapsed time per iteration (s): 0.42 | learning rate: 8.644E-05 | global batch size: 256 | lm loss: 2.923434E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.322 | TFLOPs: 32.08 | +7: iteration 102110/ 173500 | consumed samples: 26140160 | consumed tokens: 53535047680 | elapsed time per iteration (s): 0.42 | learning rate: 8.642E-05 | global batch size: 256 | lm loss: 2.915471E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.065 | TFLOPs: 31.80 | +7: iteration 102120/ 173500 | consumed samples: 26142720 | consumed tokens: 53540290560 | elapsed time per iteration (s): 0.43 | learning rate: 8.641E-05 | global batch size: 256 | lm loss: 2.923363E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.817 | TFLOPs: 31.16 | +7: iteration 102130/ 173500 | consumed samples: 26145280 | consumed tokens: 53545533440 | elapsed time per iteration (s): 0.43 | learning rate: 8.639E-05 | global batch size: 256 | lm loss: 2.907685E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.337 | TFLOPs: 31.45 | +7: iteration 102140/ 173500 | consumed samples: 26147840 | consumed tokens: 53550776320 | elapsed time per iteration (s): 0.44 | learning rate: 8.638E-05 | global batch size: 256 | lm loss: 2.907334E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.721 | TFLOPs: 30.68 | +7: iteration 102150/ 173500 | consumed samples: 26150400 | consumed tokens: 53556019200 | elapsed time 
per iteration (s): 0.43 | learning rate: 8.636E-05 | global batch size: 256 | lm loss: 2.919756E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.964 | TFLOPs: 31.43 | +7: iteration 102160/ 173500 | consumed samples: 26152960 | consumed tokens: 53561262080 | elapsed time per iteration (s): 0.43 | learning rate: 8.634E-05 | global batch size: 256 | lm loss: 2.920538E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.767 | TFLOPs: 31.31 | +7: iteration 102170/ 173500 | consumed samples: 26155520 | consumed tokens: 53566504960 | elapsed time per iteration (s): 0.43 | learning rate: 8.633E-05 | global batch size: 256 | lm loss: 2.910052E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.824 | TFLOPs: 31.42 | +7: iteration 102180/ 173500 | consumed samples: 26158080 | consumed tokens: 53571747840 | elapsed time per iteration (s): 0.42 | learning rate: 8.631E-05 | global batch size: 256 | lm loss: 2.905577E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.891 | TFLOPs: 31.63 | +7: iteration 102190/ 173500 | consumed samples: 26160640 | consumed tokens: 53576990720 | elapsed time per iteration (s): 0.42 | learning rate: 8.630E-05 | global batch size: 256 | lm loss: 2.918870E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.886 | TFLOPs: 31.63 | +7: iteration 102200/ 173500 | consumed samples: 26163200 | consumed tokens: 53582233600 | elapsed time per iteration (s): 0.43 | learning rate: 8.628E-05 | global batch size: 256 | lm loss: 2.924126E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.774 | TFLOPs: 
31.26 | +7: iteration 102210/ 173500 | consumed samples: 26165760 | consumed tokens: 53587476480 | elapsed time per iteration (s): 0.43 | learning rate: 8.626E-05 | global batch size: 256 | lm loss: 2.911679E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.044 | TFLOPs: 31.43 | +7: iteration 102220/ 173500 | consumed samples: 26168320 | consumed tokens: 53592719360 | elapsed time per iteration (s): 0.42 | learning rate: 8.625E-05 | global batch size: 256 | lm loss: 2.907405E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.048 | TFLOPs: 31.90 | +7: iteration 102230/ 173500 | consumed samples: 26170880 | consumed tokens: 53597962240 | elapsed time per iteration (s): 0.44 | learning rate: 8.623E-05 | global batch size: 256 | lm loss: 2.918193E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.945 | TFLOPs: 30.85 | +7: iteration 102240/ 173500 | consumed samples: 26173440 | consumed tokens: 53603205120 | elapsed time per iteration (s): 0.44 | learning rate: 8.622E-05 | global batch size: 256 | lm loss: 2.922816E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.109 | TFLOPs: 30.59 | +7: iteration 102250/ 173500 | consumed samples: 26176000 | consumed tokens: 53608448000 | elapsed time per iteration (s): 0.43 | learning rate: 8.620E-05 | global batch size: 256 | lm loss: 2.914496E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.613 | TFLOPs: 31.36 | +7: iteration 102260/ 173500 | consumed samples: 26178560 | consumed tokens: 53613690880 | elapsed time per iteration (s): 0.42 | learning rate: 8.618E-05 | global batch size: 256 | lm loss: 2.921915E+00 | grad norm: 0.346 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.617 | TFLOPs: 31.99 | +7: iteration 102270/ 173500 | consumed samples: 26181120 | consumed tokens: 53618933760 | elapsed time per iteration (s): 0.43 | learning rate: 8.617E-05 | global batch size: 256 | lm loss: 2.915731E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.153 | TFLOPs: 31.17 | +7: iteration 102280/ 173500 | consumed samples: 26183680 | consumed tokens: 53624176640 | elapsed time per iteration (s): 0.42 | learning rate: 8.615E-05 | global batch size: 256 | lm loss: 2.913161E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.766 | TFLOPs: 31.89 | +7: iteration 102290/ 173500 | consumed samples: 26186240 | consumed tokens: 53629419520 | elapsed time per iteration (s): 0.43 | learning rate: 8.614E-05 | global batch size: 256 | lm loss: 2.916495E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.048 | TFLOPs: 31.22 | +7: iteration 102300/ 173500 | consumed samples: 26188800 | consumed tokens: 53634662400 | elapsed time per iteration (s): 0.43 | learning rate: 8.612E-05 | global batch size: 256 | lm loss: 2.909367E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.242 | TFLOPs: 31.28 | +7: iteration 102310/ 173500 | consumed samples: 26191360 | consumed tokens: 53639905280 | elapsed time per iteration (s): 0.43 | learning rate: 8.611E-05 | global batch size: 256 | lm loss: 2.907922E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.967 | TFLOPs: 31.48 | +7: iteration 102320/ 173500 | consumed samples: 26193920 | consumed tokens: 53645148160 | elapsed time per 
iteration (s): 0.42 | learning rate: 8.609E-05 | global batch size: 256 | lm loss: 2.928851E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.735 | TFLOPs: 31.89 | +7: iteration 102330/ 173500 | consumed samples: 26196480 | consumed tokens: 53650391040 | elapsed time per iteration (s): 0.42 | learning rate: 8.607E-05 | global batch size: 256 | lm loss: 2.920210E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.974 | TFLOPs: 31.79 | +7: iteration 102340/ 173500 | consumed samples: 26199040 | consumed tokens: 53655633920 | elapsed time per iteration (s): 0.42 | learning rate: 8.606E-05 | global batch size: 256 | lm loss: 2.921920E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.608 | TFLOPs: 31.72 | +7: iteration 102350/ 173500 | consumed samples: 26201600 | consumed tokens: 53660876800 | elapsed time per iteration (s): 0.42 | learning rate: 8.604E-05 | global batch size: 256 | lm loss: 2.918016E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.147 | TFLOPs: 31.70 | +7: iteration 102360/ 173500 | consumed samples: 26204160 | consumed tokens: 53666119680 | elapsed time per iteration (s): 0.43 | learning rate: 8.603E-05 | global batch size: 256 | lm loss: 2.927291E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.215 | TFLOPs: 31.60 | +7: iteration 102370/ 173500 | consumed samples: 26206720 | consumed tokens: 53671362560 | elapsed time per iteration (s): 0.42 | learning rate: 8.601E-05 | global batch size: 256 | lm loss: 2.922958E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.113 | TFLOPs: 
31.91 | +7: iteration 102380/ 173500 | consumed samples: 26209280 | consumed tokens: 53676605440 | elapsed time per iteration (s): 0.43 | learning rate: 8.599E-05 | global batch size: 256 | lm loss: 2.916004E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.032 | TFLOPs: 31.17 | +7: iteration 102390/ 173500 | consumed samples: 26211840 | consumed tokens: 53681848320 | elapsed time per iteration (s): 0.43 | learning rate: 8.598E-05 | global batch size: 256 | lm loss: 2.917896E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.454 | TFLOPs: 31.56 | +7: iteration 102400/ 173500 | consumed samples: 26214400 | consumed tokens: 53687091200 | elapsed time per iteration (s): 0.43 | learning rate: 8.596E-05 | global batch size: 256 | lm loss: 2.928748E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.737 | TFLOPs: 31.57 | +7: iteration 102410/ 173500 | consumed samples: 26216960 | consumed tokens: 53692334080 | elapsed time per iteration (s): 0.43 | learning rate: 8.595E-05 | global batch size: 256 | lm loss: 2.919821E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.635 | TFLOPs: 31.46 | +7: iteration 102420/ 173500 | consumed samples: 26219520 | consumed tokens: 53697576960 | elapsed time per iteration (s): 0.43 | learning rate: 8.593E-05 | global batch size: 256 | lm loss: 2.921280E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.961 | TFLOPs: 31.01 | +7: iteration 102430/ 173500 | consumed samples: 26222080 | consumed tokens: 53702819840 | elapsed time per iteration (s): 0.43 | learning rate: 8.591E-05 | global batch size: 256 | lm loss: 2.930305E+00 | grad norm: 0.342 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.016 | TFLOPs: 30.90 | +7: iteration 102440/ 173500 | consumed samples: 26224640 | consumed tokens: 53708062720 | elapsed time per iteration (s): 0.43 | learning rate: 8.590E-05 | global batch size: 256 | lm loss: 2.915155E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.070 | TFLOPs: 31.33 | +7: iteration 102450/ 173500 | consumed samples: 26227200 | consumed tokens: 53713305600 | elapsed time per iteration (s): 0.43 | learning rate: 8.588E-05 | global batch size: 256 | lm loss: 2.920190E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.494 | TFLOPs: 31.56 | +7: iteration 102460/ 173500 | consumed samples: 26229760 | consumed tokens: 53718548480 | elapsed time per iteration (s): 0.43 | learning rate: 8.587E-05 | global batch size: 256 | lm loss: 2.918826E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.919 | TFLOPs: 31.53 | +7: iteration 102470/ 173500 | consumed samples: 26232320 | consumed tokens: 53723791360 | elapsed time per iteration (s): 0.42 | learning rate: 8.585E-05 | global batch size: 256 | lm loss: 2.914241E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.503 | TFLOPs: 31.61 | +7: iteration 102480/ 173500 | consumed samples: 26234880 | consumed tokens: 53729034240 | elapsed time per iteration (s): 0.42 | learning rate: 8.584E-05 | global batch size: 256 | lm loss: 2.914865E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.675 | TFLOPs: 31.73 | +7: iteration 102490/ 173500 | consumed samples: 26237440 | consumed tokens: 53734277120 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.582E-05 | global batch size: 256 | lm loss: 2.912225E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.814 | TFLOPs: 31.58 | +7: iteration 102500/ 173500 | consumed samples: 26240000 | consumed tokens: 53739520000 | elapsed time per iteration (s): 0.43 | learning rate: 8.580E-05 | global batch size: 256 | lm loss: 2.931941E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.862 | TFLOPs: 31.16 | +7: iteration 102510/ 173500 | consumed samples: 26242560 | consumed tokens: 53744762880 | elapsed time per iteration (s): 0.43 | learning rate: 8.579E-05 | global batch size: 256 | lm loss: 2.930636E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.319 | TFLOPs: 31.18 | +7: iteration 102520/ 173500 | consumed samples: 26245120 | consumed tokens: 53750005760 | elapsed time per iteration (s): 0.42 | learning rate: 8.577E-05 | global batch size: 256 | lm loss: 2.921369E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.227 | TFLOPs: 31.65 | +7: iteration 102530/ 173500 | consumed samples: 26247680 | consumed tokens: 53755248640 | elapsed time per iteration (s): 0.42 | learning rate: 8.576E-05 | global batch size: 256 | lm loss: 2.911749E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.781 | TFLOPs: 31.84 | +7: iteration 102540/ 173500 | consumed samples: 26250240 | consumed tokens: 53760491520 | elapsed time per iteration (s): 0.42 | learning rate: 8.574E-05 | global batch size: 256 | lm loss: 2.912769E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.474 | TFLOPs: 
31.93 | +7: iteration 102550/ 173500 | consumed samples: 26252800 | consumed tokens: 53765734400 | elapsed time per iteration (s): 0.42 | learning rate: 8.572E-05 | global batch size: 256 | lm loss: 2.925016E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.869 | TFLOPs: 31.63 | +7: iteration 102560/ 173500 | consumed samples: 26255360 | consumed tokens: 53770977280 | elapsed time per iteration (s): 0.43 | learning rate: 8.571E-05 | global batch size: 256 | lm loss: 2.920869E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.938 | TFLOPs: 31.06 | +7: iteration 102570/ 173500 | consumed samples: 26257920 | consumed tokens: 53776220160 | elapsed time per iteration (s): 0.43 | learning rate: 8.569E-05 | global batch size: 256 | lm loss: 2.909979E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.518 | TFLOPs: 31.35 | +7: iteration 102580/ 173500 | consumed samples: 26260480 | consumed tokens: 53781463040 | elapsed time per iteration (s): 0.42 | learning rate: 8.568E-05 | global batch size: 256 | lm loss: 2.907871E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.064 | TFLOPs: 31.75 | +7: iteration 102590/ 173500 | consumed samples: 26263040 | consumed tokens: 53786705920 | elapsed time per iteration (s): 0.43 | learning rate: 8.566E-05 | global batch size: 256 | lm loss: 2.916582E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.213 | TFLOPs: 31.60 | +7: iteration 102600/ 173500 | consumed samples: 26265600 | consumed tokens: 53791948800 | elapsed time per iteration (s): 0.42 | learning rate: 8.565E-05 | global batch size: 256 | lm loss: 2.918409E+00 | grad norm: 0.332 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.451 | TFLOPs: 31.61 | +7: iteration 102610/ 173500 | consumed samples: 26268160 | consumed tokens: 53797191680 | elapsed time per iteration (s): 0.43 | learning rate: 8.563E-05 | global batch size: 256 | lm loss: 2.920233E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.939 | TFLOPs: 31.48 | +7: iteration 102620/ 173500 | consumed samples: 26270720 | consumed tokens: 53802434560 | elapsed time per iteration (s): 0.43 | learning rate: 8.561E-05 | global batch size: 256 | lm loss: 2.921153E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.080 | TFLOPs: 30.96 | +7: iteration 102630/ 173500 | consumed samples: 26273280 | consumed tokens: 53807677440 | elapsed time per iteration (s): 0.43 | learning rate: 8.560E-05 | global batch size: 256 | lm loss: 2.898728E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.722 | TFLOPs: 31.41 | +7: iteration 102640/ 173500 | consumed samples: 26275840 | consumed tokens: 53812920320 | elapsed time per iteration (s): 0.42 | learning rate: 8.558E-05 | global batch size: 256 | lm loss: 2.921931E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.800 | TFLOPs: 31.79 | +7: iteration 102650/ 173500 | consumed samples: 26278400 | consumed tokens: 53818163200 | elapsed time per iteration (s): 0.42 | learning rate: 8.557E-05 | global batch size: 256 | lm loss: 2.923697E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.831 | TFLOPs: 31.79 | +7: iteration 102660/ 173500 | consumed samples: 26280960 | consumed tokens: 53823406080 | elapsed time per 
iteration (s): 0.42 | learning rate: 8.555E-05 | global batch size: 256 | lm loss: 2.913962E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.351 | TFLOPs: 31.76 | +7: iteration 102670/ 173500 | consumed samples: 26283520 | consumed tokens: 53828648960 | elapsed time per iteration (s): 0.43 | learning rate: 8.553E-05 | global batch size: 256 | lm loss: 2.891637E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.821 | TFLOPs: 31.21 | +7: iteration 102680/ 173500 | consumed samples: 26286080 | consumed tokens: 53833891840 | elapsed time per iteration (s): 0.43 | learning rate: 8.552E-05 | global batch size: 256 | lm loss: 2.915635E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.284 | TFLOPs: 31.55 | +7: iteration 102690/ 173500 | consumed samples: 26288640 | consumed tokens: 53839134720 | elapsed time per iteration (s): 0.42 | learning rate: 8.550E-05 | global batch size: 256 | lm loss: 2.917926E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.919 | TFLOPs: 31.74 | +7: iteration 102700/ 173500 | consumed samples: 26291200 | consumed tokens: 53844377600 | elapsed time per iteration (s): 0.44 | learning rate: 8.549E-05 | global batch size: 256 | lm loss: 2.921758E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.593 | TFLOPs: 30.20 | +7: iteration 102710/ 173500 | consumed samples: 26293760 | consumed tokens: 53849620480 | elapsed time per iteration (s): 0.43 | learning rate: 8.547E-05 | global batch size: 256 | lm loss: 2.902444E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.620 | TFLOPs: 
31.09 | +7: iteration 102720/ 173500 | consumed samples: 26296320 | consumed tokens: 53854863360 | elapsed time per iteration (s): 0.43 | learning rate: 8.546E-05 | global batch size: 256 | lm loss: 2.924742E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.411 | TFLOPs: 31.08 | +7: iteration 102730/ 173500 | consumed samples: 26298880 | consumed tokens: 53860106240 | elapsed time per iteration (s): 0.43 | learning rate: 8.544E-05 | global batch size: 256 | lm loss: 2.921998E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.581 | TFLOPs: 31.56 | +7: iteration 102740/ 173500 | consumed samples: 26301440 | consumed tokens: 53865349120 | elapsed time per iteration (s): 0.44 | learning rate: 8.542E-05 | global batch size: 256 | lm loss: 2.911238E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.334 | TFLOPs: 30.87 | +7: iteration 102750/ 173500 | consumed samples: 26304000 | consumed tokens: 53870592000 | elapsed time per iteration (s): 0.43 | learning rate: 8.541E-05 | global batch size: 256 | lm loss: 2.923232E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.711 | TFLOPs: 31.36 | +7: iteration 102760/ 173500 | consumed samples: 26306560 | consumed tokens: 53875834880 | elapsed time per iteration (s): 0.43 | learning rate: 8.539E-05 | global batch size: 256 | lm loss: 2.916725E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.277 | TFLOPs: 31.29 | +7: iteration 102770/ 173500 | consumed samples: 26309120 | consumed tokens: 53881077760 | elapsed time per iteration (s): 0.43 | learning rate: 8.538E-05 | global batch size: 256 | lm loss: 2.914591E+00 | grad norm: 0.323 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.109 | TFLOPs: 31.07 | +7: iteration 102780/ 173500 | consumed samples: 26311680 | consumed tokens: 53886320640 | elapsed time per iteration (s): 0.43 | learning rate: 8.536E-05 | global batch size: 256 | lm loss: 2.925449E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.493 | TFLOPs: 31.40 | +7: iteration 102790/ 173500 | consumed samples: 26314240 | consumed tokens: 53891563520 | elapsed time per iteration (s): 0.42 | learning rate: 8.534E-05 | global batch size: 256 | lm loss: 2.924204E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.834 | TFLOPs: 31.79 | +7: iteration 102800/ 173500 | consumed samples: 26316800 | consumed tokens: 53896806400 | elapsed time per iteration (s): 0.44 | learning rate: 8.533E-05 | global batch size: 256 | lm loss: 2.904670E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.250 | TFLOPs: 30.76 | +7: iteration 102810/ 173500 | consumed samples: 26319360 | consumed tokens: 53902049280 | elapsed time per iteration (s): 0.44 | learning rate: 8.531E-05 | global batch size: 256 | lm loss: 2.926318E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.517 | TFLOPs: 30.67 | +7: iteration 102820/ 173500 | consumed samples: 26321920 | consumed tokens: 53907292160 | elapsed time per iteration (s): 0.43 | learning rate: 8.530E-05 | global batch size: 256 | lm loss: 2.925715E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.124 | TFLOPs: 31.28 | +7: iteration 102830/ 173500 | consumed samples: 26324480 | consumed tokens: 53912535040 | elapsed time per 
iteration (s): 0.42 | learning rate: 8.528E-05 | global batch size: 256 | lm loss: 2.927378E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.040 | TFLOPs: 31.64 | +7: iteration 102840/ 173500 | consumed samples: 26327040 | consumed tokens: 53917777920 | elapsed time per iteration (s): 0.44 | learning rate: 8.527E-05 | global batch size: 256 | lm loss: 2.915784E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.713 | TFLOPs: 30.26 | +7: iteration 102850/ 173500 | consumed samples: 26329600 | consumed tokens: 53923020800 | elapsed time per iteration (s): 0.45 | learning rate: 8.525E-05 | global batch size: 256 | lm loss: 2.931852E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.195 | TFLOPs: 30.13 | +7: iteration 102860/ 173500 | consumed samples: 26332160 | consumed tokens: 53928263680 | elapsed time per iteration (s): 0.43 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 2.904694E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.103 | TFLOPs: 31.28 | +7: iteration 102870/ 173500 | consumed samples: 26334720 | consumed tokens: 53933506560 | elapsed time per iteration (s): 0.43 | learning rate: 8.522E-05 | global batch size: 256 | lm loss: 2.923789E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.364 | TFLOPs: 31.08 | +7: iteration 102880/ 173500 | consumed samples: 26337280 | consumed tokens: 53938749440 | elapsed time per iteration (s): 0.43 | learning rate: 8.520E-05 | global batch size: 256 | lm loss: 2.917992E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.537 | TFLOPs: 
31.51 | +7: iteration 102890/ 173500 | consumed samples: 26339840 | consumed tokens: 53943992320 | elapsed time per iteration (s): 0.43 | learning rate: 8.519E-05 | global batch size: 256 | lm loss: 2.910976E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.104 | TFLOPs: 31.54 | +7: iteration 102900/ 173500 | consumed samples: 26342400 | consumed tokens: 53949235200 | elapsed time per iteration (s): 0.43 | learning rate: 8.517E-05 | global batch size: 256 | lm loss: 2.931737E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.400 | TFLOPs: 31.24 | +7: iteration 102910/ 173500 | consumed samples: 26344960 | consumed tokens: 53954478080 | elapsed time per iteration (s): 0.43 | learning rate: 8.515E-05 | global batch size: 256 | lm loss: 2.917211E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.392 | TFLOPs: 31.55 | +7: iteration 102920/ 173500 | consumed samples: 26347520 | consumed tokens: 53959720960 | elapsed time per iteration (s): 0.43 | learning rate: 8.514E-05 | global batch size: 256 | lm loss: 2.908556E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | +7: iteration 102930/ 173500 | consumed samples: 26350080 | consumed tokens: 53964963840 | elapsed time per iteration (s): 0.43 | learning rate: 8.512E-05 | global batch size: 256 | lm loss: 2.928458E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.310 | TFLOPs: 31.60 | +7: iteration 102940/ 173500 | consumed samples: 26352640 | consumed tokens: 53970206720 | elapsed time per iteration (s): 0.43 | learning rate: 8.511E-05 | global batch size: 256 | lm loss: 2.903711E+00 | grad norm: 0.344 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.242 | TFLOPs: 31.49 | +7: iteration 102950/ 173500 | consumed samples: 26355200 | consumed tokens: 53975449600 | elapsed time per iteration (s): 0.42 | learning rate: 8.509E-05 | global batch size: 256 | lm loss: 2.911180E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.081 | TFLOPs: 31.64 | +7: iteration 102960/ 173500 | consumed samples: 26357760 | consumed tokens: 53980692480 | elapsed time per iteration (s): 0.43 | learning rate: 8.508E-05 | global batch size: 256 | lm loss: 2.915060E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.175 | TFLOPs: 31.07 | +7: iteration 102970/ 173500 | consumed samples: 26360320 | consumed tokens: 53985935360 | elapsed time per iteration (s): 0.42 | learning rate: 8.506E-05 | global batch size: 256 | lm loss: 2.932643E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.237 | TFLOPs: 31.70 | +7: iteration 102980/ 173500 | consumed samples: 26362880 | consumed tokens: 53991178240 | elapsed time per iteration (s): 0.42 | learning rate: 8.504E-05 | global batch size: 256 | lm loss: 2.908727E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.122 | TFLOPs: 31.75 | +7: iteration 102990/ 173500 | consumed samples: 26365440 | consumed tokens: 53996421120 | elapsed time per iteration (s): 0.42 | learning rate: 8.503E-05 | global batch size: 256 | lm loss: 2.900754E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.964 | TFLOPs: 31.69 | +7: iteration 103000/ 173500 | consumed samples: 26368000 | consumed tokens: 54001664000 | elapsed time per 
iteration (s): 0.44 | learning rate: 8.501E-05 | global batch size: 256 | lm loss: 2.919744E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.939 | TFLOPs: 30.85 | +7: iteration 103010/ 173500 | consumed samples: 26370560 | consumed tokens: 54006906880 | elapsed time per iteration (s): 0.43 | learning rate: 8.500E-05 | global batch size: 256 | lm loss: 2.915245E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.548 | TFLOPs: 31.46 | +7: iteration 103020/ 173500 | consumed samples: 26373120 | consumed tokens: 54012149760 | elapsed time per iteration (s): 0.43 | learning rate: 8.498E-05 | global batch size: 256 | lm loss: 2.919690E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.015 | TFLOPs: 31.01 | +7: iteration 103030/ 173500 | consumed samples: 26375680 | consumed tokens: 54017392640 | elapsed time per iteration (s): 0.42 | learning rate: 8.496E-05 | global batch size: 256 | lm loss: 2.901992E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.716 | TFLOPs: 31.73 | +7: iteration 103040/ 173500 | consumed samples: 26378240 | consumed tokens: 54022635520 | elapsed time per iteration (s): 0.43 | learning rate: 8.495E-05 | global batch size: 256 | lm loss: 2.911044E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.030 | TFLOPs: 31.48 | +7: iteration 103050/ 173500 | consumed samples: 26380800 | consumed tokens: 54027878400 | elapsed time per iteration (s): 0.43 | learning rate: 8.493E-05 | global batch size: 256 | lm loss: 2.914781E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.100 | TFLOPs: 
31.28 | +7: iteration 103060/ 173500 | consumed samples: 26383360 | consumed tokens: 54033121280 | elapsed time per iteration (s): 0.44 | learning rate: 8.492E-05 | global batch size: 256 | lm loss: 2.914087E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.653 | TFLOPs: 30.36 | +7: iteration 103070/ 173500 | consumed samples: 26385920 | consumed tokens: 54038364160 | elapsed time per iteration (s): 0.43 | learning rate: 8.490E-05 | global batch size: 256 | lm loss: 2.923865E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.981 | TFLOPs: 31.22 | +7: iteration 103080/ 173500 | consumed samples: 26388480 | consumed tokens: 54043607040 | elapsed time per iteration (s): 0.44 | learning rate: 8.489E-05 | global batch size: 256 | lm loss: 2.909477E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.326 | TFLOPs: 30.71 | +7: iteration 103090/ 173500 | consumed samples: 26391040 | consumed tokens: 54048849920 | elapsed time per iteration (s): 0.43 | learning rate: 8.487E-05 | global batch size: 256 | lm loss: 2.915942E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.922 | TFLOPs: 31.53 | +7: iteration 103100/ 173500 | consumed samples: 26393600 | consumed tokens: 54054092800 | elapsed time per iteration (s): 0.43 | learning rate: 8.485E-05 | global batch size: 256 | lm loss: 2.911581E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.760 | TFLOPs: 31.10 | +7: iteration 103110/ 173500 | consumed samples: 26396160 | consumed tokens: 54059335680 | elapsed time per iteration (s): 0.43 | learning rate: 8.484E-05 | global batch size: 256 | lm loss: 2.923426E+00 | grad norm: 0.334 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.590 | TFLOPs: 31.35 | +7: iteration 103120/ 173500 | consumed samples: 26398720 | consumed tokens: 54064578560 | elapsed time per iteration (s): 0.42 | learning rate: 8.482E-05 | global batch size: 256 | lm loss: 2.923647E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.129 | TFLOPs: 31.75 | +7: iteration 103130/ 173500 | consumed samples: 26401280 | consumed tokens: 54069821440 | elapsed time per iteration (s): 0.43 | learning rate: 8.481E-05 | global batch size: 256 | lm loss: 2.911912E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.813 | TFLOPs: 31.58 | +7: iteration 103140/ 173500 | consumed samples: 26403840 | consumed tokens: 54075064320 | elapsed time per iteration (s): 0.42 | learning rate: 8.479E-05 | global batch size: 256 | lm loss: 2.925111E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.031 | TFLOPs: 31.69 | +7: iteration 103150/ 173500 | consumed samples: 26406400 | consumed tokens: 54080307200 | elapsed time per iteration (s): 0.43 | learning rate: 8.477E-05 | global batch size: 256 | lm loss: 2.906255E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.517 | TFLOPs: 31.40 | +7: iteration 103160/ 173500 | consumed samples: 26408960 | consumed tokens: 54085550080 | elapsed time per iteration (s): 0.42 | learning rate: 8.476E-05 | global batch size: 256 | lm loss: 2.925002E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.795 | TFLOPs: 31.89 | +7: iteration 103170/ 173500 | consumed samples: 26411520 | consumed tokens: 54090792960 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.474E-05 | global batch size: 256 | lm loss: 2.905769E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.564 | TFLOPs: 30.99 | +7: iteration 103180/ 173500 | consumed samples: 26414080 | consumed tokens: 54096035840 | elapsed time per iteration (s): 0.43 | learning rate: 8.473E-05 | global batch size: 256 | lm loss: 2.923556E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.498 | TFLOPs: 31.09 | +7: iteration 103190/ 173500 | consumed samples: 26416640 | consumed tokens: 54101278720 | elapsed time per iteration (s): 0.43 | learning rate: 8.471E-05 | global batch size: 256 | lm loss: 2.926558E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.137 | TFLOPs: 31.59 | +7: iteration 103200/ 173500 | consumed samples: 26419200 | consumed tokens: 54106521600 | elapsed time per iteration (s): 0.43 | learning rate: 8.470E-05 | global batch size: 256 | lm loss: 2.933144E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.876 | TFLOPs: 31.47 | +7: iteration 103210/ 173500 | consumed samples: 26421760 | consumed tokens: 54111764480 | elapsed time per iteration (s): 0.42 | learning rate: 8.468E-05 | global batch size: 256 | lm loss: 2.917322E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.232 | TFLOPs: 31.65 | +7: iteration 103220/ 173500 | consumed samples: 26424320 | consumed tokens: 54117007360 | elapsed time per iteration (s): 0.43 | learning rate: 8.466E-05 | global batch size: 256 | lm loss: 2.919617E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.842 | TFLOPs: 
31.42 | +7: iteration 103230/ 173500 | consumed samples: 26426880 | consumed tokens: 54122250240 | elapsed time per iteration (s): 0.42 | learning rate: 8.465E-05 | global batch size: 256 | lm loss: 2.901094E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.435 | TFLOPs: 31.71 | +7: iteration 103240/ 173500 | consumed samples: 26429440 | consumed tokens: 54127493120 | elapsed time per iteration (s): 0.43 | learning rate: 8.463E-05 | global batch size: 256 | lm loss: 2.914437E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.232 | TFLOPs: 31.39 | +7: iteration 103250/ 173500 | consumed samples: 26432000 | consumed tokens: 54132736000 | elapsed time per iteration (s): 0.43 | learning rate: 8.462E-05 | global batch size: 256 | lm loss: 2.924004E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.112 | TFLOPs: 31.22 | +7: iteration 103260/ 173500 | consumed samples: 26434560 | consumed tokens: 54137978880 | elapsed time per iteration (s): 0.43 | learning rate: 8.460E-05 | global batch size: 256 | lm loss: 2.900608E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.271 | TFLOPs: 31.60 | +7: iteration 103270/ 173500 | consumed samples: 26437120 | consumed tokens: 54143221760 | elapsed time per iteration (s): 0.43 | learning rate: 8.459E-05 | global batch size: 256 | lm loss: 2.926186E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.537 | TFLOPs: 31.19 | +7: iteration 103280/ 173500 | consumed samples: 26439680 | consumed tokens: 54148464640 | elapsed time per iteration (s): 0.42 | learning rate: 8.457E-05 | global batch size: 256 | lm loss: 2.920579E+00 | grad norm: 0.397 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.957 | TFLOPs: 31.69 | +7: iteration 103290/ 173500 | consumed samples: 26442240 | consumed tokens: 54153707520 | elapsed time per iteration (s): 0.43 | learning rate: 8.455E-05 | global batch size: 256 | lm loss: 2.909528E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.372 | TFLOPs: 31.19 | +7: iteration 103300/ 173500 | consumed samples: 26444800 | consumed tokens: 54158950400 | elapsed time per iteration (s): 0.43 | learning rate: 8.454E-05 | global batch size: 256 | lm loss: 2.912725E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.310 | TFLOPs: 31.34 | +7: iteration 103310/ 173500 | consumed samples: 26447360 | consumed tokens: 54164193280 | elapsed time per iteration (s): 0.43 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 2.906476E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.751 | TFLOPs: 30.89 | +7: iteration 103320/ 173500 | consumed samples: 26449920 | consumed tokens: 54169436160 | elapsed time per iteration (s): 0.43 | learning rate: 8.451E-05 | global batch size: 256 | lm loss: 2.923597E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.514 | TFLOPs: 31.40 | +7: iteration 103330/ 173500 | consumed samples: 26452480 | consumed tokens: 54174679040 | elapsed time per iteration (s): 0.43 | learning rate: 8.449E-05 | global batch size: 256 | lm loss: 2.925190E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.748 | TFLOPs: 31.21 | +7: iteration 103340/ 173500 | consumed samples: 26455040 | consumed tokens: 54179921920 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.447E-05 | global batch size: 256 | lm loss: 2.923961E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.482 | TFLOPs: 31.09 | +7: iteration 103350/ 173500 | consumed samples: 26457600 | consumed tokens: 54185164800 | elapsed time per iteration (s): 0.44 | learning rate: 8.446E-05 | global batch size: 256 | lm loss: 2.921872E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.106 | TFLOPs: 30.44 | +7: iteration 103360/ 173500 | consumed samples: 26460160 | consumed tokens: 54190407680 | elapsed time per iteration (s): 0.43 | learning rate: 8.444E-05 | global batch size: 256 | lm loss: 2.911592E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.883 | TFLOPs: 31.00 | +7: iteration 103370/ 173500 | consumed samples: 26462720 | consumed tokens: 54195650560 | elapsed time per iteration (s): 0.43 | learning rate: 8.443E-05 | global batch size: 256 | lm loss: 2.902891E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.668 | TFLOPs: 31.04 | +7: iteration 103380/ 173500 | consumed samples: 26465280 | consumed tokens: 54200893440 | elapsed time per iteration (s): 0.43 | learning rate: 8.441E-05 | global batch size: 256 | lm loss: 2.916879E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.367 | TFLOPs: 31.55 | +7: iteration 103390/ 173500 | consumed samples: 26467840 | consumed tokens: 54206136320 | elapsed time per iteration (s): 0.43 | learning rate: 8.440E-05 | global batch size: 256 | lm loss: 2.916467E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.461 | TFLOPs: 
31.03 | +7: iteration 103400/ 173500 | consumed samples: 26470400 | consumed tokens: 54211379200 | elapsed time per iteration (s): 0.42 | learning rate: 8.438E-05 | global batch size: 256 | lm loss: 2.914777E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.699 | TFLOPs: 31.62 | +7: iteration 103410/ 173500 | consumed samples: 26472960 | consumed tokens: 54216622080 | elapsed time per iteration (s): 0.43 | learning rate: 8.436E-05 | global batch size: 256 | lm loss: 2.922865E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.512 | TFLOPs: 31.14 | +7: iteration 103420/ 173500 | consumed samples: 26475520 | consumed tokens: 54221864960 | elapsed time per iteration (s): 0.43 | learning rate: 8.435E-05 | global batch size: 256 | lm loss: 2.912894E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.605 | TFLOPs: 31.51 | +7: iteration 103430/ 173500 | consumed samples: 26478080 | consumed tokens: 54227107840 | elapsed time per iteration (s): 0.43 | learning rate: 8.433E-05 | global batch size: 256 | lm loss: 2.909045E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.714 | TFLOPs: 30.94 | +7: iteration 103440/ 173500 | consumed samples: 26480640 | consumed tokens: 54232350720 | elapsed time per iteration (s): 0.43 | learning rate: 8.432E-05 | global batch size: 256 | lm loss: 2.933541E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.300 | TFLOPs: 31.50 | +7: iteration 103450/ 173500 | consumed samples: 26483200 | consumed tokens: 54237593600 | elapsed time per iteration (s): 0.43 | learning rate: 8.430E-05 | global batch size: 256 | lm loss: 2.912414E+00 | grad norm: 0.354 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.839 | TFLOPs: 31.00 | +7: iteration 103460/ 173500 | consumed samples: 26485760 | consumed tokens: 54242836480 | elapsed time per iteration (s): 0.43 | learning rate: 8.429E-05 | global batch size: 256 | lm loss: 2.923248E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.623 | TFLOPs: 31.30 | +7: iteration 103470/ 173500 | consumed samples: 26488320 | consumed tokens: 54248079360 | elapsed time per iteration (s): 0.43 | learning rate: 8.427E-05 | global batch size: 256 | lm loss: 2.913611E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.488 | TFLOPs: 31.51 | +7: iteration 103480/ 173500 | consumed samples: 26490880 | consumed tokens: 54253322240 | elapsed time per iteration (s): 0.43 | learning rate: 8.425E-05 | global batch size: 256 | lm loss: 2.920636E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.678 | TFLOPs: 30.89 | +7: iteration 103490/ 173500 | consumed samples: 26493440 | consumed tokens: 54258565120 | elapsed time per iteration (s): 0.42 | learning rate: 8.424E-05 | global batch size: 256 | lm loss: 2.912054E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.088 | TFLOPs: 31.75 | +7: iteration 103500/ 173500 | consumed samples: 26496000 | consumed tokens: 54263808000 | elapsed time per iteration (s): 0.42 | learning rate: 8.422E-05 | global batch size: 256 | lm loss: 2.916201E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.562 | TFLOPs: 31.62 | +7: iteration 103510/ 173500 | consumed samples: 26498560 | consumed tokens: 54269050880 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.421E-05 | global batch size: 256 | lm loss: 2.914380E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.644 | TFLOPs: 31.04 | +7: iteration 103520/ 173500 | consumed samples: 26501120 | consumed tokens: 54274293760 | elapsed time per iteration (s): 0.43 | learning rate: 8.419E-05 | global batch size: 256 | lm loss: 2.925646E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.902 | TFLOPs: 31.37 | +7: iteration 103530/ 173500 | consumed samples: 26503680 | consumed tokens: 54279536640 | elapsed time per iteration (s): 0.42 | learning rate: 8.418E-05 | global batch size: 256 | lm loss: 2.929312E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.392 | TFLOPs: 31.66 | +7: iteration 103540/ 173500 | consumed samples: 26506240 | consumed tokens: 54284779520 | elapsed time per iteration (s): 0.44 | learning rate: 8.416E-05 | global batch size: 256 | lm loss: 2.909003E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.318 | TFLOPs: 30.82 | +7: iteration 103550/ 173500 | consumed samples: 26508800 | consumed tokens: 54290022400 | elapsed time per iteration (s): 0.43 | learning rate: 8.414E-05 | global batch size: 256 | lm loss: 2.918169E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.669 | TFLOPs: 31.10 | +7: iteration 103560/ 173500 | consumed samples: 26511360 | consumed tokens: 54295265280 | elapsed time per iteration (s): 0.43 | learning rate: 8.413E-05 | global batch size: 256 | lm loss: 2.926686E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.160 | TFLOPs: 
31.33 | +7: iteration 103570/ 173500 | consumed samples: 26513920 | consumed tokens: 54300508160 | elapsed time per iteration (s): 0.43 | learning rate: 8.411E-05 | global batch size: 256 | lm loss: 2.914390E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.272 | TFLOPs: 31.55 | +7: iteration 103580/ 173500 | consumed samples: 26516480 | consumed tokens: 54305751040 | elapsed time per iteration (s): 0.43 | learning rate: 8.410E-05 | global batch size: 256 | lm loss: 2.908680E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.007 | TFLOPs: 31.32 | +7: iteration 103590/ 173500 | consumed samples: 26519040 | consumed tokens: 54310993920 | elapsed time per iteration (s): 0.44 | learning rate: 8.408E-05 | global batch size: 256 | lm loss: 2.928527E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.477 | TFLOPs: 30.82 | +7: iteration 103600/ 173500 | consumed samples: 26521600 | consumed tokens: 54316236800 | elapsed time per iteration (s): 0.43 | learning rate: 8.406E-05 | global batch size: 256 | lm loss: 2.913831E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.936 | TFLOPs: 31.53 | +7: iteration 103610/ 173500 | consumed samples: 26524160 | consumed tokens: 54321479680 | elapsed time per iteration (s): 0.44 | learning rate: 8.405E-05 | global batch size: 256 | lm loss: 2.916622E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.560 | TFLOPs: 30.20 | +7: iteration 103620/ 173500 | consumed samples: 26526720 | consumed tokens: 54326722560 | elapsed time per iteration (s): 0.43 | learning rate: 8.403E-05 | global batch size: 256 | lm loss: 2.901315E+00 | grad norm: 0.338 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.279 | TFLOPs: 31.60 | +7: iteration 103630/ 173500 | consumed samples: 26529280 | consumed tokens: 54331965440 | elapsed time per iteration (s): 0.43 | learning rate: 8.402E-05 | global batch size: 256 | lm loss: 2.914540E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.589 | TFLOPs: 30.99 | +7: iteration 103640/ 173500 | consumed samples: 26531840 | consumed tokens: 54337208320 | elapsed time per iteration (s): 0.43 | learning rate: 8.400E-05 | global batch size: 256 | lm loss: 2.929659E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.518 | TFLOPs: 31.14 | +7: iteration 103650/ 173500 | consumed samples: 26534400 | consumed tokens: 54342451200 | elapsed time per iteration (s): 0.43 | learning rate: 8.399E-05 | global batch size: 256 | lm loss: 2.925354E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.726 | TFLOPs: 31.10 | +7: iteration 103660/ 173500 | consumed samples: 26536960 | consumed tokens: 54347694080 | elapsed time per iteration (s): 0.52 | learning rate: 8.397E-05 | global batch size: 256 | lm loss: 2.931111E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 488.022 | TFLOPs: 25.61 | +7: iteration 103670/ 173500 | consumed samples: 26539520 | consumed tokens: 54352936960 | elapsed time per iteration (s): 0.42 | learning rate: 8.395E-05 | global batch size: 256 | lm loss: 2.915843E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.492 | TFLOPs: 32.08 | +7: iteration 103680/ 173500 | consumed samples: 26542080 | consumed tokens: 54358179840 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.394E-05 | global batch size: 256 | lm loss: 2.913092E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.201 | TFLOPs: 30.97 | +7: iteration 103690/ 173500 | consumed samples: 26544640 | consumed tokens: 54363422720 | elapsed time per iteration (s): 0.42 | learning rate: 8.392E-05 | global batch size: 256 | lm loss: 2.913553E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.965 | TFLOPs: 31.64 | +7: iteration 103700/ 173500 | consumed samples: 26547200 | consumed tokens: 54368665600 | elapsed time per iteration (s): 0.44 | learning rate: 8.391E-05 | global batch size: 256 | lm loss: 2.910923E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.033 | TFLOPs: 30.85 | +7: iteration 103710/ 173500 | consumed samples: 26549760 | consumed tokens: 54373908480 | elapsed time per iteration (s): 0.42 | learning rate: 8.389E-05 | global batch size: 256 | lm loss: 2.918674E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.470 | TFLOPs: 32.19 | +7: iteration 103720/ 173500 | consumed samples: 26552320 | consumed tokens: 54379151360 | elapsed time per iteration (s): 0.44 | learning rate: 8.388E-05 | global batch size: 256 | lm loss: 2.924793E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.885 | TFLOPs: 30.85 | +7: iteration 103730/ 173500 | consumed samples: 26554880 | consumed tokens: 54384394240 | elapsed time per iteration (s): 0.43 | learning rate: 8.386E-05 | global batch size: 256 | lm loss: 2.910196E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.740 | TFLOPs: 
31.15 | +7: iteration 103740/ 173500 | consumed samples: 26557440 | consumed tokens: 54389637120 | elapsed time per iteration (s): 0.42 | learning rate: 8.384E-05 | global batch size: 256 | lm loss: 2.923083E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.775 | TFLOPs: 31.63 | +7: iteration 103750/ 173500 | consumed samples: 26560000 | consumed tokens: 54394880000 | elapsed time per iteration (s): 0.43 | learning rate: 8.383E-05 | global batch size: 256 | lm loss: 2.909520E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.684 | TFLOPs: 31.41 | +7: iteration 103760/ 173500 | consumed samples: 26562560 | consumed tokens: 54400122880 | elapsed time per iteration (s): 0.43 | learning rate: 8.381E-05 | global batch size: 256 | lm loss: 2.928237E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.092 | TFLOPs: 31.38 | +7: iteration 103770/ 173500 | consumed samples: 26565120 | consumed tokens: 54405365760 | elapsed time per iteration (s): 0.43 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 2.918407E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.015 | TFLOPs: 31.27 | +7: iteration 103780/ 173500 | consumed samples: 26567680 | consumed tokens: 54410608640 | elapsed time per iteration (s): 0.42 | learning rate: 8.378E-05 | global batch size: 256 | lm loss: 2.928871E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.119 | TFLOPs: 31.64 | +7: iteration 103790/ 173500 | consumed samples: 26570240 | consumed tokens: 54415851520 | elapsed time per iteration (s): 0.43 | learning rate: 8.377E-05 | global batch size: 256 | lm loss: 2.922039E+00 | grad norm: 0.365 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.733 | TFLOPs: 31.05 | +7: iteration 103800/ 173500 | consumed samples: 26572800 | consumed tokens: 54421094400 | elapsed time per iteration (s): 0.42 | learning rate: 8.375E-05 | global batch size: 256 | lm loss: 2.903431E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.301 | TFLOPs: 31.81 | +7: iteration 103810/ 173500 | consumed samples: 26575360 | consumed tokens: 54426337280 | elapsed time per iteration (s): 0.42 | learning rate: 8.373E-05 | global batch size: 256 | lm loss: 2.926579E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.325 | TFLOPs: 31.97 | +7: iteration 103820/ 173500 | consumed samples: 26577920 | consumed tokens: 54431580160 | elapsed time per iteration (s): 0.43 | learning rate: 8.372E-05 | global batch size: 256 | lm loss: 2.925539E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.381 | TFLOPs: 31.34 | +7: iteration 103830/ 173500 | consumed samples: 26580480 | consumed tokens: 54436823040 | elapsed time per iteration (s): 0.43 | learning rate: 8.370E-05 | global batch size: 256 | lm loss: 2.912824E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.213 | TFLOPs: 31.33 | +7: iteration 103840/ 173500 | consumed samples: 26583040 | consumed tokens: 54442065920 | elapsed time per iteration (s): 0.42 | learning rate: 8.369E-05 | global batch size: 256 | lm loss: 2.921494E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.179 | TFLOPs: 31.65 | +7: iteration 103850/ 173500 | consumed samples: 26585600 | consumed tokens: 54447308800 | elapsed time per 
iteration (s): 0.42 | learning rate: 8.367E-05 | global batch size: 256 | lm loss: 2.922527E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.383 | TFLOPs: 31.66 | +7: iteration 103860/ 173500 | consumed samples: 26588160 | consumed tokens: 54452551680 | elapsed time per iteration (s): 0.42 | learning rate: 8.366E-05 | global batch size: 256 | lm loss: 2.907703E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.974 | TFLOPs: 31.95 | +7: iteration 103870/ 173500 | consumed samples: 26590720 | consumed tokens: 54457794560 | elapsed time per iteration (s): 0.42 | learning rate: 8.364E-05 | global batch size: 256 | lm loss: 2.926719E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.986 | TFLOPs: 31.69 | +7: iteration 103880/ 173500 | consumed samples: 26593280 | consumed tokens: 54463037440 | elapsed time per iteration (s): 0.43 | learning rate: 8.362E-05 | global batch size: 256 | lm loss: 2.910662E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.732 | TFLOPs: 31.26 | +7: iteration 103890/ 173500 | consumed samples: 26595840 | consumed tokens: 54468280320 | elapsed time per iteration (s): 0.43 | learning rate: 8.361E-05 | global batch size: 256 | lm loss: 2.920452E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.076 | TFLOPs: 31.12 | +7: iteration 103900/ 173500 | consumed samples: 26598400 | consumed tokens: 54473523200 | elapsed time per iteration (s): 0.42 | learning rate: 8.359E-05 | global batch size: 256 | lm loss: 2.903856E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.365 | TFLOPs: 
31.71 | +7: iteration 103910/ 173500 | consumed samples: 26600960 | consumed tokens: 54478766080 | elapsed time per iteration (s): 0.42 | learning rate: 8.358E-05 | global batch size: 256 | lm loss: 2.914323E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.884 | TFLOPs: 31.74 | +7: iteration 103920/ 173500 | consumed samples: 26603520 | consumed tokens: 54484008960 | elapsed time per iteration (s): 0.43 | learning rate: 8.356E-05 | global batch size: 256 | lm loss: 2.912384E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.746 | TFLOPs: 31.57 | +7: iteration 103930/ 173500 | consumed samples: 26606080 | consumed tokens: 54489251840 | elapsed time per iteration (s): 0.42 | learning rate: 8.355E-05 | global batch size: 256 | lm loss: 2.924870E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.584 | TFLOPs: 31.83 | +7: iteration 103940/ 173500 | consumed samples: 26608640 | consumed tokens: 54494494720 | elapsed time per iteration (s): 0.42 | learning rate: 8.353E-05 | global batch size: 256 | lm loss: 2.920535E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.435 | TFLOPs: 31.61 | +7: iteration 103950/ 173500 | consumed samples: 26611200 | consumed tokens: 54499737600 | elapsed time per iteration (s): 0.43 | learning rate: 8.351E-05 | global batch size: 256 | lm loss: 2.917800E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.785 | TFLOPs: 31.57 | +7: iteration 103960/ 173500 | consumed samples: 26613760 | consumed tokens: 54504980480 | elapsed time per iteration (s): 0.44 | learning rate: 8.350E-05 | global batch size: 256 | lm loss: 2.917016E+00 | grad norm: 0.331 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.878 | TFLOPs: 30.79 | +7: iteration 103970/ 173500 | consumed samples: 26616320 | consumed tokens: 54510223360 | elapsed time per iteration (s): 0.43 | learning rate: 8.348E-05 | global batch size: 256 | lm loss: 2.913770E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.608 | TFLOPs: 31.20 | +7: iteration 103980/ 173500 | consumed samples: 26618880 | consumed tokens: 54515466240 | elapsed time per iteration (s): 0.43 | learning rate: 8.347E-05 | global batch size: 256 | lm loss: 2.922113E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.571 | TFLOPs: 30.88 | +7: iteration 103990/ 173500 | consumed samples: 26621440 | consumed tokens: 54520709120 | elapsed time per iteration (s): 0.43 | learning rate: 8.345E-05 | global batch size: 256 | lm loss: 2.914148E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.592 | TFLOPs: 31.46 | +0: [2023-03-17 11:31:52,451] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=0, lr=[8.343492337309329e-05, 8.343492337309329e-05, 8.343492337309329e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 104000/ 173500 | consumed samples: 26624000 | consumed tokens: 54525952000 | elapsed time per iteration (s): 0.43 | learning rate: 8.343E-05 | global batch size: 256 | lm loss: 2.914834E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.836 | TFLOPs: 31.32 | +0: steps: 104000 loss: 2.9060 iter time (s): 0.427 samples/sec: 599.730 +7: iteration 104010/ 173500 | consumed samples: 26626560 | consumed tokens: 54531194880 | elapsed time per iteration (s): 0.43 | learning rate: 8.342E-05 | global batch size: 256 
| lm loss: 2.923227E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.426 | TFLOPs: 31.08 | +7: iteration 104020/ 173500 | consumed samples: 26629120 | consumed tokens: 54536437760 | elapsed time per iteration (s): 0.42 | learning rate: 8.340E-05 | global batch size: 256 | lm loss: 2.917950E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.651 | TFLOPs: 31.93 | +7: iteration 104030/ 173500 | consumed samples: 26631680 | consumed tokens: 54541680640 | elapsed time per iteration (s): 0.43 | learning rate: 8.339E-05 | global batch size: 256 | lm loss: 2.938995E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.194 | TFLOPs: 31.54 | +7: iteration 104040/ 173500 | consumed samples: 26634240 | consumed tokens: 54546923520 | elapsed time per iteration (s): 0.42 | learning rate: 8.337E-05 | global batch size: 256 | lm loss: 2.919190E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.226 | TFLOPs: 31.65 | +7: iteration 104050/ 173500 | consumed samples: 26636800 | consumed tokens: 54552166400 | elapsed time per iteration (s): 0.44 | learning rate: 8.336E-05 | global batch size: 256 | lm loss: 2.920421E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.912 | TFLOPs: 30.69 | +7: iteration 104060/ 173500 | consumed samples: 26639360 | consumed tokens: 54557409280 | elapsed time per iteration (s): 0.42 | learning rate: 8.334E-05 | global batch size: 256 | lm loss: 2.923653E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.957 | TFLOPs: 32.00 | +7: iteration 104070/ 173500 | consumed samples: 26641920 | 
consumed tokens: 54562652160 | elapsed time per iteration (s): 0.42 | learning rate: 8.332E-05 | global batch size: 256 | lm loss: 2.909598E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.374 | TFLOPs: 31.82 | +7: iteration 104080/ 173500 | consumed samples: 26644480 | consumed tokens: 54567895040 | elapsed time per iteration (s): 0.43 | learning rate: 8.331E-05 | global batch size: 256 | lm loss: 2.926936E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.634 | TFLOPs: 30.88 | +7: iteration 104090/ 173500 | consumed samples: 26647040 | consumed tokens: 54573137920 | elapsed time per iteration (s): 0.42 | learning rate: 8.329E-05 | global batch size: 256 | lm loss: 2.924172E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.929 | TFLOPs: 31.79 | +7: iteration 104100/ 173500 | consumed samples: 26649600 | consumed tokens: 54578380800 | elapsed time per iteration (s): 0.42 | learning rate: 8.328E-05 | global batch size: 256 | lm loss: 2.917996E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.275 | TFLOPs: 31.65 | +7: iteration 104110/ 173500 | consumed samples: 26652160 | consumed tokens: 54583623680 | elapsed time per iteration (s): 0.44 | learning rate: 8.326E-05 | global batch size: 256 | lm loss: 2.918687E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.436 | TFLOPs: 30.77 | +7: iteration 104120/ 173500 | consumed samples: 26654720 | consumed tokens: 54588866560 | elapsed time per iteration (s): 0.43 | learning rate: 8.325E-05 | global batch size: 256 | lm loss: 2.907112E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 599.177 | TFLOPs: 31.44 | +7: iteration 104130/ 173500 | consumed samples: 26657280 | consumed tokens: 54594109440 | elapsed time per iteration (s): 0.43 | learning rate: 8.323E-05 | global batch size: 256 | lm loss: 2.904156E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.962 | TFLOPs: 31.43 | +7: iteration 104140/ 173500 | consumed samples: 26659840 | consumed tokens: 54599352320 | elapsed time per iteration (s): 0.42 | learning rate: 8.321E-05 | global batch size: 256 | lm loss: 2.930124E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.880 | TFLOPs: 32.16 | +7: iteration 104150/ 173500 | consumed samples: 26662400 | consumed tokens: 54604595200 | elapsed time per iteration (s): 0.43 | learning rate: 8.320E-05 | global batch size: 256 | lm loss: 2.914360E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.402 | TFLOPs: 31.45 | +7: iteration 104160/ 173500 | consumed samples: 26664960 | consumed tokens: 54609838080 | elapsed time per iteration (s): 0.43 | learning rate: 8.318E-05 | global batch size: 256 | lm loss: 2.924497E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.989 | TFLOPs: 31.43 | +7: iteration 104170/ 173500 | consumed samples: 26667520 | consumed tokens: 54615080960 | elapsed time per iteration (s): 0.43 | learning rate: 8.317E-05 | global batch size: 256 | lm loss: 2.912737E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.038 | TFLOPs: 30.96 | +7: iteration 104180/ 173500 | consumed samples: 26670080 | consumed tokens: 54620323840 | elapsed time per iteration (s): 0.42 | learning rate: 8.315E-05 | global batch size: 
256 | lm loss: 2.906408E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.430 | TFLOPs: 31.98 | +7: iteration 104190/ 173500 | consumed samples: 26672640 | consumed tokens: 54625566720 | elapsed time per iteration (s): 0.43 | learning rate: 8.314E-05 | global batch size: 256 | lm loss: 2.927518E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.210 | TFLOPs: 31.39 | +7: iteration 104200/ 173500 | consumed samples: 26675200 | consumed tokens: 54630809600 | elapsed time per iteration (s): 0.42 | learning rate: 8.312E-05 | global batch size: 256 | lm loss: 2.925180E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.577 | TFLOPs: 32.14 | +7: iteration 104210/ 173500 | consumed samples: 26677760 | consumed tokens: 54636052480 | elapsed time per iteration (s): 0.43 | learning rate: 8.310E-05 | global batch size: 256 | lm loss: 2.920059E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.283 | TFLOPs: 31.23 | +7: iteration 104220/ 173500 | consumed samples: 26680320 | consumed tokens: 54641295360 | elapsed time per iteration (s): 0.42 | learning rate: 8.309E-05 | global batch size: 256 | lm loss: 2.912520E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.065 | TFLOPs: 31.64 | +7: iteration 104230/ 173500 | consumed samples: 26682880 | consumed tokens: 54646538240 | elapsed time per iteration (s): 0.43 | learning rate: 8.307E-05 | global batch size: 256 | lm loss: 2.913621E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.583 | TFLOPs: 31.30 | +7: iteration 104240/ 173500 | consumed samples: 26685440 | 
consumed tokens: 54651781120 | elapsed time per iteration (s): 0.44 | learning rate: 8.306E-05 | global batch size: 256 | lm loss: 2.911921E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.163 | TFLOPs: 30.55 | +7: iteration 104250/ 173500 | consumed samples: 26688000 | consumed tokens: 54657024000 | elapsed time per iteration (s): 0.43 | learning rate: 8.304E-05 | global batch size: 256 | lm loss: 2.930991E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.306 | TFLOPs: 31.50 | +7: iteration 104260/ 173500 | consumed samples: 26690560 | consumed tokens: 54662266880 | elapsed time per iteration (s): 0.43 | learning rate: 8.303E-05 | global batch size: 256 | lm loss: 2.922001E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.562 | TFLOPs: 31.46 | +7: iteration 104270/ 173500 | consumed samples: 26693120 | consumed tokens: 54667509760 | elapsed time per iteration (s): 0.43 | learning rate: 8.301E-05 | global batch size: 256 | lm loss: 2.918881E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.943 | TFLOPs: 31.22 | +7: iteration 104280/ 173500 | consumed samples: 26695680 | consumed tokens: 54672752640 | elapsed time per iteration (s): 0.43 | learning rate: 8.299E-05 | global batch size: 256 | lm loss: 2.910134E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.674 | TFLOPs: 31.52 | +7: iteration 104290/ 173500 | consumed samples: 26698240 | consumed tokens: 54677995520 | elapsed time per iteration (s): 0.42 | learning rate: 8.298E-05 | global batch size: 256 | lm loss: 2.915182E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.891 | TFLOPs: 31.63 | +7: iteration 104300/ 173500 | consumed samples: 26700800 | consumed tokens: 54683238400 | elapsed time per iteration (s): 0.42 | learning rate: 8.296E-05 | global batch size: 256 | lm loss: 2.911382E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.832 | TFLOPs: 31.79 | +7: iteration 104310/ 173500 | consumed samples: 26703360 | consumed tokens: 54688481280 | elapsed time per iteration (s): 0.44 | learning rate: 8.295E-05 | global batch size: 256 | lm loss: 2.903260E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.065 | TFLOPs: 30.85 | +7: iteration 104320/ 173500 | consumed samples: 26705920 | consumed tokens: 54693724160 | elapsed time per iteration (s): 0.44 | learning rate: 8.293E-05 | global batch size: 256 | lm loss: 2.915183E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.290 | TFLOPs: 30.66 | +7: iteration 104330/ 173500 | consumed samples: 26708480 | consumed tokens: 54698967040 | elapsed time per iteration (s): 0.43 | learning rate: 8.292E-05 | global batch size: 256 | lm loss: 2.920308E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.063 | TFLOPs: 30.96 | +7: iteration 104340/ 173500 | consumed samples: 26711040 | consumed tokens: 54704209920 | elapsed time per iteration (s): 0.42 | learning rate: 8.290E-05 | global batch size: 256 | lm loss: 2.921704E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.912 | TFLOPs: 31.95 | +7: iteration 104350/ 173500 | consumed samples: 26713600 | consumed tokens: 54709452800 | elapsed time per iteration (s): 0.43 | learning rate: 8.289E-05 | global batch size: 
256 | lm loss: 2.905725E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.596 | TFLOPs: 31.51 | +7: iteration 104360/ 173500 | consumed samples: 26716160 | consumed tokens: 54714695680 | elapsed time per iteration (s): 0.50 | learning rate: 8.287E-05 | global batch size: 256 | lm loss: 2.915422E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 512.998 | TFLOPs: 26.92 | +7: iteration 104370/ 173500 | consumed samples: 26718720 | consumed tokens: 54719938560 | elapsed time per iteration (s): 1.41 | learning rate: 8.285E-05 | global batch size: 256 | lm loss: 2.928355E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 182.008 | TFLOPs: 9.55 | +7: iteration 104380/ 173500 | consumed samples: 26721280 | consumed tokens: 54725181440 | elapsed time per iteration (s): 0.42 | learning rate: 8.284E-05 | global batch size: 256 | lm loss: 2.911255E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.743 | TFLOPs: 32.31 | +7: iteration 104390/ 173500 | consumed samples: 26723840 | consumed tokens: 54730424320 | elapsed time per iteration (s): 0.42 | learning rate: 8.282E-05 | global batch size: 256 | lm loss: 2.918710E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.396 | TFLOPs: 32.18 | +7: iteration 104400/ 173500 | consumed samples: 26726400 | consumed tokens: 54735667200 | elapsed time per iteration (s): 0.42 | learning rate: 8.281E-05 | global batch size: 256 | lm loss: 2.918508E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.004 | TFLOPs: 32.06 | +7: iteration 104410/ 173500 | consumed samples: 26728960 | 
consumed tokens: 54740910080 | elapsed time per iteration (s): 0.42 | learning rate: 8.279E-05 | global batch size: 256 | lm loss: 2.914413E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.326 | TFLOPs: 31.92 | +7: iteration 104420/ 173500 | consumed samples: 26731520 | consumed tokens: 54746152960 | elapsed time per iteration (s): 0.42 | learning rate: 8.278E-05 | global batch size: 256 | lm loss: 2.916271E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.302 | TFLOPs: 31.76 | +7: iteration 104430/ 173500 | consumed samples: 26734080 | consumed tokens: 54751395840 | elapsed time per iteration (s): 0.42 | learning rate: 8.276E-05 | global batch size: 256 | lm loss: 2.921009E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.606 | TFLOPs: 32.30 | +7: iteration 104440/ 173500 | consumed samples: 26736640 | consumed tokens: 54756638720 | elapsed time per iteration (s): 0.43 | learning rate: 8.274E-05 | global batch size: 256 | lm loss: 2.909445E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.941 | TFLOPs: 31.48 | +7: iteration 104450/ 173500 | consumed samples: 26739200 | consumed tokens: 54761881600 | elapsed time per iteration (s): 0.42 | learning rate: 8.273E-05 | global batch size: 256 | lm loss: 2.915988E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.544 | TFLOPs: 31.88 | +7: iteration 104460/ 173500 | consumed samples: 26741760 | consumed tokens: 54767124480 | elapsed time per iteration (s): 0.42 | learning rate: 8.271E-05 | global batch size: 256 | lm loss: 2.922768E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.075 | TFLOPs: 31.85 | +7: iteration 104470/ 173500 | consumed samples: 26744320 | consumed tokens: 54772367360 | elapsed time per iteration (s): 0.43 | learning rate: 8.270E-05 | global batch size: 256 | lm loss: 2.907395E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.196 | TFLOPs: 31.44 | +7: iteration 104480/ 173500 | consumed samples: 26746880 | consumed tokens: 54777610240 | elapsed time per iteration (s): 0.42 | learning rate: 8.268E-05 | global batch size: 256 | lm loss: 2.935307E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.580 | TFLOPs: 31.83 | +7: iteration 104490/ 173500 | consumed samples: 26749440 | consumed tokens: 54782853120 | elapsed time per iteration (s): 0.42 | learning rate: 8.267E-05 | global batch size: 256 | lm loss: 2.916027E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.812 | TFLOPs: 31.94 | +7: iteration 104500/ 173500 | consumed samples: 26752000 | consumed tokens: 54788096000 | elapsed time per iteration (s): 0.42 | learning rate: 8.265E-05 | global batch size: 256 | lm loss: 2.936556E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.696 | TFLOPs: 32.20 | +7: iteration 104510/ 173500 | consumed samples: 26754560 | consumed tokens: 54793338880 | elapsed time per iteration (s): 0.42 | learning rate: 8.263E-05 | global batch size: 256 | lm loss: 2.911832E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.913 | TFLOPs: 32.05 | +7: iteration 104520/ 173500 | consumed samples: 26757120 | consumed tokens: 54798581760 | elapsed time per iteration (s): 0.42 | learning rate: 8.262E-05 | global batch size: 
256 | lm loss: 2.918977E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.630 | TFLOPs: 32.20 | +7: iteration 104530/ 173500 | consumed samples: 26759680 | consumed tokens: 54803824640 | elapsed time per iteration (s): 0.42 | learning rate: 8.260E-05 | global batch size: 256 | lm loss: 2.909895E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.157 | TFLOPs: 32.17 | +7: iteration 104540/ 173500 | consumed samples: 26762240 | consumed tokens: 54809067520 | elapsed time per iteration (s): 0.42 | learning rate: 8.259E-05 | global batch size: 256 | lm loss: 2.916002E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.031 | TFLOPs: 31.90 | +7: iteration 104550/ 173500 | consumed samples: 26764800 | consumed tokens: 54814310400 | elapsed time per iteration (s): 0.42 | learning rate: 8.257E-05 | global batch size: 256 | lm loss: 2.926797E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.341 | TFLOPs: 31.71 | +7: iteration 104560/ 173500 | consumed samples: 26767360 | consumed tokens: 54819553280 | elapsed time per iteration (s): 0.42 | learning rate: 8.256E-05 | global batch size: 256 | lm loss: 2.903574E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.836 | TFLOPs: 31.63 | +7: iteration 104570/ 173500 | consumed samples: 26769920 | consumed tokens: 54824796160 | elapsed time per iteration (s): 0.43 | learning rate: 8.254E-05 | global batch size: 256 | lm loss: 2.924593E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.321 | TFLOPs: 31.60 | +7: iteration 104580/ 173500 | consumed samples: 26772480 | 
consumed tokens: 54830039040 | elapsed time per iteration (s): 0.42 | learning rate: 8.252E-05 | global batch size: 256 | lm loss: 2.916455E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.298 | TFLOPs: 31.86 | +7: iteration 104590/ 173500 | consumed samples: 26775040 | consumed tokens: 54835281920 | elapsed time per iteration (s): 0.42 | learning rate: 8.251E-05 | global batch size: 256 | lm loss: 2.909367E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.964 | TFLOPs: 32.06 | +7: iteration 104600/ 173500 | consumed samples: 26777600 | consumed tokens: 54840524800 | elapsed time per iteration (s): 0.42 | learning rate: 8.249E-05 | global batch size: 256 | lm loss: 2.924209E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.060 | TFLOPs: 32.11 | +7: iteration 104610/ 173500 | consumed samples: 26780160 | consumed tokens: 54845767680 | elapsed time per iteration (s): 0.43 | learning rate: 8.248E-05 | global batch size: 256 | lm loss: 2.917103E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.153 | TFLOPs: 31.54 | +7: iteration 104620/ 173500 | consumed samples: 26782720 | consumed tokens: 54851010560 | elapsed time per iteration (s): 0.42 | learning rate: 8.246E-05 | global batch size: 256 | lm loss: 2.926103E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.987 | TFLOPs: 31.69 | +7: iteration 104630/ 173500 | consumed samples: 26785280 | consumed tokens: 54856253440 | elapsed time per iteration (s): 0.42 | learning rate: 8.245E-05 | global batch size: 256 | lm loss: 2.917163E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 611.565 | TFLOPs: 32.09 | +7: iteration 104640/ 173500 | consumed samples: 26787840 | consumed tokens: 54861496320 | elapsed time per iteration (s): 0.42 | learning rate: 8.243E-05 | global batch size: 256 | lm loss: 2.909750E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.250 | TFLOPs: 31.65 | +7: iteration 104650/ 173500 | consumed samples: 26790400 | consumed tokens: 54866739200 | elapsed time per iteration (s): 0.42 | learning rate: 8.241E-05 | global batch size: 256 | lm loss: 2.919981E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.620 | TFLOPs: 31.72 | +7: iteration 104660/ 173500 | consumed samples: 26792960 | consumed tokens: 54871982080 | elapsed time per iteration (s): 0.42 | learning rate: 8.240E-05 | global batch size: 256 | lm loss: 2.905615E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.981 | TFLOPs: 31.90 | +7: iteration 104670/ 173500 | consumed samples: 26795520 | consumed tokens: 54877224960 | elapsed time per iteration (s): 0.42 | learning rate: 8.238E-05 | global batch size: 256 | lm loss: 2.905895E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.703 | TFLOPs: 31.73 | +7: iteration 104680/ 173500 | consumed samples: 26798080 | consumed tokens: 54882467840 | elapsed time per iteration (s): 0.42 | learning rate: 8.237E-05 | global batch size: 256 | lm loss: 2.916060E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.239 | TFLOPs: 31.65 | +7: iteration 104690/ 173500 | consumed samples: 26800640 | consumed tokens: 54887710720 | elapsed time per iteration (s): 0.42 | learning rate: 8.235E-05 | global batch size: 
256 | lm loss: 2.907994E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.332 | TFLOPs: 31.81 | +7: iteration 104700/ 173500 | consumed samples: 26803200 | consumed tokens: 54892953600 | elapsed time per iteration (s): 0.43 | learning rate: 8.234E-05 | global batch size: 256 | lm loss: 2.914880E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.502 | TFLOPs: 31.04 | +7: iteration 104710/ 173500 | consumed samples: 26805760 | consumed tokens: 54898196480 | elapsed time per iteration (s): 0.43 | learning rate: 8.232E-05 | global batch size: 256 | lm loss: 2.916196E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.276 | TFLOPs: 31.23 | +7: iteration 104720/ 173500 | consumed samples: 26808320 | consumed tokens: 54903439360 | elapsed time per iteration (s): 0.42 | learning rate: 8.230E-05 | global batch size: 256 | lm loss: 2.916034E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.355 | TFLOPs: 31.92 | +7: iteration 104730/ 173500 | consumed samples: 26810880 | consumed tokens: 54908682240 | elapsed time per iteration (s): 0.43 | learning rate: 8.229E-05 | global batch size: 256 | lm loss: 2.910092E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.361 | TFLOPs: 31.50 | +7: iteration 104740/ 173500 | consumed samples: 26813440 | consumed tokens: 54913925120 | elapsed time per iteration (s): 0.42 | learning rate: 8.227E-05 | global batch size: 256 | lm loss: 2.917165E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.610 | TFLOPs: 31.93 | +7: iteration 104750/ 173500 | consumed samples: 26816000 | 
consumed tokens: 54919168000 | elapsed time per iteration (s): 0.42 | learning rate: 8.226E-05 | global batch size: 256 | lm loss: 2.910571E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.317 | TFLOPs: 31.66 | +7: iteration 104760/ 173500 | consumed samples: 26818560 | consumed tokens: 54924410880 | elapsed time per iteration (s): 0.42 | learning rate: 8.224E-05 | global batch size: 256 | lm loss: 2.907694E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.959 | TFLOPs: 31.64 | +7: iteration 104770/ 173500 | consumed samples: 26821120 | consumed tokens: 54929653760 | elapsed time per iteration (s): 0.42 | learning rate: 8.223E-05 | global batch size: 256 | lm loss: 2.912465E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.947 | TFLOPs: 31.85 | +7: iteration 104780/ 173500 | consumed samples: 26823680 | consumed tokens: 54934896640 | elapsed time per iteration (s): 0.43 | learning rate: 8.221E-05 | global batch size: 256 | lm loss: 2.924732E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.960 | TFLOPs: 31.58 | +7: iteration 104790/ 173500 | consumed samples: 26826240 | consumed tokens: 54940139520 | elapsed time per iteration (s): 0.42 | learning rate: 8.220E-05 | global batch size: 256 | lm loss: 2.914237E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.485 | TFLOPs: 32.03 | +7: iteration 104800/ 173500 | consumed samples: 26828800 | consumed tokens: 54945382400 | elapsed time per iteration (s): 0.43 | learning rate: 8.218E-05 | global batch size: 256 | lm loss: 2.917917E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.398 | TFLOPs: 31.19 | +7: iteration 104810/ 173500 | consumed samples: 26831360 | consumed tokens: 54950625280 | elapsed time per iteration (s): 0.43 | learning rate: 8.216E-05 | global batch size: 256 | lm loss: 2.916108E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.325 | TFLOPs: 31.29 | +7: iteration 104820/ 173500 | consumed samples: 26833920 | consumed tokens: 54955868160 | elapsed time per iteration (s): 0.42 | learning rate: 8.215E-05 | global batch size: 256 | lm loss: 2.913741E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.157 | TFLOPs: 32.07 | +7: iteration 104830/ 173500 | consumed samples: 26836480 | consumed tokens: 54961111040 | elapsed time per iteration (s): 0.43 | learning rate: 8.213E-05 | global batch size: 256 | lm loss: 2.916093E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.484 | TFLOPs: 31.45 | +7: iteration 104840/ 173500 | consumed samples: 26839040 | consumed tokens: 54966353920 | elapsed time per iteration (s): 0.42 | learning rate: 8.212E-05 | global batch size: 256 | lm loss: 2.913088E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.766 | TFLOPs: 31.78 | +7: iteration 104850/ 173500 | consumed samples: 26841600 | consumed tokens: 54971596800 | elapsed time per iteration (s): 0.42 | learning rate: 8.210E-05 | global batch size: 256 | lm loss: 2.905827E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.947 | TFLOPs: 31.69 | +7: iteration 104860/ 173500 | consumed samples: 26844160 | consumed tokens: 54976839680 | elapsed time per iteration (s): 0.43 | learning rate: 8.209E-05 | global batch size: 
256 | lm loss: 2.905724E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.622 | TFLOPs: 31.36 | +7: iteration 104870/ 173500 | consumed samples: 26846720 | consumed tokens: 54982082560 | elapsed time per iteration (s): 0.43 | learning rate: 8.207E-05 | global batch size: 256 | lm loss: 2.919996E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.176 | TFLOPs: 31.39 | +7: iteration 104880/ 173500 | consumed samples: 26849280 | consumed tokens: 54987325440 | elapsed time per iteration (s): 0.42 | learning rate: 8.205E-05 | global batch size: 256 | lm loss: 2.904776E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.396 | TFLOPs: 31.66 | +7: iteration 104890/ 173500 | consumed samples: 26851840 | consumed tokens: 54992568320 | elapsed time per iteration (s): 0.43 | learning rate: 8.204E-05 | global batch size: 256 | lm loss: 2.907964E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.367 | TFLOPs: 31.24 | +7: iteration 104900/ 173500 | consumed samples: 26854400 | consumed tokens: 54997811200 | elapsed time per iteration (s): 0.43 | learning rate: 8.202E-05 | global batch size: 256 | lm loss: 2.929531E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.304 | TFLOPs: 31.60 | +7: iteration 104910/ 173500 | consumed samples: 26856960 | consumed tokens: 55003054080 | elapsed time per iteration (s): 0.42 | learning rate: 8.201E-05 | global batch size: 256 | lm loss: 2.916755E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.501 | TFLOPs: 31.82 | +7: iteration 104920/ 173500 | consumed samples: 26859520 | 
consumed tokens: 55008296960 | elapsed time per iteration (s): 0.46 | learning rate: 8.199E-05 | global batch size: 256 | lm loss: 2.912832E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.346 | TFLOPs: 29.40 | +7: iteration 104930/ 173500 | consumed samples: 26862080 | consumed tokens: 55013539840 | elapsed time per iteration (s): 0.42 | learning rate: 8.198E-05 | global batch size: 256 | lm loss: 2.925023E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.062 | TFLOPs: 31.75 | +7: iteration 104940/ 173500 | consumed samples: 26864640 | consumed tokens: 55018782720 | elapsed time per iteration (s): 0.42 | learning rate: 8.196E-05 | global batch size: 256 | lm loss: 2.916309E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.588 | TFLOPs: 32.09 | +7: iteration 104950/ 173500 | consumed samples: 26867200 | consumed tokens: 55024025600 | elapsed time per iteration (s): 0.42 | learning rate: 8.194E-05 | global batch size: 256 | lm loss: 2.905096E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.673 | TFLOPs: 31.88 | +7: iteration 104960/ 173500 | consumed samples: 26869760 | consumed tokens: 55029268480 | elapsed time per iteration (s): 0.43 | learning rate: 8.193E-05 | global batch size: 256 | lm loss: 2.912912E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.929 | TFLOPs: 31.48 | +7: iteration 104970/ 173500 | consumed samples: 26872320 | consumed tokens: 55034511360 | elapsed time per iteration (s): 0.42 | learning rate: 8.191E-05 | global batch size: 256 | lm loss: 2.918188E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 610.776 | TFLOPs: 32.05 | +7: iteration 104980/ 173500 | consumed samples: 26874880 | consumed tokens: 55039754240 | elapsed time per iteration (s): 0.43 | learning rate: 8.190E-05 | global batch size: 256 | lm loss: 2.916778E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.102 | TFLOPs: 31.54 | +7: iteration 104990/ 173500 | consumed samples: 26877440 | consumed tokens: 55044997120 | elapsed time per iteration (s): 0.43 | learning rate: 8.188E-05 | global batch size: 256 | lm loss: 2.898860E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.502 | TFLOPs: 31.45 | +7: iteration 105000/ 173500 | consumed samples: 26880000 | consumed tokens: 55050240000 | elapsed time per iteration (s): 0.43 | learning rate: 8.187E-05 | global batch size: 256 | lm loss: 2.918188E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.959 | TFLOPs: 31.58 | +7: iteration 105010/ 173500 | consumed samples: 26882560 | consumed tokens: 55055482880 | elapsed time per iteration (s): 0.42 | learning rate: 8.185E-05 | global batch size: 256 | lm loss: 2.916076E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.368 | TFLOPs: 32.03 | +7: iteration 105020/ 173500 | consumed samples: 26885120 | consumed tokens: 55060725760 | elapsed time per iteration (s): 0.42 | learning rate: 8.184E-05 | global batch size: 256 | lm loss: 2.913847E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.271 | TFLOPs: 31.65 | +7: iteration 105030/ 173500 | consumed samples: 26887680 | consumed tokens: 55065968640 | elapsed time per iteration (s): 0.43 | learning rate: 8.182E-05 | global batch size: 
256 | lm loss: 2.895915E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.731 | TFLOPs: 30.99 | +7: iteration 105040/ 173500 | consumed samples: 26890240 | consumed tokens: 55071211520 | elapsed time per iteration (s): 0.42 | learning rate: 8.180E-05 | global batch size: 256 | lm loss: 2.913067E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.946 | TFLOPs: 31.85 | +7: iteration 105050/ 173500 | consumed samples: 26892800 | consumed tokens: 55076454400 | elapsed time per iteration (s): 0.43 | learning rate: 8.179E-05 | global batch size: 256 | lm loss: 2.916313E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.889 | TFLOPs: 31.53 | +7: iteration 105060/ 173500 | consumed samples: 26895360 | consumed tokens: 55081697280 | elapsed time per iteration (s): 0.42 | learning rate: 8.177E-05 | global batch size: 256 | lm loss: 2.909562E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.724 | TFLOPs: 31.68 | +7: iteration 105070/ 173500 | consumed samples: 26897920 | consumed tokens: 55086940160 | elapsed time per iteration (s): 0.42 | learning rate: 8.176E-05 | global batch size: 256 | lm loss: 2.899662E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.832 | TFLOPs: 31.63 | +7: iteration 105080/ 173500 | consumed samples: 26900480 | consumed tokens: 55092183040 | elapsed time per iteration (s): 0.43 | learning rate: 8.174E-05 | global batch size: 256 | lm loss: 2.939302E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.224 | TFLOPs: 31.23 | +7: iteration 105090/ 173500 | consumed samples: 26903040 | 
consumed tokens: 55097425920 | elapsed time per iteration (s): 0.42 | learning rate: 8.173E-05 | global batch size: 256 | lm loss: 2.907541E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.850 | TFLOPs: 31.63 | +7: iteration 105100/ 173500 | consumed samples: 26905600 | consumed tokens: 55102668800 | elapsed time per iteration (s): 0.42 | learning rate: 8.171E-05 | global batch size: 256 | lm loss: 2.923239E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.469 | TFLOPs: 31.72 | +7: iteration 105110/ 173500 | consumed samples: 26908160 | consumed tokens: 55107911680 | elapsed time per iteration (s): 0.42 | learning rate: 8.169E-05 | global batch size: 256 | lm loss: 2.912933E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.923 | TFLOPs: 31.63 | +7: iteration 105120/ 173500 | consumed samples: 26910720 | consumed tokens: 55113154560 | elapsed time per iteration (s): 0.43 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 2.894213E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.100 | TFLOPs: 31.54 | +7: iteration 105130/ 173500 | consumed samples: 26913280 | consumed tokens: 55118397440 | elapsed time per iteration (s): 0.42 | learning rate: 8.166E-05 | global batch size: 256 | lm loss: 2.908875E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.324 | TFLOPs: 31.76 | +7: iteration 105140/ 173500 | consumed samples: 26915840 | consumed tokens: 55123640320 | elapsed time per iteration (s): 0.42 | learning rate: 8.165E-05 | global batch size: 256 | lm loss: 2.916990E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 603.329 | TFLOPs: 31.66 | +7: iteration 105150/ 173500 | consumed samples: 26918400 | consumed tokens: 55128883200 | elapsed time per iteration (s): 0.42 | learning rate: 8.163E-05 | global batch size: 256 | lm loss: 2.916574E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.872 | TFLOPs: 31.68 | +7: iteration 105160/ 173500 | consumed samples: 26920960 | consumed tokens: 55134126080 | elapsed time per iteration (s): 0.44 | learning rate: 8.162E-05 | global batch size: 256 | lm loss: 2.917438E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.612 | TFLOPs: 30.83 | +7: iteration 105170/ 173500 | consumed samples: 26923520 | consumed tokens: 55139368960 | elapsed time per iteration (s): 0.43 | learning rate: 8.160E-05 | global batch size: 256 | lm loss: 2.921699E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.799 | TFLOPs: 31.47 | +7: iteration 105180/ 173500 | consumed samples: 26926080 | consumed tokens: 55144611840 | elapsed time per iteration (s): 0.42 | learning rate: 8.159E-05 | global batch size: 256 | lm loss: 2.914428E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.294 | TFLOPs: 31.92 | +7: iteration 105190/ 173500 | consumed samples: 26928640 | consumed tokens: 55149854720 | elapsed time per iteration (s): 0.42 | learning rate: 8.157E-05 | global batch size: 256 | lm loss: 2.917041E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.525 | TFLOPs: 32.03 | +7: iteration 105200/ 173500 | consumed samples: 26931200 | consumed tokens: 55155097600 | elapsed time per iteration (s): 0.42 | learning rate: 8.155E-05 | global batch size: 
256 | lm loss: 2.905039E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.900 | TFLOPs: 31.69 | +7: iteration 105210/ 173500 | consumed samples: 26933760 | consumed tokens: 55160340480 | elapsed time per iteration (s): 0.43 | learning rate: 8.154E-05 | global batch size: 256 | lm loss: 2.915420E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.177 | TFLOPs: 31.60 | +7: iteration 105220/ 173500 | consumed samples: 26936320 | consumed tokens: 55165583360 | elapsed time per iteration (s): 0.42 | learning rate: 8.152E-05 | global batch size: 256 | lm loss: 2.909439E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.364 | TFLOPs: 31.76 | +7: iteration 105230/ 173500 | consumed samples: 26938880 | consumed tokens: 55170826240 | elapsed time per iteration (s): 0.43 | learning rate: 8.151E-05 | global batch size: 256 | lm loss: 2.911654E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.982 | TFLOPs: 31.32 | +7: iteration 105240/ 173500 | consumed samples: 26941440 | consumed tokens: 55176069120 | elapsed time per iteration (s): 0.42 | learning rate: 8.149E-05 | global batch size: 256 | lm loss: 2.915310E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.842 | TFLOPs: 31.79 | +7: iteration 105250/ 173500 | consumed samples: 26944000 | consumed tokens: 55181312000 | elapsed time per iteration (s): 0.42 | learning rate: 8.148E-05 | global batch size: 256 | lm loss: 2.911702E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.121 | TFLOPs: 32.01 | +7: iteration 105260/ 173500 | consumed samples: 26946560 | 
consumed tokens: 55186554880 | elapsed time per iteration (s): 0.42 | learning rate: 8.146E-05 | global batch size: 256 | lm loss: 2.921680E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.961 | TFLOPs: 31.64 | +7: iteration 105270/ 173500 | consumed samples: 26949120 | consumed tokens: 55191797760 | elapsed time per iteration (s): 0.42 | learning rate: 8.144E-05 | global batch size: 256 | lm loss: 2.924282E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.741 | TFLOPs: 31.99 | +7: iteration 105280/ 173500 | consumed samples: 26951680 | consumed tokens: 55197040640 | elapsed time per iteration (s): 0.43 | learning rate: 8.143E-05 | global batch size: 256 | lm loss: 2.913242E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.090 | TFLOPs: 31.54 | +7: iteration 105290/ 173500 | consumed samples: 26954240 | consumed tokens: 55202283520 | elapsed time per iteration (s): 0.43 | learning rate: 8.141E-05 | global batch size: 256 | lm loss: 2.917647E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.934 | TFLOPs: 31.27 | +7: iteration 105300/ 173500 | consumed samples: 26956800 | consumed tokens: 55207526400 | elapsed time per iteration (s): 0.43 | learning rate: 8.140E-05 | global batch size: 256 | lm loss: 2.901720E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.152 | TFLOPs: 31.28 | +7: iteration 105310/ 173500 | consumed samples: 26959360 | consumed tokens: 55212769280 | elapsed time per iteration (s): 0.42 | learning rate: 8.138E-05 | global batch size: 256 | lm loss: 2.895881E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 610.448 | TFLOPs: 32.03 | +7: iteration 105320/ 173500 | consumed samples: 26961920 | consumed tokens: 55218012160 | elapsed time per iteration (s): 0.42 | learning rate: 8.137E-05 | global batch size: 256 | lm loss: 2.921274E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.968 | TFLOPs: 31.95 | +7: iteration 105330/ 173500 | consumed samples: 26964480 | consumed tokens: 55223255040 | elapsed time per iteration (s): 0.44 | learning rate: 8.135E-05 | global batch size: 256 | lm loss: 2.922889E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.095 | TFLOPs: 30.70 | +7: iteration 105340/ 173500 | consumed samples: 26967040 | consumed tokens: 55228497920 | elapsed time per iteration (s): 0.42 | learning rate: 8.134E-05 | global batch size: 256 | lm loss: 2.905985E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.816 | TFLOPs: 31.79 | +7: iteration 105350/ 173500 | consumed samples: 26969600 | consumed tokens: 55233740800 | elapsed time per iteration (s): 0.42 | learning rate: 8.132E-05 | global batch size: 256 | lm loss: 2.907655E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.025 | TFLOPs: 31.69 | +7: iteration 105360/ 173500 | consumed samples: 26972160 | consumed tokens: 55238983680 | elapsed time per iteration (s): 0.42 | learning rate: 8.130E-05 | global batch size: 256 | lm loss: 2.916748E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.816 | TFLOPs: 32.00 | +7: iteration 105370/ 173500 | consumed samples: 26974720 | consumed tokens: 55244226560 | elapsed time per iteration (s): 0.42 | learning rate: 8.129E-05 | global batch size: 
256 | lm loss: 2.913747E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.738 | TFLOPs: 31.99 | +7: iteration 105380/ 173500 | consumed samples: 26977280 | consumed tokens: 55249469440 | elapsed time per iteration (s): 0.42 | learning rate: 8.127E-05 | global batch size: 256 | lm loss: 2.916401E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.469 | TFLOPs: 31.98 | +7: iteration 105390/ 173500 | consumed samples: 26979840 | consumed tokens: 55254712320 | elapsed time per iteration (s): 0.44 | learning rate: 8.126E-05 | global batch size: 256 | lm loss: 2.914637E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.213 | TFLOPs: 30.81 | +7: iteration 105400/ 173500 | consumed samples: 26982400 | consumed tokens: 55259955200 | elapsed time per iteration (s): 0.42 | learning rate: 8.124E-05 | global batch size: 256 | lm loss: 2.925969E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.264 | TFLOPs: 31.81 | +7: iteration 105410/ 173500 | consumed samples: 26984960 | consumed tokens: 55265198080 | elapsed time per iteration (s): 0.42 | learning rate: 8.123E-05 | global batch size: 256 | lm loss: 2.921221E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.679 | TFLOPs: 31.99 | +7: iteration 105420/ 173500 | consumed samples: 26987520 | consumed tokens: 55270440960 | elapsed time per iteration (s): 0.42 | learning rate: 8.121E-05 | global batch size: 256 | lm loss: 2.906179E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.891 | TFLOPs: 31.84 | +7: iteration 105430/ 173500 | consumed samples: 26990080 | 
consumed tokens: 55275683840 | elapsed time per iteration (s): 0.42 | learning rate: 8.120E-05 | global batch size: 256 | lm loss: 2.906933E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.347 | TFLOPs: 31.97 | +7: iteration 105440/ 173500 | consumed samples: 26992640 | consumed tokens: 55280926720 | elapsed time per iteration (s): 0.45 | learning rate: 8.118E-05 | global batch size: 256 | lm loss: 2.920426E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.325 | TFLOPs: 30.08 | +7: iteration 105450/ 173500 | consumed samples: 26995200 | consumed tokens: 55286169600 | elapsed time per iteration (s): 0.45 | learning rate: 8.116E-05 | global batch size: 256 | lm loss: 2.904714E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.472 | TFLOPs: 29.72 | +7: iteration 105460/ 173500 | consumed samples: 26997760 | consumed tokens: 55291412480 | elapsed time per iteration (s): 0.42 | learning rate: 8.115E-05 | global batch size: 256 | lm loss: 2.898976E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.807 | TFLOPs: 31.68 | +7: iteration 105470/ 173500 | consumed samples: 27000320 | consumed tokens: 55296655360 | elapsed time per iteration (s): 0.42 | learning rate: 8.113E-05 | global batch size: 256 | lm loss: 2.917422E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.649 | TFLOPs: 32.04 | +7: iteration 105480/ 173500 | consumed samples: 27002880 | consumed tokens: 55301898240 | elapsed time per iteration (s): 0.42 | learning rate: 8.112E-05 | global batch size: 256 | lm loss: 2.913388E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 609.898 | TFLOPs: 32.00 | +7: iteration 105490/ 173500 | consumed samples: 27005440 | consumed tokens: 55307141120 | elapsed time per iteration (s): 0.42 | learning rate: 8.110E-05 | global batch size: 256 | lm loss: 2.907636E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.057 | TFLOPs: 32.01 | +7: iteration 105500/ 173500 | consumed samples: 27008000 | consumed tokens: 55312384000 | elapsed time per iteration (s): 0.42 | learning rate: 8.109E-05 | global batch size: 256 | lm loss: 2.912305E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.315 | TFLOPs: 31.97 | +7: iteration 105510/ 173500 | consumed samples: 27010560 | consumed tokens: 55317626880 | elapsed time per iteration (s): 0.42 | learning rate: 8.107E-05 | global batch size: 256 | lm loss: 2.915038E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.339 | TFLOPs: 31.66 | +7: iteration 105520/ 173500 | consumed samples: 27013120 | consumed tokens: 55322869760 | elapsed time per iteration (s): 0.43 | learning rate: 8.105E-05 | global batch size: 256 | lm loss: 2.917886E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.594 | TFLOPs: 31.04 | +7: iteration 105530/ 173500 | consumed samples: 27015680 | consumed tokens: 55328112640 | elapsed time per iteration (s): 0.42 | learning rate: 8.104E-05 | global batch size: 256 | lm loss: 2.902973E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.659 | TFLOPs: 31.62 | +7: iteration 105540/ 173500 | consumed samples: 27018240 | consumed tokens: 55333355520 | elapsed time per iteration (s): 0.42 | learning rate: 8.102E-05 | global batch size: 
256 | lm loss: 2.930260E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.587 | TFLOPs: 32.04 | +7: iteration 105550/ 173500 | consumed samples: 27020800 | consumed tokens: 55338598400 | elapsed time per iteration (s): 0.44 | learning rate: 8.101E-05 | global batch size: 256 | lm loss: 2.918436E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.608 | TFLOPs: 30.41 | +7: iteration 105560/ 173500 | consumed samples: 27023360 | consumed tokens: 55343841280 | elapsed time per iteration (s): 0.42 | learning rate: 8.099E-05 | global batch size: 256 | lm loss: 2.908838E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.652 | TFLOPs: 32.04 | +7: iteration 105570/ 173500 | consumed samples: 27025920 | consumed tokens: 55349084160 | elapsed time per iteration (s): 0.42 | learning rate: 8.098E-05 | global batch size: 256 | lm loss: 2.921311E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.496 | TFLOPs: 31.87 | +7: iteration 105580/ 173500 | consumed samples: 27028480 | consumed tokens: 55354327040 | elapsed time per iteration (s): 0.42 | learning rate: 8.096E-05 | global batch size: 256 | lm loss: 2.906602E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.398 | TFLOPs: 31.87 | +7: iteration 105590/ 173500 | consumed samples: 27031040 | consumed tokens: 55359569920 | elapsed time per iteration (s): 0.42 | learning rate: 8.095E-05 | global batch size: 256 | lm loss: 2.909334E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.453 | TFLOPs: 31.61 | +7: iteration 105600/ 173500 | consumed samples: 27033600 | 
consumed tokens: 55364812800 | elapsed time per iteration (s): 0.42 | learning rate: 8.093E-05 | global batch size: 256 | lm loss: 2.915716E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.213 | TFLOPs: 31.75 | +7: iteration 105610/ 173500 | consumed samples: 27036160 | consumed tokens: 55370055680 | elapsed time per iteration (s): 0.42 | learning rate: 8.091E-05 | global batch size: 256 | lm loss: 2.913475E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.892 | TFLOPs: 32.00 | +7: iteration 105620/ 173500 | consumed samples: 27038720 | consumed tokens: 55375298560 | elapsed time per iteration (s): 0.43 | learning rate: 8.090E-05 | global batch size: 256 | lm loss: 2.916880E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.273 | TFLOPs: 31.44 | +7: iteration 105630/ 173500 | consumed samples: 27041280 | consumed tokens: 55380541440 | elapsed time per iteration (s): 0.42 | learning rate: 8.088E-05 | global batch size: 256 | lm loss: 2.915653E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.040 | TFLOPs: 31.75 | +7: iteration 105640/ 173500 | consumed samples: 27043840 | consumed tokens: 55385784320 | elapsed time per iteration (s): 0.45 | learning rate: 8.087E-05 | global batch size: 256 | lm loss: 2.907645E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.394 | TFLOPs: 30.14 | +7: iteration 105650/ 173500 | consumed samples: 27046400 | consumed tokens: 55391027200 | elapsed time per iteration (s): 0.42 | learning rate: 8.085E-05 | global batch size: 256 | lm loss: 2.916196E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 606.119 | TFLOPs: 31.80 | +7: iteration 105660/ 173500 | consumed samples: 27048960 | consumed tokens: 55396270080 | elapsed time per iteration (s): 0.42 | learning rate: 8.084E-05 | global batch size: 256 | lm loss: 2.912516E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.988 | TFLOPs: 31.80 | +7: iteration 105670/ 173500 | consumed samples: 27051520 | consumed tokens: 55401512960 | elapsed time per iteration (s): 0.43 | learning rate: 8.082E-05 | global batch size: 256 | lm loss: 2.908958E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.873 | TFLOPs: 31.00 | +7: iteration 105680/ 173500 | consumed samples: 27054080 | consumed tokens: 55406755840 | elapsed time per iteration (s): 0.42 | learning rate: 8.081E-05 | global batch size: 256 | lm loss: 2.911678E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.568 | TFLOPs: 32.04 | +7: iteration 105690/ 173500 | consumed samples: 27056640 | consumed tokens: 55411998720 | elapsed time per iteration (s): 0.44 | learning rate: 8.079E-05 | global batch size: 256 | lm loss: 2.918346E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.708 | TFLOPs: 30.84 | +7: iteration 105700/ 173500 | consumed samples: 27059200 | consumed tokens: 55417241600 | elapsed time per iteration (s): 0.45 | learning rate: 8.077E-05 | global batch size: 256 | lm loss: 2.928602E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.009 | TFLOPs: 30.06 | +7: iteration 105710/ 173500 | consumed samples: 27061760 | consumed tokens: 55422484480 | elapsed time per iteration (s): 0.53 | learning rate: 8.076E-05 | global batch size: 
256 | lm loss: 2.910405E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.476 | TFLOPs: 25.37 | +7: iteration 105720/ 173500 | consumed samples: 27064320 | consumed tokens: 55427727360 | elapsed time per iteration (s): 0.46 | learning rate: 8.074E-05 | global batch size: 256 | lm loss: 2.918253E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.859 | TFLOPs: 29.01 | +7: iteration 105730/ 173500 | consumed samples: 27066880 | consumed tokens: 55432970240 | elapsed time per iteration (s): 0.46 | learning rate: 8.073E-05 | global batch size: 256 | lm loss: 2.913959E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.098 | TFLOPs: 29.44 | +7: iteration 105740/ 173500 | consumed samples: 27069440 | consumed tokens: 55438213120 | elapsed time per iteration (s): 0.44 | learning rate: 8.071E-05 | global batch size: 256 | lm loss: 2.916713E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.125 | TFLOPs: 30.75 | +7: iteration 105750/ 173500 | consumed samples: 27072000 | consumed tokens: 55443456000 | elapsed time per iteration (s): 0.54 | learning rate: 8.070E-05 | global batch size: 256 | lm loss: 2.914747E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 471.980 | TFLOPs: 24.76 | +7: iteration 105760/ 173500 | consumed samples: 27074560 | consumed tokens: 55448698880 | elapsed time per iteration (s): 0.42 | learning rate: 8.068E-05 | global batch size: 256 | lm loss: 2.908199E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.220 | TFLOPs: 32.28 | +7: iteration 105770/ 173500 | consumed samples: 27077120 | 
consumed tokens: 55453941760 | elapsed time per iteration (s): 0.45 | learning rate: 8.067E-05 | global batch size: 256 | lm loss: 2.929798E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.992 | TFLOPs: 30.06 | +7: iteration 105780/ 173500 | consumed samples: 27079680 | consumed tokens: 55459184640 | elapsed time per iteration (s): 0.45 | learning rate: 8.065E-05 | global batch size: 256 | lm loss: 2.913182E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.603 | TFLOPs: 30.04 | +7: iteration 105790/ 173500 | consumed samples: 27082240 | consumed tokens: 55464427520 | elapsed time per iteration (s): 0.42 | learning rate: 8.063E-05 | global batch size: 256 | lm loss: 2.902203E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.786 | TFLOPs: 32.20 | +7: iteration 105800/ 173500 | consumed samples: 27084800 | consumed tokens: 55469670400 | elapsed time per iteration (s): 0.45 | learning rate: 8.062E-05 | global batch size: 256 | lm loss: 2.914853E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.387 | TFLOPs: 29.93 | +7: iteration 105810/ 173500 | consumed samples: 27087360 | consumed tokens: 55474913280 | elapsed time per iteration (s): 0.49 | learning rate: 8.060E-05 | global batch size: 256 | lm loss: 2.897224E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 524.503 | TFLOPs: 27.52 | +7: iteration 105820/ 173500 | consumed samples: 27089920 | consumed tokens: 55480156160 | elapsed time per iteration (s): 0.44 | learning rate: 8.059E-05 | global batch size: 256 | lm loss: 2.907812E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 577.545 | TFLOPs: 30.30 | +7: iteration 105830/ 173500 | consumed samples: 27092480 | consumed tokens: 55485399040 | elapsed time per iteration (s): 0.47 | learning rate: 8.057E-05 | global batch size: 256 | lm loss: 2.910891E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.400 | TFLOPs: 28.51 | +7: iteration 105840/ 173500 | consumed samples: 27095040 | consumed tokens: 55490641920 | elapsed time per iteration (s): 0.43 | learning rate: 8.056E-05 | global batch size: 256 | lm loss: 2.894462E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.786 | TFLOPs: 31.05 | +7: iteration 105850/ 173500 | consumed samples: 27097600 | consumed tokens: 55495884800 | elapsed time per iteration (s): 0.45 | learning rate: 8.054E-05 | global batch size: 256 | lm loss: 2.929097E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.005 | TFLOPs: 29.96 | +7: iteration 105860/ 173500 | consumed samples: 27100160 | consumed tokens: 55501127680 | elapsed time per iteration (s): 0.46 | learning rate: 8.053E-05 | global batch size: 256 | lm loss: 2.899255E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.115 | TFLOPs: 29.07 | +7: iteration 105870/ 173500 | consumed samples: 27102720 | consumed tokens: 55506370560 | elapsed time per iteration (s): 0.46 | learning rate: 8.051E-05 | global batch size: 256 | lm loss: 2.921960E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.250 | TFLOPs: 29.40 | +7: iteration 105880/ 173500 | consumed samples: 27105280 | consumed tokens: 55511613440 | elapsed time per iteration (s): 0.46 | learning rate: 8.049E-05 | global batch size: 
256 | lm loss: 2.903832E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.881 | TFLOPs: 29.11 | +7: iteration 105890/ 173500 | consumed samples: 27107840 | consumed tokens: 55516856320 | elapsed time per iteration (s): 0.42 | learning rate: 8.048E-05 | global batch size: 256 | lm loss: 2.909355E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.175 | TFLOPs: 31.81 | +7: iteration 105900/ 173500 | consumed samples: 27110400 | consumed tokens: 55522099200 | elapsed time per iteration (s): 0.42 | learning rate: 8.046E-05 | global batch size: 256 | lm loss: 2.918692E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.212 | TFLOPs: 31.91 | +7: iteration 105910/ 173500 | consumed samples: 27112960 | consumed tokens: 55527342080 | elapsed time per iteration (s): 0.42 | learning rate: 8.045E-05 | global batch size: 256 | lm loss: 2.917131E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.145 | TFLOPs: 31.86 | +7: iteration 105920/ 173500 | consumed samples: 27115520 | consumed tokens: 55532584960 | elapsed time per iteration (s): 0.42 | learning rate: 8.043E-05 | global batch size: 256 | lm loss: 2.913722E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.554 | TFLOPs: 31.88 | +7: iteration 105930/ 173500 | consumed samples: 27118080 | consumed tokens: 55537827840 | elapsed time per iteration (s): 0.42 | learning rate: 8.042E-05 | global batch size: 256 | lm loss: 2.916383E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.737 | TFLOPs: 31.89 | +7: iteration 105940/ 173500 | consumed samples: 27120640 | 
consumed tokens: 55543070720 | elapsed time per iteration (s): 0.43 | learning rate: 8.040E-05 | global batch size: 256 | lm loss: 2.911852E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.006 | TFLOPs: 30.90 | +7: iteration 105950/ 173500 | consumed samples: 27123200 | consumed tokens: 55548313600 | elapsed time per iteration (s): 0.42 | learning rate: 8.039E-05 | global batch size: 256 | lm loss: 2.931080E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.055 | TFLOPs: 31.75 | +7: iteration 105960/ 173500 | consumed samples: 27125760 | consumed tokens: 55553556480 | elapsed time per iteration (s): 0.42 | learning rate: 8.037E-05 | global batch size: 256 | lm loss: 2.913919E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.253 | TFLOPs: 31.76 | +7: iteration 105970/ 173500 | consumed samples: 27128320 | consumed tokens: 55558799360 | elapsed time per iteration (s): 0.43 | learning rate: 8.035E-05 | global batch size: 256 | lm loss: 2.914643E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.705 | TFLOPs: 31.10 | +7: iteration 105980/ 173500 | consumed samples: 27130880 | consumed tokens: 55564042240 | elapsed time per iteration (s): 0.42 | learning rate: 8.034E-05 | global batch size: 256 | lm loss: 2.907151E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.815 | TFLOPs: 32.10 | +7: iteration 105990/ 173500 | consumed samples: 27133440 | consumed tokens: 55569285120 | elapsed time per iteration (s): 0.42 | learning rate: 8.032E-05 | global batch size: 256 | lm loss: 2.906033E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 611.483 | TFLOPs: 32.08 | +0: [2023-03-17 11:46:19,400] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=0, lr=[8.030787777917086e-05, 8.030787777917086e-05, 8.030787777917086e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 106000/ 173500 | consumed samples: 27136000 | consumed tokens: 55574528000 | elapsed time per iteration (s): 0.42 | learning rate: 8.031E-05 | global batch size: 256 | lm loss: 2.899326E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.526 | TFLOPs: 31.88 | +0: steps: 106000 loss: 2.9278 iter time (s): 0.432 samples/sec: 593.043 +7: iteration 106010/ 173500 | consumed samples: 27138560 | consumed tokens: 55579770880 | elapsed time per iteration (s): 0.43 | learning rate: 8.029E-05 | global batch size: 256 | lm loss: 2.911383E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.660 | TFLOPs: 31.52 | +7: iteration 106020/ 173500 | consumed samples: 27141120 | consumed tokens: 55585013760 | elapsed time per iteration (s): 0.42 | learning rate: 8.028E-05 | global batch size: 256 | lm loss: 2.908134E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.433 | TFLOPs: 32.08 | +7: iteration 106030/ 173500 | consumed samples: 27143680 | consumed tokens: 55590256640 | elapsed time per iteration (s): 0.42 | learning rate: 8.026E-05 | global batch size: 256 | lm loss: 2.921475E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.230 | TFLOPs: 31.91 | +7: iteration 106040/ 173500 | consumed samples: 27146240 | consumed tokens: 55595499520 | elapsed time per iteration (s): 0.42 | learning rate: 8.025E-05 | global batch size: 256 | lm loss: 2.912176E+00 | grad norm: 0.340 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.156 | TFLOPs: 31.75 | +7: iteration 106050/ 173500 | consumed samples: 27148800 | consumed tokens: 55600742400 | elapsed time per iteration (s): 0.42 | learning rate: 8.023E-05 | global batch size: 256 | lm loss: 2.913697E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.610 | TFLOPs: 31.83 | +7: iteration 106060/ 173500 | consumed samples: 27151360 | consumed tokens: 55605985280 | elapsed time per iteration (s): 0.42 | learning rate: 8.021E-05 | global batch size: 256 | lm loss: 2.917772E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.630 | TFLOPs: 31.83 | +7: iteration 106070/ 173500 | consumed samples: 27153920 | consumed tokens: 55611228160 | elapsed time per iteration (s): 0.42 | learning rate: 8.020E-05 | global batch size: 256 | lm loss: 2.921015E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.777 | TFLOPs: 32.05 | +7: iteration 106080/ 173500 | consumed samples: 27156480 | consumed tokens: 55616471040 | elapsed time per iteration (s): 0.42 | learning rate: 8.018E-05 | global batch size: 256 | lm loss: 2.917349E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.220 | TFLOPs: 32.02 | +7: iteration 106090/ 173500 | consumed samples: 27159040 | consumed tokens: 55621713920 | elapsed time per iteration (s): 0.60 | learning rate: 8.017E-05 | global batch size: 256 | lm loss: 2.915127E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 423.818 | TFLOPs: 22.24 | +7: iteration 106100/ 173500 | consumed samples: 27161600 | consumed tokens: 55626956800 | elapsed time per iteration (s): 0.42 
| learning rate: 8.015E-05 | global batch size: 256 | lm loss: 2.907036E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.533 | TFLOPs: 32.24 | +7: iteration 106110/ 173500 | consumed samples: 27164160 | consumed tokens: 55632199680 | elapsed time per iteration (s): 0.42 | learning rate: 8.014E-05 | global batch size: 256 | lm loss: 2.904601E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.372 | TFLOPs: 32.18 | +7: iteration 106120/ 173500 | consumed samples: 27166720 | consumed tokens: 55637442560 | elapsed time per iteration (s): 0.42 | learning rate: 8.012E-05 | global batch size: 256 | lm loss: 2.923070E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.619 | TFLOPs: 32.14 | +7: iteration 106130/ 173500 | consumed samples: 27169280 | consumed tokens: 55642685440 | elapsed time per iteration (s): 0.42 | learning rate: 8.011E-05 | global batch size: 256 | lm loss: 2.913673E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.073 | TFLOPs: 32.11 | +7: iteration 106140/ 173500 | consumed samples: 27171840 | consumed tokens: 55647928320 | elapsed time per iteration (s): 0.42 | learning rate: 8.009E-05 | global batch size: 256 | lm loss: 2.909970E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.716 | TFLOPs: 32.10 | +7: iteration 106150/ 173500 | consumed samples: 27174400 | consumed tokens: 55653171200 | elapsed time per iteration (s): 0.42 | learning rate: 8.007E-05 | global batch size: 256 | lm loss: 2.903520E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.011 | TFLOPs: 31.85 | +7: iteration 
106160/ 173500 | consumed samples: 27176960 | consumed tokens: 55658414080 | elapsed time per iteration (s): 0.42 | learning rate: 8.006E-05 | global batch size: 256 | lm loss: 2.921190E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.554 | TFLOPs: 32.09 | +7: iteration 106170/ 173500 | consumed samples: 27179520 | consumed tokens: 55663656960 | elapsed time per iteration (s): 0.42 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 2.913092E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.985 | TFLOPs: 32.06 | +7: iteration 106180/ 173500 | consumed samples: 27182080 | consumed tokens: 55668899840 | elapsed time per iteration (s): 0.42 | learning rate: 8.003E-05 | global batch size: 256 | lm loss: 2.916493E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.532 | TFLOPs: 31.77 | +7: iteration 106190/ 173500 | consumed samples: 27184640 | consumed tokens: 55674142720 | elapsed time per iteration (s): 0.42 | learning rate: 8.001E-05 | global batch size: 256 | lm loss: 2.911512E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.647 | TFLOPs: 32.04 | +7: iteration 106200/ 173500 | consumed samples: 27187200 | consumed tokens: 55679385600 | elapsed time per iteration (s): 0.42 | learning rate: 8.000E-05 | global batch size: 256 | lm loss: 2.905746E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.679 | TFLOPs: 32.04 | +7: iteration 106210/ 173500 | consumed samples: 27189760 | consumed tokens: 55684628480 | elapsed time per iteration (s): 0.42 | learning rate: 7.998E-05 | global batch size: 256 | lm loss: 2.901533E+00 | grad norm: 0.373 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.931 | TFLOPs: 31.74 | +7: iteration 106220/ 173500 | consumed samples: 27192320 | consumed tokens: 55689871360 | elapsed time per iteration (s): 0.42 | learning rate: 7.997E-05 | global batch size: 256 | lm loss: 2.902694E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.059 | TFLOPs: 31.80 | +7: iteration 106230/ 173500 | consumed samples: 27194880 | consumed tokens: 55695114240 | elapsed time per iteration (s): 0.42 | learning rate: 7.995E-05 | global batch size: 256 | lm loss: 2.899599E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.613 | TFLOPs: 32.04 | +7: iteration 106240/ 173500 | consumed samples: 27197440 | consumed tokens: 55700357120 | elapsed time per iteration (s): 0.42 | learning rate: 7.994E-05 | global batch size: 256 | lm loss: 2.910169E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.371 | TFLOPs: 32.03 | +7: iteration 106250/ 173500 | consumed samples: 27200000 | consumed tokens: 55705600000 | elapsed time per iteration (s): 0.42 | learning rate: 7.992E-05 | global batch size: 256 | lm loss: 2.911410E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.131 | TFLOPs: 32.01 | +7: iteration 106260/ 173500 | consumed samples: 27202560 | consumed tokens: 55710842880 | elapsed time per iteration (s): 0.42 | learning rate: 7.990E-05 | global batch size: 256 | lm loss: 2.920590E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.813 | TFLOPs: 32.00 | +7: iteration 106270/ 173500 | consumed samples: 27205120 | consumed tokens: 55716085760 | elapsed time per iteration (s): 0.42 | learning 
rate: 7.989E-05 | global batch size: 256 | lm loss: 2.903381E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.870 | TFLOPs: 32.00 | +7: iteration 106280/ 173500 | consumed samples: 27207680 | consumed tokens: 55721328640 | elapsed time per iteration (s): 0.42 | learning rate: 7.987E-05 | global batch size: 256 | lm loss: 2.915531E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.961 | TFLOPs: 31.79 | +7: iteration 106290/ 173500 | consumed samples: 27210240 | consumed tokens: 55726571520 | elapsed time per iteration (s): 0.42 | learning rate: 7.986E-05 | global batch size: 256 | lm loss: 2.922493E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.947 | TFLOPs: 31.79 | +7: iteration 106300/ 173500 | consumed samples: 27212800 | consumed tokens: 55731814400 | elapsed time per iteration (s): 0.42 | learning rate: 7.984E-05 | global batch size: 256 | lm loss: 2.923976E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.791 | TFLOPs: 31.99 | +7: iteration 106310/ 173500 | consumed samples: 27215360 | consumed tokens: 55737057280 | elapsed time per iteration (s): 0.42 | learning rate: 7.983E-05 | global batch size: 256 | lm loss: 2.908860E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.267 | TFLOPs: 31.97 | +7: iteration 106320/ 173500 | consumed samples: 27217920 | consumed tokens: 55742300160 | elapsed time per iteration (s): 0.42 | learning rate: 7.981E-05 | global batch size: 256 | lm loss: 2.918695E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.223 | TFLOPs: 31.65 | +7: iteration 106330/ 
173500 | consumed samples: 27220480 | consumed tokens: 55747543040 | elapsed time per iteration (s): 0.42 | learning rate: 7.980E-05 | global batch size: 256 | lm loss: 2.916276E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.213 | TFLOPs: 32.02 | +7: iteration 106340/ 173500 | consumed samples: 27223040 | consumed tokens: 55752785920 | elapsed time per iteration (s): 0.42 | learning rate: 7.978E-05 | global batch size: 256 | lm loss: 2.899314E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.528 | TFLOPs: 31.98 | +7: iteration 106350/ 173500 | consumed samples: 27225600 | consumed tokens: 55758028800 | elapsed time per iteration (s): 0.42 | learning rate: 7.976E-05 | global batch size: 256 | lm loss: 2.920226E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.405 | TFLOPs: 31.76 | +7: iteration 106360/ 173500 | consumed samples: 27228160 | consumed tokens: 55763271680 | elapsed time per iteration (s): 0.42 | learning rate: 7.975E-05 | global batch size: 256 | lm loss: 2.908264E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.016 | TFLOPs: 32.01 | +7: iteration 106370/ 173500 | consumed samples: 27230720 | consumed tokens: 55768514560 | elapsed time per iteration (s): 0.42 | learning rate: 7.973E-05 | global batch size: 256 | lm loss: 2.903569E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.891 | TFLOPs: 32.00 | +7: iteration 106380/ 173500 | consumed samples: 27233280 | consumed tokens: 55773757440 | elapsed time per iteration (s): 0.42 | learning rate: 7.972E-05 | global batch size: 256 | lm loss: 2.919949E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.541 | TFLOPs: 31.98 | +7: iteration 106390/ 173500 | consumed samples: 27235840 | consumed tokens: 55779000320 | elapsed time per iteration (s): 0.42 | learning rate: 7.970E-05 | global batch size: 256 | lm loss: 2.901685E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.628 | TFLOPs: 31.83 | +7: iteration 106400/ 173500 | consumed samples: 27238400 | consumed tokens: 55784243200 | elapsed time per iteration (s): 0.42 | learning rate: 7.969E-05 | global batch size: 256 | lm loss: 2.910522E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.334 | TFLOPs: 31.97 | +7: iteration 106410/ 173500 | consumed samples: 27240960 | consumed tokens: 55789486080 | elapsed time per iteration (s): 0.47 | learning rate: 7.967E-05 | global batch size: 256 | lm loss: 2.903208E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 547.249 | TFLOPs: 28.71 | +7: iteration 106420/ 173500 | consumed samples: 27243520 | consumed tokens: 55794728960 | elapsed time per iteration (s): 0.42 | learning rate: 7.966E-05 | global batch size: 256 | lm loss: 2.912703E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.274 | TFLOPs: 32.13 | +7: iteration 106430/ 173500 | consumed samples: 27246080 | consumed tokens: 55799971840 | elapsed time per iteration (s): 0.42 | learning rate: 7.964E-05 | global batch size: 256 | lm loss: 2.904025E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.697 | TFLOPs: 32.04 | +7: iteration 106440/ 173500 | consumed samples: 27248640 | consumed tokens: 55805214720 | elapsed time per iteration (s): 0.42 | learning rate: 
7.963E-05 | global batch size: 256 | lm loss: 2.927897E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.331 | TFLOPs: 32.02 | +7: iteration 106450/ 173500 | consumed samples: 27251200 | consumed tokens: 55810457600 | elapsed time per iteration (s): 0.42 | learning rate: 7.961E-05 | global batch size: 256 | lm loss: 2.909468E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.489 | TFLOPs: 31.98 | +7: iteration 106460/ 173500 | consumed samples: 27253760 | consumed tokens: 55815700480 | elapsed time per iteration (s): 0.42 | learning rate: 7.959E-05 | global batch size: 256 | lm loss: 2.909114E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.755 | TFLOPs: 31.63 | +7: iteration 106470/ 173500 | consumed samples: 27256320 | consumed tokens: 55820943360 | elapsed time per iteration (s): 0.42 | learning rate: 7.958E-05 | global batch size: 256 | lm loss: 2.921090E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.055 | TFLOPs: 32.01 | +7: iteration 106480/ 173500 | consumed samples: 27258880 | consumed tokens: 55826186240 | elapsed time per iteration (s): 0.42 | learning rate: 7.956E-05 | global batch size: 256 | lm loss: 2.918505E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.080 | TFLOPs: 32.01 | +7: iteration 106490/ 173500 | consumed samples: 27261440 | consumed tokens: 55831429120 | elapsed time per iteration (s): 0.42 | learning rate: 7.955E-05 | global batch size: 256 | lm loss: 2.918415E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.977 | TFLOPs: 31.69 | +7: iteration 106500/ 173500 | 
consumed samples: 27264000 | consumed tokens: 55836672000 | elapsed time per iteration (s): 0.42 | learning rate: 7.953E-05 | global batch size: 256 | lm loss: 2.908523E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.773 | TFLOPs: 31.99 | +7: iteration 106510/ 173500 | consumed samples: 27266560 | consumed tokens: 55841914880 | elapsed time per iteration (s): 0.42 | learning rate: 7.952E-05 | global batch size: 256 | lm loss: 2.929506E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.736 | TFLOPs: 31.99 | +7: iteration 106520/ 173500 | consumed samples: 27269120 | consumed tokens: 55847157760 | elapsed time per iteration (s): 0.42 | learning rate: 7.950E-05 | global batch size: 256 | lm loss: 2.912744E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.629 | TFLOPs: 31.88 | +7: iteration 106530/ 173500 | consumed samples: 27271680 | consumed tokens: 55852400640 | elapsed time per iteration (s): 0.42 | learning rate: 7.949E-05 | global batch size: 256 | lm loss: 2.904542E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.833 | TFLOPs: 32.00 | +7: iteration 106540/ 173500 | consumed samples: 27274240 | consumed tokens: 55857643520 | elapsed time per iteration (s): 0.42 | learning rate: 7.947E-05 | global batch size: 256 | lm loss: 2.911485E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.621 | TFLOPs: 31.99 | +7: iteration 106550/ 173500 | consumed samples: 27276800 | consumed tokens: 55862886400 | elapsed time per iteration (s): 0.42 | learning rate: 7.945E-05 | global batch size: 256 | lm loss: 2.918946E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.874 | TFLOPs: 31.68 | +7: iteration 106560/ 173500 | consumed samples: 27279360 | consumed tokens: 55868129280 | elapsed time per iteration (s): 0.42 | learning rate: 7.944E-05 | global batch size: 256 | lm loss: 2.916284E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.609 | TFLOPs: 31.99 | +7: iteration 106570/ 173500 | consumed samples: 27281920 | consumed tokens: 55873372160 | elapsed time per iteration (s): 0.42 | learning rate: 7.942E-05 | global batch size: 256 | lm loss: 2.918724E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.431 | TFLOPs: 31.98 | +7: iteration 106580/ 173500 | consumed samples: 27284480 | consumed tokens: 55878615040 | elapsed time per iteration (s): 0.42 | learning rate: 7.941E-05 | global batch size: 256 | lm loss: 2.908161E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.223 | TFLOPs: 31.76 | +7: iteration 106590/ 173500 | consumed samples: 27287040 | consumed tokens: 55883857920 | elapsed time per iteration (s): 0.42 | learning rate: 7.939E-05 | global batch size: 256 | lm loss: 2.924949E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.420 | TFLOPs: 31.98 | +7: iteration 106600/ 173500 | consumed samples: 27289600 | consumed tokens: 55889100800 | elapsed time per iteration (s): 0.42 | learning rate: 7.938E-05 | global batch size: 256 | lm loss: 2.913982E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.496 | TFLOPs: 31.77 | +7: iteration 106610/ 173500 | consumed samples: 27292160 | consumed tokens: 55894343680 | elapsed time per iteration (s): 0.42 | learning rate: 
7.936E-05 | global batch size: 256 | lm loss: 2.911698E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.668 | TFLOPs: 31.99 | +7: iteration 106620/ 173500 | consumed samples: 27294720 | consumed tokens: 55899586560 | elapsed time per iteration (s): 0.42 | learning rate: 7.935E-05 | global batch size: 256 | lm loss: 2.898578E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.890 | TFLOPs: 31.95 | +7: iteration 106630/ 173500 | consumed samples: 27297280 | consumed tokens: 55904829440 | elapsed time per iteration (s): 0.42 | learning rate: 7.933E-05 | global batch size: 256 | lm loss: 2.915354E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.417 | TFLOPs: 31.71 | +7: iteration 106640/ 173500 | consumed samples: 27299840 | consumed tokens: 55910072320 | elapsed time per iteration (s): 0.42 | learning rate: 7.932E-05 | global batch size: 256 | lm loss: 2.909428E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.368 | TFLOPs: 31.97 | +7: iteration 106650/ 173500 | consumed samples: 27302400 | consumed tokens: 55915315200 | elapsed time per iteration (s): 0.42 | learning rate: 7.930E-05 | global batch size: 256 | lm loss: 2.908768E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.358 | TFLOPs: 31.81 | +7: iteration 106660/ 173500 | consumed samples: 27304960 | consumed tokens: 55920558080 | elapsed time per iteration (s): 0.42 | learning rate: 7.928E-05 | global batch size: 256 | lm loss: 2.922340E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.601 | TFLOPs: 31.77 | +7: iteration 106670/ 173500 | 
consumed samples: 27307520 | consumed tokens: 55925800960 | elapsed time per iteration (s): 0.42 | learning rate: 7.927E-05 | global batch size: 256 | lm loss: 2.898781E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.302 | TFLOPs: 31.97 | +7: iteration 106680/ 173500 | consumed samples: 27310080 | consumed tokens: 55931043840 | elapsed time per iteration (s): 0.42 | learning rate: 7.925E-05 | global batch size: 256 | lm loss: 2.894731E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.843 | TFLOPs: 31.95 | +7: iteration 106690/ 173500 | consumed samples: 27312640 | consumed tokens: 55936286720 | elapsed time per iteration (s): 0.42 | learning rate: 7.924E-05 | global batch size: 256 | lm loss: 2.915887E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.907 | TFLOPs: 31.95 | +7: iteration 106700/ 173500 | consumed samples: 27315200 | consumed tokens: 55941529600 | elapsed time per iteration (s): 0.42 | learning rate: 7.922E-05 | global batch size: 256 | lm loss: 2.911303E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.719 | TFLOPs: 31.94 | +7: iteration 106710/ 173500 | consumed samples: 27317760 | consumed tokens: 55946772480 | elapsed time per iteration (s): 0.42 | learning rate: 7.921E-05 | global batch size: 256 | lm loss: 2.907001E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.109 | TFLOPs: 31.96 | +7: iteration 106720/ 173500 | consumed samples: 27320320 | consumed tokens: 55952015360 | elapsed time per iteration (s): 0.42 | learning rate: 7.919E-05 | global batch size: 256 | lm loss: 2.913525E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.873 | TFLOPs: 31.95 | +7: iteration 106730/ 173500 | consumed samples: 27322880 | consumed tokens: 55957258240 | elapsed time per iteration (s): 0.42 | learning rate: 7.918E-05 | global batch size: 256 | lm loss: 2.921850E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.034 | TFLOPs: 31.96 | +7: iteration 106740/ 173500 | consumed samples: 27325440 | consumed tokens: 55962501120 | elapsed time per iteration (s): 0.42 | learning rate: 7.916E-05 | global batch size: 256 | lm loss: 2.912506E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.987 | TFLOPs: 31.95 | +7: iteration 106750/ 173500 | consumed samples: 27328000 | consumed tokens: 55967744000 | elapsed time per iteration (s): 0.42 | learning rate: 7.915E-05 | global batch size: 256 | lm loss: 2.938968E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.106 | TFLOPs: 31.96 | +7: iteration 106760/ 173500 | consumed samples: 27330560 | consumed tokens: 55972986880 | elapsed time per iteration (s): 0.42 | learning rate: 7.913E-05 | global batch size: 256 | lm loss: 2.920952E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.030 | TFLOPs: 31.95 | +7: iteration 106770/ 173500 | consumed samples: 27333120 | consumed tokens: 55978229760 | elapsed time per iteration (s): 0.42 | learning rate: 7.911E-05 | global batch size: 256 | lm loss: 2.912688E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.542 | TFLOPs: 31.72 | +7: iteration 106780/ 173500 | consumed samples: 27335680 | consumed tokens: 55983472640 | elapsed time per iteration (s): 0.42 | learning rate: 
7.910E-05 | global batch size: 256 | lm loss: 2.912448E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.697 | TFLOPs: 31.99 | +7: iteration 106790/ 173500 | consumed samples: 27338240 | consumed tokens: 55988715520 | elapsed time per iteration (s): 0.42 | learning rate: 7.908E-05 | global batch size: 256 | lm loss: 2.911071E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.453 | TFLOPs: 31.77 | +7: iteration 106800/ 173500 | consumed samples: 27340800 | consumed tokens: 55993958400 | elapsed time per iteration (s): 0.42 | learning rate: 7.907E-05 | global batch size: 256 | lm loss: 2.919107E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.737 | TFLOPs: 31.73 | +7: iteration 106810/ 173500 | consumed samples: 27343360 | consumed tokens: 55999201280 | elapsed time per iteration (s): 0.42 | learning rate: 7.905E-05 | global batch size: 256 | lm loss: 2.906310E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.497 | TFLOPs: 31.66 | +7: iteration 106820/ 173500 | consumed samples: 27345920 | consumed tokens: 56004444160 | elapsed time per iteration (s): 0.42 | learning rate: 7.904E-05 | global batch size: 256 | lm loss: 2.925281E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.751 | TFLOPs: 31.99 | +7: iteration 106830/ 173500 | consumed samples: 27348480 | consumed tokens: 56009687040 | elapsed time per iteration (s): 0.42 | learning rate: 7.902E-05 | global batch size: 256 | lm loss: 2.921766E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.001 | TFLOPs: 31.74 | +7: iteration 106840/ 173500 | 
consumed samples: 27351040 | consumed tokens: 56014929920 | elapsed time per iteration (s): 0.42 | learning rate: 7.901E-05 | global batch size: 256 | lm loss: 2.904502E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.873 | TFLOPs: 32.00 | +7: iteration 106850/ 173500 | consumed samples: 27353600 | consumed tokens: 56020172800 | elapsed time per iteration (s): 0.42 | learning rate: 7.899E-05 | global batch size: 256 | lm loss: 2.921003E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.431 | TFLOPs: 31.98 | +7: iteration 106860/ 173500 | consumed samples: 27356160 | consumed tokens: 56025415680 | elapsed time per iteration (s): 0.42 | learning rate: 7.898E-05 | global batch size: 256 | lm loss: 2.922858E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.642 | TFLOPs: 31.99 | +7: iteration 106870/ 173500 | consumed samples: 27358720 | consumed tokens: 56030658560 | elapsed time per iteration (s): 0.42 | learning rate: 7.896E-05 | global batch size: 256 | lm loss: 2.916031E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.491 | TFLOPs: 31.98 | +7: iteration 106880/ 173500 | consumed samples: 27361280 | consumed tokens: 56035901440 | elapsed time per iteration (s): 0.42 | learning rate: 7.894E-05 | global batch size: 256 | lm loss: 2.914241E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.315 | TFLOPs: 31.97 | +7: iteration 106890/ 173500 | consumed samples: 27363840 | consumed tokens: 56041144320 | elapsed time per iteration (s): 0.42 | learning rate: 7.893E-05 | global batch size: 256 | lm loss: 2.921719E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.480 | TFLOPs: 31.98 | +7: iteration 106900/ 173500 | consumed samples: 27366400 | consumed tokens: 56046387200 | elapsed time per iteration (s): 0.43 | learning rate: 7.891E-05 | global batch size: 256 | lm loss: 2.923861E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.204 | TFLOPs: 31.60 | +7: iteration 106910/ 173500 | consumed samples: 27368960 | consumed tokens: 56051630080 | elapsed time per iteration (s): 0.42 | learning rate: 7.890E-05 | global batch size: 256 | lm loss: 2.910540E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.469 | TFLOPs: 31.98 | +7: iteration 106920/ 173500 | consumed samples: 27371520 | consumed tokens: 56056872960 | elapsed time per iteration (s): 0.42 | learning rate: 7.888E-05 | global batch size: 256 | lm loss: 2.910965E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.019 | TFLOPs: 31.95 | +7: iteration 106930/ 173500 | consumed samples: 27374080 | consumed tokens: 56062115840 | elapsed time per iteration (s): 0.42 | learning rate: 7.887E-05 | global batch size: 256 | lm loss: 2.891951E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.669 | TFLOPs: 31.99 | +7: iteration 106940/ 173500 | consumed samples: 27376640 | consumed tokens: 56067358720 | elapsed time per iteration (s): 0.42 | learning rate: 7.885E-05 | global batch size: 256 | lm loss: 2.905419E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.287 | TFLOPs: 31.97 | +7: iteration 106950/ 173500 | consumed samples: 27379200 | consumed tokens: 56072601600 | elapsed time per iteration (s): 0.42 | learning rate: 
7.884E-05 | global batch size: 256 | lm loss: 2.919162E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.332 | TFLOPs: 31.71 | +7: iteration 106960/ 173500 | consumed samples: 27381760 | consumed tokens: 56077844480 | elapsed time per iteration (s): 0.42 | learning rate: 7.882E-05 | global batch size: 256 | lm loss: 2.901936E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.360 | TFLOPs: 31.97 | +7: iteration 106970/ 173500 | consumed samples: 27384320 | consumed tokens: 56083087360 | elapsed time per iteration (s): 0.43 | learning rate: 7.881E-05 | global batch size: 256 | lm loss: 2.913865E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.639 | TFLOPs: 31.20 | +7: iteration 106980/ 173500 | consumed samples: 27386880 | consumed tokens: 56088330240 | elapsed time per iteration (s): 0.43 | learning rate: 7.879E-05 | global batch size: 256 | lm loss: 2.914658E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.580 | TFLOPs: 31.51 | +7: iteration 106990/ 173500 | consumed samples: 27389440 | consumed tokens: 56093573120 | elapsed time per iteration (s): 0.42 | learning rate: 7.877E-05 | global batch size: 256 | lm loss: 2.919800E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.820 | TFLOPs: 32.00 | +7: iteration 107000/ 173500 | consumed samples: 27392000 | consumed tokens: 56098816000 | elapsed time per iteration (s): 0.43 | learning rate: 7.876E-05 | global batch size: 256 | lm loss: 2.904229E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.142 | TFLOPs: 31.28 | +7: iteration 107010/ 173500 | 
consumed samples: 27394560 | consumed tokens: 56104058880 | elapsed time per iteration (s): 0.42 | learning rate: 7.874E-05 | global batch size: 256 | lm loss: 2.923669E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.495 | TFLOPs: 31.72 | +7: iteration 107020/ 173500 | consumed samples: 27397120 | consumed tokens: 56109301760 | elapsed time per iteration (s): 0.42 | learning rate: 7.873E-05 | global batch size: 256 | lm loss: 2.911628E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.868 | TFLOPs: 32.00 | +7: iteration 107030/ 173500 | consumed samples: 27399680 | consumed tokens: 56114544640 | elapsed time per iteration (s): 0.42 | learning rate: 7.871E-05 | global batch size: 256 | lm loss: 2.911430E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.419 | TFLOPs: 31.98 | +7: iteration 107040/ 173500 | consumed samples: 27402240 | consumed tokens: 56119787520 | elapsed time per iteration (s): 0.42 | learning rate: 7.870E-05 | global batch size: 256 | lm loss: 2.916350E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.470 | TFLOPs: 31.98 | +7: iteration 107050/ 173500 | consumed samples: 27404800 | consumed tokens: 56125030400 | elapsed time per iteration (s): 0.43 | learning rate: 7.868E-05 | global batch size: 256 | lm loss: 2.923845E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.629 | TFLOPs: 31.57 | +7: iteration 107060/ 173500 | consumed samples: 27407360 | consumed tokens: 56130273280 | elapsed time per iteration (s): 0.42 | learning rate: 7.867E-05 | global batch size: 256 | lm loss: 2.905750E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.423 | TFLOPs: 31.98 | +7: iteration 107070/ 173500 | consumed samples: 27409920 | consumed tokens: 56135516160 | elapsed time per iteration (s): 0.43 | learning rate: 7.865E-05 | global batch size: 256 | lm loss: 2.916309E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.736 | TFLOPs: 31.36 | +7: iteration 107080/ 173500 | consumed samples: 27412480 | consumed tokens: 56140759040 | elapsed time per iteration (s): 0.42 | learning rate: 7.864E-05 | global batch size: 256 | lm loss: 2.906286E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.722 | TFLOPs: 31.99 | +7: iteration 107090/ 173500 | consumed samples: 27415040 | consumed tokens: 56146001920 | elapsed time per iteration (s): 0.42 | learning rate: 7.862E-05 | global batch size: 256 | lm loss: 2.913711E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.330 | TFLOPs: 31.97 | +7: iteration 107100/ 173500 | consumed samples: 27417600 | consumed tokens: 56151244800 | elapsed time per iteration (s): 0.42 | learning rate: 7.860E-05 | global batch size: 256 | lm loss: 2.909629E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.407 | TFLOPs: 31.61 | +7: iteration 107110/ 173500 | consumed samples: 27420160 | consumed tokens: 56156487680 | elapsed time per iteration (s): 0.42 | learning rate: 7.859E-05 | global batch size: 256 | lm loss: 2.922370E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.607 | TFLOPs: 31.78 | +7: iteration 107120/ 173500 | consumed samples: 27422720 | consumed tokens: 56161730560 | elapsed time per iteration (s): 0.42 | learning rate: 
7.857E-05 | global batch size: 256 | lm loss: 2.923009E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.313 | TFLOPs: 31.97 | +7: iteration 107130/ 173500 | consumed samples: 27425280 | consumed tokens: 56166973440 | elapsed time per iteration (s): 0.42 | learning rate: 7.856E-05 | global batch size: 256 | lm loss: 2.920753E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.429 | TFLOPs: 31.77 | +7: iteration 107140/ 173500 | consumed samples: 27427840 | consumed tokens: 56172216320 | elapsed time per iteration (s): 0.42 | learning rate: 7.854E-05 | global batch size: 256 | lm loss: 2.902613E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.187 | TFLOPs: 31.96 | +7: iteration 107150/ 173500 | consumed samples: 27430400 | consumed tokens: 56177459200 | elapsed time per iteration (s): 0.42 | learning rate: 7.853E-05 | global batch size: 256 | lm loss: 2.903733E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.238 | TFLOPs: 31.97 | +7: iteration 107160/ 173500 | consumed samples: 27432960 | consumed tokens: 56182702080 | elapsed time per iteration (s): 0.42 | learning rate: 7.851E-05 | global batch size: 256 | lm loss: 2.929695E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.885 | TFLOPs: 31.95 | +7: iteration 107170/ 173500 | consumed samples: 27435520 | consumed tokens: 56187944960 | elapsed time per iteration (s): 0.42 | learning rate: 7.850E-05 | global batch size: 256 | lm loss: 2.915836E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.782 | TFLOPs: 31.94 | +7: iteration 107180/ 173500 | 
consumed samples: 27438080 | consumed tokens: 56193187840 | elapsed time per iteration (s): 0.42 | learning rate: 7.848E-05 | global batch size: 256 | lm loss: 2.913354E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.165 | TFLOPs: 31.96 | +7: iteration 107190/ 173500 | consumed samples: 27440640 | consumed tokens: 56198430720 | elapsed time per iteration (s): 0.45 | learning rate: 7.847E-05 | global batch size: 256 | lm loss: 2.914508E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.831 | TFLOPs: 29.90 | +7: iteration 107200/ 173500 | consumed samples: 27443200 | consumed tokens: 56203673600 | elapsed time per iteration (s): 0.42 | learning rate: 7.845E-05 | global batch size: 256 | lm loss: 2.901911E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.008 | TFLOPs: 32.06 | +7: iteration 107210/ 173500 | consumed samples: 27445760 | consumed tokens: 56208916480 | elapsed time per iteration (s): 0.42 | learning rate: 7.844E-05 | global batch size: 256 | lm loss: 2.911923E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.889 | TFLOPs: 32.00 | +7: iteration 107220/ 173500 | consumed samples: 27448320 | consumed tokens: 56214159360 | elapsed time per iteration (s): 0.42 | learning rate: 7.842E-05 | global batch size: 256 | lm loss: 2.896021E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.458 | TFLOPs: 31.98 | +7: iteration 107230/ 173500 | consumed samples: 27450880 | consumed tokens: 56219402240 | elapsed time per iteration (s): 0.42 | learning rate: 7.840E-05 | global batch size: 256 | lm loss: 2.909794E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.224 | TFLOPs: 31.97 | +7: iteration 107240/ 173500 | consumed samples: 27453440 | consumed tokens: 56224645120 | elapsed time per iteration (s): 0.42 | learning rate: 7.839E-05 | global batch size: 256 | lm loss: 2.916447E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.448 | TFLOPs: 31.98 | +7: iteration 107250/ 173500 | consumed samples: 27456000 | consumed tokens: 56229888000 | elapsed time per iteration (s): 0.42 | learning rate: 7.837E-05 | global batch size: 256 | lm loss: 2.915057E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.800 | TFLOPs: 31.94 | +7: iteration 107260/ 173500 | consumed samples: 27458560 | consumed tokens: 56235130880 | elapsed time per iteration (s): 0.42 | learning rate: 7.836E-05 | global batch size: 256 | lm loss: 2.891788E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.935 | TFLOPs: 31.95 | +7: iteration 107270/ 173500 | consumed samples: 27461120 | consumed tokens: 56240373760 | elapsed time per iteration (s): 0.42 | learning rate: 7.834E-05 | global batch size: 256 | lm loss: 2.907008E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.614 | TFLOPs: 31.78 | +7: iteration 107280/ 173500 | consumed samples: 27463680 | consumed tokens: 56245616640 | elapsed time per iteration (s): 0.42 | learning rate: 7.833E-05 | global batch size: 256 | lm loss: 2.914224E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.163 | TFLOPs: 31.96 | +7: iteration 107290/ 173500 | consumed samples: 27466240 | consumed tokens: 56250859520 | elapsed time per iteration (s): 0.42 | learning rate: 
7.831E-05 | global batch size: 256 | lm loss: 2.902533E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.683 | TFLOPs: 31.78 | +7: iteration 107300/ 173500 | consumed samples: 27468800 | consumed tokens: 56256102400 | elapsed time per iteration (s): 0.42 | learning rate: 7.830E-05 | global batch size: 256 | lm loss: 2.905180E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.013 | TFLOPs: 31.69 | +7: iteration 107310/ 173500 | consumed samples: 27471360 | consumed tokens: 56261345280 | elapsed time per iteration (s): 0.42 | learning rate: 7.828E-05 | global batch size: 256 | lm loss: 2.902885E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.129 | TFLOPs: 31.96 | +7: iteration 107320/ 173500 | consumed samples: 27473920 | consumed tokens: 56266588160 | elapsed time per iteration (s): 0.42 | learning rate: 7.827E-05 | global batch size: 256 | lm loss: 2.921827E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.112 | TFLOPs: 31.96 | +7: iteration 107330/ 173500 | consumed samples: 27476480 | consumed tokens: 56271831040 | elapsed time per iteration (s): 0.42 | learning rate: 7.825E-05 | global batch size: 256 | lm loss: 2.897630E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.049 | TFLOPs: 31.96 | +7: iteration 107340/ 173500 | consumed samples: 27479040 | consumed tokens: 56277073920 | elapsed time per iteration (s): 0.42 | learning rate: 7.823E-05 | global batch size: 256 | lm loss: 2.908725E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.991 | TFLOPs: 31.80 | +7: iteration 107350/ 173500 | 
consumed samples: 27481600 | consumed tokens: 56282316800 | elapsed time per iteration (s): 0.42 | learning rate: 7.822E-05 | global batch size: 256 | lm loss: 2.907164E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.384 | TFLOPs: 31.97 | +7: iteration 107360/ 173500 | consumed samples: 27484160 | consumed tokens: 56287559680 | elapsed time per iteration (s): 0.42 | learning rate: 7.820E-05 | global batch size: 256 | lm loss: 2.915837E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.217 | TFLOPs: 31.96 | +7: iteration 107370/ 173500 | consumed samples: 27486720 | consumed tokens: 56292802560 | elapsed time per iteration (s): 0.42 | learning rate: 7.819E-05 | global batch size: 256 | lm loss: 2.897209E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.978 | TFLOPs: 31.95 | +7: iteration 107380/ 173500 | consumed samples: 27489280 | consumed tokens: 56298045440 | elapsed time per iteration (s): 0.42 | learning rate: 7.817E-05 | global batch size: 256 | lm loss: 2.909751E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.639 | TFLOPs: 31.93 | +7: iteration 107390/ 173500 | consumed samples: 27491840 | consumed tokens: 56303288320 | elapsed time per iteration (s): 0.42 | learning rate: 7.816E-05 | global batch size: 256 | lm loss: 2.891705E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.771 | TFLOPs: 31.94 | +7: iteration 107400/ 173500 | consumed samples: 27494400 | consumed tokens: 56308531200 | elapsed time per iteration (s): 0.42 | learning rate: 7.814E-05 | global batch size: 256 | lm loss: 2.913552E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.641 | TFLOPs: 31.93 | +7: iteration 107410/ 173500 | consumed samples: 27496960 | consumed tokens: 56313774080 | elapsed time per iteration (s): 0.42 | learning rate: 7.813E-05 | global batch size: 256 | lm loss: 2.909026E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.979 | TFLOPs: 31.95 | +7: iteration 107420/ 173500 | consumed samples: 27499520 | consumed tokens: 56319016960 | elapsed time per iteration (s): 0.42 | learning rate: 7.811E-05 | global batch size: 256 | lm loss: 2.915511E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.134 | TFLOPs: 31.96 | +7: iteration 107430/ 173500 | consumed samples: 27502080 | consumed tokens: 56324259840 | elapsed time per iteration (s): 0.42 | learning rate: 7.810E-05 | global batch size: 256 | lm loss: 2.916565E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.719 | TFLOPs: 31.94 | +7: iteration 107440/ 173500 | consumed samples: 27504640 | consumed tokens: 56329502720 | elapsed time per iteration (s): 0.42 | learning rate: 7.808E-05 | global batch size: 256 | lm loss: 2.914936E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.815 | TFLOPs: 31.94 | +7: iteration 107450/ 173500 | consumed samples: 27507200 | consumed tokens: 56334745600 | elapsed time per iteration (s): 0.42 | learning rate: 7.807E-05 | global batch size: 256 | lm loss: 2.916968E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.860 | TFLOPs: 31.95 | +7: iteration 107460/ 173500 | consumed samples: 27509760 | consumed tokens: 56339988480 | elapsed time per iteration (s): 0.42 | learning rate: 
7.805E-05 | global batch size: 256 | lm loss: 2.904083E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.976 | TFLOPs: 31.95 | +7: iteration 107470/ 173500 | consumed samples: 27512320 | consumed tokens: 56345231360 | elapsed time per iteration (s): 0.42 | learning rate: 7.803E-05 | global batch size: 256 | lm loss: 2.918691E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.583 | TFLOPs: 31.93 | +7: iteration 107480/ 173500 | consumed samples: 27514880 | consumed tokens: 56350474240 | elapsed time per iteration (s): 0.43 | learning rate: 7.802E-05 | global batch size: 256 | lm loss: 2.907990E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.155 | TFLOPs: 31.02 | +7: iteration 107490/ 173500 | consumed samples: 27517440 | consumed tokens: 56355717120 | elapsed time per iteration (s): 0.42 | learning rate: 7.800E-05 | global batch size: 256 | lm loss: 2.903044E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.802 | TFLOPs: 31.94 | +7: iteration 107500/ 173500 | consumed samples: 27520000 | consumed tokens: 56360960000 | elapsed time per iteration (s): 0.42 | learning rate: 7.799E-05 | global batch size: 256 | lm loss: 2.912649E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.155 | TFLOPs: 31.96 | +7: iteration 107510/ 173500 | consumed samples: 27522560 | consumed tokens: 56366202880 | elapsed time per iteration (s): 0.42 | learning rate: 7.797E-05 | global batch size: 256 | lm loss: 2.906450E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.601 | TFLOPs: 31.93 | +7: iteration 107520/ 173500 | 
consumed samples: 27525120 | consumed tokens: 56371445760 | elapsed time per iteration (s): 0.42 | learning rate: 7.796E-05 | global batch size: 256 | lm loss: 2.904827E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.808 | TFLOPs: 31.94 | +7: iteration 107530/ 173500 | consumed samples: 27527680 | consumed tokens: 56376688640 | elapsed time per iteration (s): 0.42 | learning rate: 7.794E-05 | global batch size: 256 | lm loss: 2.906556E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.723 | TFLOPs: 31.94 | +7: iteration 107540/ 173500 | consumed samples: 27530240 | consumed tokens: 56381931520 | elapsed time per iteration (s): 0.42 | learning rate: 7.793E-05 | global batch size: 256 | lm loss: 2.904900E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.909 | TFLOPs: 31.95 | +7: iteration 107550/ 173500 | consumed samples: 27532800 | consumed tokens: 56387174400 | elapsed time per iteration (s): 0.42 | learning rate: 7.791E-05 | global batch size: 256 | lm loss: 2.911282E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.687 | TFLOPs: 31.94 | +7: iteration 107560/ 173500 | consumed samples: 27535360 | consumed tokens: 56392417280 | elapsed time per iteration (s): 0.42 | learning rate: 7.790E-05 | global batch size: 256 | lm loss: 2.919762E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | +7: iteration 107570/ 173500 | consumed samples: 27537920 | consumed tokens: 56397660160 | elapsed time per iteration (s): 0.42 | learning rate: 7.788E-05 | global batch size: 256 | lm loss: 2.911756E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.537 | TFLOPs: 31.93 | +7: iteration 107580/ 173500 | consumed samples: 27540480 | consumed tokens: 56402903040 | elapsed time per iteration (s): 0.42 | learning rate: 7.787E-05 | global batch size: 256 | lm loss: 2.898835E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.898 | TFLOPs: 31.95 | +7: iteration 107590/ 173500 | consumed samples: 27543040 | consumed tokens: 56408145920 | elapsed time per iteration (s): 0.42 | learning rate: 7.785E-05 | global batch size: 256 | lm loss: 2.919084E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.970 | TFLOPs: 31.95 | +7: iteration 107600/ 173500 | consumed samples: 27545600 | consumed tokens: 56413388800 | elapsed time per iteration (s): 0.42 | learning rate: 7.783E-05 | global batch size: 256 | lm loss: 2.925176E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.955 | TFLOPs: 31.95 | +7: iteration 107610/ 173500 | consumed samples: 27548160 | consumed tokens: 56418631680 | elapsed time per iteration (s): 0.42 | learning rate: 7.782E-05 | global batch size: 256 | lm loss: 2.908123E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.077 | TFLOPs: 31.96 | +7: iteration 107620/ 173500 | consumed samples: 27550720 | consumed tokens: 56423874560 | elapsed time per iteration (s): 0.42 | learning rate: 7.780E-05 | global batch size: 256 | lm loss: 2.912828E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.153 | TFLOPs: 31.96 | +7: iteration 107630/ 173500 | consumed samples: 27553280 | consumed tokens: 56429117440 | elapsed time per iteration (s): 0.42 | learning rate: 
7.779E-05 | global batch size: 256 | lm loss: 2.904044E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.424 | TFLOPs: 31.98 | +7: iteration 107640/ 173500 | consumed samples: 27555840 | consumed tokens: 56434360320 | elapsed time per iteration (s): 0.42 | learning rate: 7.777E-05 | global batch size: 256 | lm loss: 2.907366E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.423 | TFLOPs: 31.98 | +7: iteration 107650/ 173500 | consumed samples: 27558400 | consumed tokens: 56439603200 | elapsed time per iteration (s): 0.42 | learning rate: 7.776E-05 | global batch size: 256 | lm loss: 2.903352E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.332 | TFLOPs: 31.97 | +7: iteration 107660/ 173500 | consumed samples: 27560960 | consumed tokens: 56444846080 | elapsed time per iteration (s): 0.42 | learning rate: 7.774E-05 | global batch size: 256 | lm loss: 2.913712E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.892 | TFLOPs: 31.95 | +7: iteration 107670/ 173500 | consumed samples: 27563520 | consumed tokens: 56450088960 | elapsed time per iteration (s): 0.42 | learning rate: 7.773E-05 | global batch size: 256 | lm loss: 2.920904E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.168 | TFLOPs: 31.96 | +7: iteration 107680/ 173500 | consumed samples: 27566080 | consumed tokens: 56455331840 | elapsed time per iteration (s): 0.42 | learning rate: 7.771E-05 | global batch size: 256 | lm loss: 2.925047E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.880 | TFLOPs: 31.95 | +7: iteration 107690/ 173500 | 
consumed samples: 27568640 | consumed tokens: 56460574720 | elapsed time per iteration (s): 0.42 | learning rate: 7.770E-05 | global batch size: 256 | lm loss: 2.892006E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.236 | TFLOPs: 31.91 | +7: iteration 107700/ 173500 | consumed samples: 27571200 | consumed tokens: 56465817600 | elapsed time per iteration (s): 0.42 | learning rate: 7.768E-05 | global batch size: 256 | lm loss: 2.907720E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.276 | TFLOPs: 31.92 | +7: iteration 107710/ 173500 | consumed samples: 27573760 | consumed tokens: 56471060480 | elapsed time per iteration (s): 0.42 | learning rate: 7.767E-05 | global batch size: 256 | lm loss: 2.906086E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.294 | TFLOPs: 31.92 | +7: iteration 107720/ 173500 | consumed samples: 27576320 | consumed tokens: 56476303360 | elapsed time per iteration (s): 0.42 | learning rate: 7.765E-05 | global batch size: 256 | lm loss: 2.916766E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.092 | TFLOPs: 31.91 | +7: iteration 107730/ 173500 | consumed samples: 27578880 | consumed tokens: 56481546240 | elapsed time per iteration (s): 0.42 | learning rate: 7.763E-05 | global batch size: 256 | lm loss: 2.892527E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.904 | TFLOPs: 31.90 | +7: iteration 107740/ 173500 | consumed samples: 27581440 | consumed tokens: 56486789120 | elapsed time per iteration (s): 0.42 | learning rate: 7.762E-05 | global batch size: 256 | lm loss: 2.918630E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.076 | TFLOPs: 31.90 | +7: iteration 107750/ 173500 | consumed samples: 27584000 | consumed tokens: 56492032000 | elapsed time per iteration (s): 0.42 | learning rate: 7.760E-05 | global batch size: 256 | lm loss: 2.912531E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.524 | TFLOPs: 31.88 | +7: iteration 107760/ 173500 | consumed samples: 27586560 | consumed tokens: 56497274880 | elapsed time per iteration (s): 0.42 | learning rate: 7.759E-05 | global batch size: 256 | lm loss: 2.905608E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.471 | TFLOPs: 31.93 | +7: iteration 107770/ 173500 | consumed samples: 27589120 | consumed tokens: 56502517760 | elapsed time per iteration (s): 0.42 | learning rate: 7.757E-05 | global batch size: 256 | lm loss: 2.928677E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.984 | TFLOPs: 31.90 | +7: iteration 107780/ 173500 | consumed samples: 27591680 | consumed tokens: 56507760640 | elapsed time per iteration (s): 0.42 | learning rate: 7.756E-05 | global batch size: 256 | lm loss: 2.908626E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.311 | TFLOPs: 31.92 | +7: iteration 107790/ 173500 | consumed samples: 27594240 | consumed tokens: 56513003520 | elapsed time per iteration (s): 0.42 | learning rate: 7.754E-05 | global batch size: 256 | lm loss: 2.899965E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.045 | TFLOPs: 31.90 | +7: iteration 107800/ 173500 | consumed samples: 27596800 | consumed tokens: 56518246400 | elapsed time per iteration (s): 0.42 | learning rate: 
7.753E-05 | global batch size: 256 | lm loss: 2.899916E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.166 | TFLOPs: 31.91 | +7: iteration 107810/ 173500 | consumed samples: 27599360 | consumed tokens: 56523489280 | elapsed time per iteration (s): 0.42 | learning rate: 7.751E-05 | global batch size: 256 | lm loss: 2.914606E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.405 | TFLOPs: 31.92 | +7: iteration 107820/ 173500 | consumed samples: 27601920 | consumed tokens: 56528732160 | elapsed time per iteration (s): 0.42 | learning rate: 7.750E-05 | global batch size: 256 | lm loss: 2.906971E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.283 | TFLOPs: 31.92 | +7: iteration 107830/ 173500 | consumed samples: 27604480 | consumed tokens: 56533975040 | elapsed time per iteration (s): 0.42 | learning rate: 7.748E-05 | global batch size: 256 | lm loss: 2.918614E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.507 | TFLOPs: 31.93 | +7: iteration 107840/ 173500 | consumed samples: 27607040 | consumed tokens: 56539217920 | elapsed time per iteration (s): 0.42 | learning rate: 7.747E-05 | global batch size: 256 | lm loss: 2.899665E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.923 | TFLOPs: 31.74 | +7: iteration 107850/ 173500 | consumed samples: 27609600 | consumed tokens: 56544460800 | elapsed time per iteration (s): 0.42 | learning rate: 7.745E-05 | global batch size: 256 | lm loss: 2.914119E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.182 | TFLOPs: 31.91 | +7: iteration 107860/ 173500 | 
consumed samples: 27612160 | consumed tokens: 56549703680 | elapsed time per iteration (s): 0.42 | learning rate: 7.744E-05 | global batch size: 256 | lm loss: 2.911960E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.079 | TFLOPs: 31.90 | +7: iteration 107870/ 173500 | consumed samples: 27614720 | consumed tokens: 56554946560 | elapsed time per iteration (s): 0.43 | learning rate: 7.742E-05 | global batch size: 256 | lm loss: 2.909963E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.685 | TFLOPs: 31.57 | +7: iteration 107880/ 173500 | consumed samples: 27617280 | consumed tokens: 56560189440 | elapsed time per iteration (s): 0.42 | learning rate: 7.740E-05 | global batch size: 256 | lm loss: 2.920185E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.373 | TFLOPs: 31.92 | +7: iteration 107890/ 173500 | consumed samples: 27619840 | consumed tokens: 56565432320 | elapsed time per iteration (s): 0.42 | learning rate: 7.739E-05 | global batch size: 256 | lm loss: 2.907863E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.215 | TFLOPs: 31.91 | +7: iteration 107900/ 173500 | consumed samples: 27622400 | consumed tokens: 56570675200 | elapsed time per iteration (s): 0.42 | learning rate: 7.737E-05 | global batch size: 256 | lm loss: 2.899848E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.981 | TFLOPs: 31.90 | +7: iteration 107910/ 173500 | consumed samples: 27624960 | consumed tokens: 56575918080 | elapsed time per iteration (s): 0.42 | learning rate: 7.736E-05 | global batch size: 256 | lm loss: 2.905352E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.278 | TFLOPs: 31.92 | +7: iteration 107920/ 173500 | consumed samples: 27627520 | consumed tokens: 56581160960 | elapsed time per iteration (s): 0.42 | learning rate: 7.734E-05 | global batch size: 256 | lm loss: 2.906043E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.718 | TFLOPs: 31.89 | +7: iteration 107930/ 173500 | consumed samples: 27630080 | consumed tokens: 56586403840 | elapsed time per iteration (s): 0.42 | learning rate: 7.733E-05 | global batch size: 256 | lm loss: 2.907717E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.425 | TFLOPs: 31.87 | +7: iteration 107940/ 173500 | consumed samples: 27632640 | consumed tokens: 56591646720 | elapsed time per iteration (s): 0.42 | learning rate: 7.731E-05 | global batch size: 256 | lm loss: 2.893366E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.790 | TFLOPs: 31.89 | +7: iteration 107950/ 173500 | consumed samples: 27635200 | consumed tokens: 56596889600 | elapsed time per iteration (s): 0.42 | learning rate: 7.730E-05 | global batch size: 256 | lm loss: 2.914778E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.969 | TFLOPs: 31.90 | +7: iteration 107960/ 173500 | consumed samples: 27637760 | consumed tokens: 56602132480 | elapsed time per iteration (s): 0.42 | learning rate: 7.728E-05 | global batch size: 256 | lm loss: 2.913647E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.225 | TFLOPs: 31.91 | +7: iteration 107970/ 173500 | consumed samples: 27640320 | consumed tokens: 56607375360 | elapsed time per iteration (s): 0.42 | learning rate: 
7.727E-05 | global batch size: 256 | lm loss: 2.914384E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.040 | TFLOPs: 31.90 | +7: iteration 107980/ 173500 | consumed samples: 27642880 | consumed tokens: 56612618240 | elapsed time per iteration (s): 0.42 | learning rate: 7.725E-05 | global batch size: 256 | lm loss: 2.900785E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.883 | TFLOPs: 31.89 | +7: iteration 107990/ 173500 | consumed samples: 27645440 | consumed tokens: 56617861120 | elapsed time per iteration (s): 0.42 | learning rate: 7.724E-05 | global batch size: 256 | lm loss: 2.899893E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.297 | TFLOPs: 31.92 | +0: [2023-03-17 12:00:24,074] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=0, lr=[7.722055869362951e-05, 7.722055869362951e-05, 7.722055869362951e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 108000/ 173500 | consumed samples: 27648000 | consumed tokens: 56623104000 | elapsed time per iteration (s): 0.42 | learning rate: 7.722E-05 | global batch size: 256 | lm loss: 2.919328E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.999 | TFLOPs: 31.90 | +0: steps: 108000 loss: 2.9165 iter time (s): 0.420 samples/sec: 608.841 +7: iteration 108010/ 173500 | consumed samples: 27650560 | consumed tokens: 56628346880 | elapsed time per iteration (s): 0.42 | learning rate: 7.721E-05 | global batch size: 256 | lm loss: 2.898525E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.912 | TFLOPs: 31.74 | +7: iteration 108020/ 173500 | consumed samples: 27653120 | consumed tokens: 56633589760 | 
elapsed time per iteration (s): 0.42 | learning rate: 7.719E-05 | global batch size: 256 | lm loss: 2.914549E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.330 | TFLOPs: 31.87 | +7: iteration 108030/ 173500 | consumed samples: 27655680 | consumed tokens: 56638832640 | elapsed time per iteration (s): 0.42 | learning rate: 7.717E-05 | global batch size: 256 | lm loss: 2.917981E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.574 | TFLOPs: 31.88 | +7: iteration 108040/ 173500 | consumed samples: 27658240 | consumed tokens: 56644075520 | elapsed time per iteration (s): 0.42 | learning rate: 7.716E-05 | global batch size: 256 | lm loss: 2.914852E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.757 | TFLOPs: 31.89 | +7: iteration 108050/ 173500 | consumed samples: 27660800 | consumed tokens: 56649318400 | elapsed time per iteration (s): 0.42 | learning rate: 7.714E-05 | global batch size: 256 | lm loss: 2.910732E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.047 | TFLOPs: 31.90 | +7: iteration 108060/ 173500 | consumed samples: 27663360 | consumed tokens: 56654561280 | elapsed time per iteration (s): 0.42 | learning rate: 7.713E-05 | global batch size: 256 | lm loss: 2.904231E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.658 | TFLOPs: 31.88 | +7: iteration 108070/ 173500 | consumed samples: 27665920 | consumed tokens: 56659804160 | elapsed time per iteration (s): 0.42 | learning rate: 7.711E-05 | global batch size: 256 | lm loss: 2.904728E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.765 | TFLOPs: 31.89 | +7: iteration 108080/ 173500 | consumed samples: 27668480 | consumed tokens: 56665047040 | elapsed time per iteration (s): 0.43 | learning rate: 7.710E-05 | global batch size: 256 | lm loss: 2.916815E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.233 | TFLOPs: 31.34 | +7: iteration 108090/ 173500 | consumed samples: 27671040 | consumed tokens: 56670289920 | elapsed time per iteration (s): 0.42 | learning rate: 7.708E-05 | global batch size: 256 | lm loss: 2.905678E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.772 | TFLOPs: 31.94 | +7: iteration 108100/ 173500 | consumed samples: 27673600 | consumed tokens: 56675532800 | elapsed time per iteration (s): 0.42 | learning rate: 7.707E-05 | global batch size: 256 | lm loss: 2.916562E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.762 | TFLOPs: 31.89 | +7: iteration 108110/ 173500 | consumed samples: 27676160 | consumed tokens: 56680775680 | elapsed time per iteration (s): 0.42 | learning rate: 7.705E-05 | global batch size: 256 | lm loss: 2.917278E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.663 | TFLOPs: 31.88 | +7: iteration 108120/ 173500 | consumed samples: 27678720 | consumed tokens: 56686018560 | elapsed time per iteration (s): 0.42 | learning rate: 7.704E-05 | global batch size: 256 | lm loss: 2.902762E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.898 | TFLOPs: 31.90 | +7: iteration 108130/ 173500 | consumed samples: 27681280 | consumed tokens: 56691261440 | elapsed time per iteration (s): 0.42 | learning rate: 7.702E-05 | global batch size: 256 | lm loss: 2.906753E+00 | grad 
norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.875 | TFLOPs: 31.89 | +7: iteration 108140/ 173500 | consumed samples: 27683840 | consumed tokens: 56696504320 | elapsed time per iteration (s): 0.42 | learning rate: 7.701E-05 | global batch size: 256 | lm loss: 2.909700E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.822 | TFLOPs: 31.89 | +7: iteration 108150/ 173500 | consumed samples: 27686400 | consumed tokens: 56701747200 | elapsed time per iteration (s): 0.42 | learning rate: 7.699E-05 | global batch size: 256 | lm loss: 2.918531E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.231 | TFLOPs: 31.91 | +7: iteration 108160/ 173500 | consumed samples: 27688960 | consumed tokens: 56706990080 | elapsed time per iteration (s): 0.42 | learning rate: 7.698E-05 | global batch size: 256 | lm loss: 2.898466E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.327 | TFLOPs: 31.92 | +7: iteration 108170/ 173500 | consumed samples: 27691520 | consumed tokens: 56712232960 | elapsed time per iteration (s): 0.42 | learning rate: 7.696E-05 | global batch size: 256 | lm loss: 2.903620E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.377 | TFLOPs: 31.92 | +7: iteration 108180/ 173500 | consumed samples: 27694080 | consumed tokens: 56717475840 | elapsed time per iteration (s): 0.42 | learning rate: 7.694E-05 | global batch size: 256 | lm loss: 2.919578E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.125 | TFLOPs: 31.91 | +7: iteration 108190/ 173500 | consumed samples: 27696640 | consumed tokens: 56722718720 | elapsed 
time per iteration (s): 0.42 | learning rate: 7.693E-05 | global batch size: 256 | lm loss: 2.914799E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.520 | TFLOPs: 31.93 | +7: iteration 108200/ 173500 | consumed samples: 27699200 | consumed tokens: 56727961600 | elapsed time per iteration (s): 0.42 | learning rate: 7.691E-05 | global batch size: 256 | lm loss: 2.901562E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.922 | TFLOPs: 31.90 | +7: iteration 108210/ 173500 | consumed samples: 27701760 | consumed tokens: 56733204480 | elapsed time per iteration (s): 0.42 | learning rate: 7.690E-05 | global batch size: 256 | lm loss: 2.923313E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.883 | TFLOPs: 31.89 | +7: iteration 108220/ 173500 | consumed samples: 27704320 | consumed tokens: 56738447360 | elapsed time per iteration (s): 0.43 | learning rate: 7.688E-05 | global batch size: 256 | lm loss: 2.921333E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.619 | TFLOPs: 31.57 | +7: iteration 108230/ 173500 | consumed samples: 27706880 | consumed tokens: 56743690240 | elapsed time per iteration (s): 0.42 | learning rate: 7.687E-05 | global batch size: 256 | lm loss: 2.907563E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.506 | TFLOPs: 31.93 | +7: iteration 108240/ 173500 | consumed samples: 27709440 | consumed tokens: 56748933120 | elapsed time per iteration (s): 0.42 | learning rate: 7.685E-05 | global batch size: 256 | lm loss: 2.899587E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.460 | 
TFLOPs: 31.92 | +7: iteration 108250/ 173500 | consumed samples: 27712000 | consumed tokens: 56754176000 | elapsed time per iteration (s): 0.42 | learning rate: 7.684E-05 | global batch size: 256 | lm loss: 2.921008E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.182 | TFLOPs: 31.86 | +7: iteration 108260/ 173500 | consumed samples: 27714560 | consumed tokens: 56759418880 | elapsed time per iteration (s): 0.42 | learning rate: 7.682E-05 | global batch size: 256 | lm loss: 2.904986E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.269 | TFLOPs: 31.91 | +7: iteration 108270/ 173500 | consumed samples: 27717120 | consumed tokens: 56764661760 | elapsed time per iteration (s): 0.42 | learning rate: 7.681E-05 | global batch size: 256 | lm loss: 2.916021E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.246 | TFLOPs: 31.91 | +7: iteration 108280/ 173500 | consumed samples: 27719680 | consumed tokens: 56769904640 | elapsed time per iteration (s): 0.42 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 2.920284E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.138 | TFLOPs: 31.91 | +7: iteration 108290/ 173500 | consumed samples: 27722240 | consumed tokens: 56775147520 | elapsed time per iteration (s): 0.42 | learning rate: 7.678E-05 | global batch size: 256 | lm loss: 2.908631E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.431 | TFLOPs: 31.92 | +7: iteration 108300/ 173500 | consumed samples: 27724800 | consumed tokens: 56780390400 | elapsed time per iteration (s): 0.42 | learning rate: 7.676E-05 | global batch size: 256 | lm loss: 2.912708E+00 | grad norm: 0.350 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.218 | TFLOPs: 31.91 | +7: iteration 108310/ 173500 | consumed samples: 27727360 | consumed tokens: 56785633280 | elapsed time per iteration (s): 0.42 | learning rate: 7.675E-05 | global batch size: 256 | lm loss: 2.907811E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.314 | TFLOPs: 31.65 | +7: iteration 108320/ 173500 | consumed samples: 27729920 | consumed tokens: 56790876160 | elapsed time per iteration (s): 0.42 | learning rate: 7.673E-05 | global batch size: 256 | lm loss: 2.909323E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.748 | TFLOPs: 31.94 | +7: iteration 108330/ 173500 | consumed samples: 27732480 | consumed tokens: 56796119040 | elapsed time per iteration (s): 0.42 | learning rate: 7.672E-05 | global batch size: 256 | lm loss: 2.906341E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.467 | TFLOPs: 31.93 | +7: iteration 108340/ 173500 | consumed samples: 27735040 | consumed tokens: 56801361920 | elapsed time per iteration (s): 0.42 | learning rate: 7.670E-05 | global batch size: 256 | lm loss: 2.906793E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.035 | TFLOPs: 31.90 | +7: iteration 108350/ 173500 | consumed samples: 27737600 | consumed tokens: 56806604800 | elapsed time per iteration (s): 0.42 | learning rate: 7.668E-05 | global batch size: 256 | lm loss: 2.906788E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.865 | TFLOPs: 31.89 | +7: iteration 108360/ 173500 | consumed samples: 27740160 | consumed tokens: 56811847680 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.667E-05 | global batch size: 256 | lm loss: 2.916182E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.181 | TFLOPs: 31.91 | +7: iteration 108370/ 173500 | consumed samples: 27742720 | consumed tokens: 56817090560 | elapsed time per iteration (s): 0.42 | learning rate: 7.665E-05 | global batch size: 256 | lm loss: 2.914353E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.409 | TFLOPs: 31.92 | +7: iteration 108380/ 173500 | consumed samples: 27745280 | consumed tokens: 56822333440 | elapsed time per iteration (s): 0.42 | learning rate: 7.664E-05 | global batch size: 256 | lm loss: 2.921363E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.142 | TFLOPs: 31.91 | +7: iteration 108390/ 173500 | consumed samples: 27747840 | consumed tokens: 56827576320 | elapsed time per iteration (s): 0.42 | learning rate: 7.662E-05 | global batch size: 256 | lm loss: 2.905297E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.975 | TFLOPs: 31.90 | +7: iteration 108400/ 173500 | consumed samples: 27750400 | consumed tokens: 56832819200 | elapsed time per iteration (s): 0.42 | learning rate: 7.661E-05 | global batch size: 256 | lm loss: 2.909866E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.098 | TFLOPs: 31.91 | +7: iteration 108410/ 173500 | consumed samples: 27752960 | consumed tokens: 56838062080 | elapsed time per iteration (s): 0.42 | learning rate: 7.659E-05 | global batch size: 256 | lm loss: 2.906618E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.366 | TFLOPs: 
31.92 | +7: iteration 108420/ 173500 | consumed samples: 27755520 | consumed tokens: 56843304960 | elapsed time per iteration (s): 0.42 | learning rate: 7.658E-05 | global batch size: 256 | lm loss: 2.900080E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.494 | TFLOPs: 31.93 | +7: iteration 108430/ 173500 | consumed samples: 27758080 | consumed tokens: 56848547840 | elapsed time per iteration (s): 0.42 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 2.919074E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.904 | TFLOPs: 31.74 | +7: iteration 108440/ 173500 | consumed samples: 27760640 | consumed tokens: 56853790720 | elapsed time per iteration (s): 0.42 | learning rate: 7.655E-05 | global batch size: 256 | lm loss: 2.909506E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.773 | TFLOPs: 31.94 | +7: iteration 108450/ 173500 | consumed samples: 27763200 | consumed tokens: 56859033600 | elapsed time per iteration (s): 0.42 | learning rate: 7.653E-05 | global batch size: 256 | lm loss: 2.913386E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.869 | TFLOPs: 31.95 | +7: iteration 108460/ 173500 | consumed samples: 27765760 | consumed tokens: 56864276480 | elapsed time per iteration (s): 0.42 | learning rate: 7.652E-05 | global batch size: 256 | lm loss: 2.909375E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.306 | TFLOPs: 31.92 | +7: iteration 108470/ 173500 | consumed samples: 27768320 | consumed tokens: 56869519360 | elapsed time per iteration (s): 0.42 | learning rate: 7.650E-05 | global batch size: 256 | lm loss: 2.917419E+00 | grad norm: 0.353 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.410 | TFLOPs: 31.92 | +7: iteration 108480/ 173500 | consumed samples: 27770880 | consumed tokens: 56874762240 | elapsed time per iteration (s): 0.42 | learning rate: 7.649E-05 | global batch size: 256 | lm loss: 2.913087E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.201 | TFLOPs: 31.91 | +7: iteration 108490/ 173500 | consumed samples: 27773440 | consumed tokens: 56880005120 | elapsed time per iteration (s): 0.42 | learning rate: 7.647E-05 | global batch size: 256 | lm loss: 2.905500E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.325 | TFLOPs: 31.92 | +7: iteration 108500/ 173500 | consumed samples: 27776000 | consumed tokens: 56885248000 | elapsed time per iteration (s): 0.42 | learning rate: 7.646E-05 | global batch size: 256 | lm loss: 2.912526E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.893 | TFLOPs: 31.90 | +7: iteration 108510/ 173500 | consumed samples: 27778560 | consumed tokens: 56890490880 | elapsed time per iteration (s): 0.42 | learning rate: 7.644E-05 | global batch size: 256 | lm loss: 2.915706E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.059 | TFLOPs: 31.90 | +7: iteration 108520/ 173500 | consumed samples: 27781120 | consumed tokens: 56895733760 | elapsed time per iteration (s): 0.42 | learning rate: 7.642E-05 | global batch size: 256 | lm loss: 2.908888E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.808 | TFLOPs: 31.89 | +7: iteration 108530/ 173500 | consumed samples: 27783680 | consumed tokens: 56900976640 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.641E-05 | global batch size: 256 | lm loss: 2.901699E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.731 | TFLOPs: 31.83 | +7: iteration 108540/ 173500 | consumed samples: 27786240 | consumed tokens: 56906219520 | elapsed time per iteration (s): 0.42 | learning rate: 7.639E-05 | global batch size: 256 | lm loss: 2.892983E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.699 | TFLOPs: 31.88 | +7: iteration 108550/ 173500 | consumed samples: 27788800 | consumed tokens: 56911462400 | elapsed time per iteration (s): 0.42 | learning rate: 7.638E-05 | global batch size: 256 | lm loss: 2.908587E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.560 | TFLOPs: 31.88 | +7: iteration 108560/ 173500 | consumed samples: 27791360 | consumed tokens: 56916705280 | elapsed time per iteration (s): 0.42 | learning rate: 7.636E-05 | global batch size: 256 | lm loss: 2.926936E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.716 | TFLOPs: 31.89 | +7: iteration 108570/ 173500 | consumed samples: 27793920 | consumed tokens: 56921948160 | elapsed time per iteration (s): 0.42 | learning rate: 7.635E-05 | global batch size: 256 | lm loss: 2.902952E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.814 | TFLOPs: 31.89 | +7: iteration 108580/ 173500 | consumed samples: 27796480 | consumed tokens: 56927191040 | elapsed time per iteration (s): 0.42 | learning rate: 7.633E-05 | global batch size: 256 | lm loss: 2.923451E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.544 | TFLOPs: 
31.88 | +7: iteration 108590/ 173500 | consumed samples: 27799040 | consumed tokens: 56932433920 | elapsed time per iteration (s): 0.42 | learning rate: 7.632E-05 | global batch size: 256 | lm loss: 2.914317E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.363 | TFLOPs: 31.87 | +7: iteration 108600/ 173500 | consumed samples: 27801600 | consumed tokens: 56937676800 | elapsed time per iteration (s): 0.42 | learning rate: 7.630E-05 | global batch size: 256 | lm loss: 2.908656E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.833 | TFLOPs: 31.89 | +7: iteration 108610/ 173500 | consumed samples: 27804160 | consumed tokens: 56942919680 | elapsed time per iteration (s): 0.42 | learning rate: 7.629E-05 | global batch size: 256 | lm loss: 2.908761E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.009 | TFLOPs: 31.90 | +7: iteration 108620/ 173500 | consumed samples: 27806720 | consumed tokens: 56948162560 | elapsed time per iteration (s): 0.42 | learning rate: 7.627E-05 | global batch size: 256 | lm loss: 2.913646E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.558 | TFLOPs: 31.88 | +7: iteration 108630/ 173500 | consumed samples: 27809280 | consumed tokens: 56953405440 | elapsed time per iteration (s): 0.42 | learning rate: 7.626E-05 | global batch size: 256 | lm loss: 2.913509E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.236 | TFLOPs: 31.91 | +7: iteration 108640/ 173500 | consumed samples: 27811840 | consumed tokens: 56958648320 | elapsed time per iteration (s): 0.42 | learning rate: 7.624E-05 | global batch size: 256 | lm loss: 2.892986E+00 | grad norm: 0.356 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.008 | TFLOPs: 31.90 | +7: iteration 108650/ 173500 | consumed samples: 27814400 | consumed tokens: 56963891200 | elapsed time per iteration (s): 0.42 | learning rate: 7.623E-05 | global batch size: 256 | lm loss: 2.908325E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.648 | TFLOPs: 31.88 | +7: iteration 108660/ 173500 | consumed samples: 27816960 | consumed tokens: 56969134080 | elapsed time per iteration (s): 0.42 | learning rate: 7.621E-05 | global batch size: 256 | lm loss: 2.896887E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.248 | TFLOPs: 31.91 | +7: iteration 108670/ 173500 | consumed samples: 27819520 | consumed tokens: 56974376960 | elapsed time per iteration (s): 0.42 | learning rate: 7.620E-05 | global batch size: 256 | lm loss: 2.911812E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.118 | TFLOPs: 31.91 | +7: iteration 108680/ 173500 | consumed samples: 27822080 | consumed tokens: 56979619840 | elapsed time per iteration (s): 0.42 | learning rate: 7.618E-05 | global batch size: 256 | lm loss: 2.919591E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.864 | TFLOPs: 31.89 | +7: iteration 108690/ 173500 | consumed samples: 27824640 | consumed tokens: 56984862720 | elapsed time per iteration (s): 0.42 | learning rate: 7.617E-05 | global batch size: 256 | lm loss: 2.898841E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.184 | TFLOPs: 31.91 | +7: iteration 108700/ 173500 | consumed samples: 27827200 | consumed tokens: 56990105600 | elapsed time per 
iteration (s): 0.43 | learning rate: 7.615E-05 | global batch size: 256 | lm loss: 2.911008E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.693 | TFLOPs: 31.52 | +7: iteration 108710/ 173500 | consumed samples: 27829760 | consumed tokens: 56995348480 | elapsed time per iteration (s): 0.42 | learning rate: 7.613E-05 | global batch size: 256 | lm loss: 2.911227E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.385 | TFLOPs: 31.92 | +7: iteration 108720/ 173500 | consumed samples: 27832320 | consumed tokens: 57000591360 | elapsed time per iteration (s): 0.42 | learning rate: 7.612E-05 | global batch size: 256 | lm loss: 2.911427E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.276 | TFLOPs: 31.92 | +7: iteration 108730/ 173500 | consumed samples: 27834880 | consumed tokens: 57005834240 | elapsed time per iteration (s): 0.42 | learning rate: 7.610E-05 | global batch size: 256 | lm loss: 2.904585E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.613 | TFLOPs: 31.88 | +7: iteration 108740/ 173500 | consumed samples: 27837440 | consumed tokens: 57011077120 | elapsed time per iteration (s): 0.42 | learning rate: 7.609E-05 | global batch size: 256 | lm loss: 2.920868E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.297 | TFLOPs: 31.86 | +7: iteration 108750/ 173500 | consumed samples: 27840000 | consumed tokens: 57016320000 | elapsed time per iteration (s): 0.42 | learning rate: 7.607E-05 | global batch size: 256 | lm loss: 2.906129E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.901 | TFLOPs: 
31.84 | +7: iteration 108760/ 173500 | consumed samples: 27842560 | consumed tokens: 57021562880 | elapsed time per iteration (s): 0.42 | learning rate: 7.606E-05 | global batch size: 256 | lm loss: 2.916093E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.998 | TFLOPs: 31.90 | +7: iteration 108770/ 173500 | consumed samples: 27845120 | consumed tokens: 57026805760 | elapsed time per iteration (s): 0.42 | learning rate: 7.604E-05 | global batch size: 256 | lm loss: 2.917489E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.534 | TFLOPs: 31.88 | +7: iteration 108780/ 173500 | consumed samples: 27847680 | consumed tokens: 57032048640 | elapsed time per iteration (s): 0.42 | learning rate: 7.603E-05 | global batch size: 256 | lm loss: 2.917340E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.141 | TFLOPs: 31.91 | +7: iteration 108790/ 173500 | consumed samples: 27850240 | consumed tokens: 57037291520 | elapsed time per iteration (s): 0.42 | learning rate: 7.601E-05 | global batch size: 256 | lm loss: 2.905311E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.824 | TFLOPs: 31.89 | +7: iteration 108800/ 173500 | consumed samples: 27852800 | consumed tokens: 57042534400 | elapsed time per iteration (s): 0.42 | learning rate: 7.600E-05 | global batch size: 256 | lm loss: 2.905503E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.887 | TFLOPs: 31.89 | +7: iteration 108810/ 173500 | consumed samples: 27855360 | consumed tokens: 57047777280 | elapsed time per iteration (s): 0.42 | learning rate: 7.598E-05 | global batch size: 256 | lm loss: 2.908946E+00 | grad norm: 0.353 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.962 | TFLOPs: 31.90 | +7: iteration 108820/ 173500 | consumed samples: 27857920 | consumed tokens: 57053020160 | elapsed time per iteration (s): 0.42 | learning rate: 7.597E-05 | global batch size: 256 | lm loss: 2.907335E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.910 | TFLOPs: 31.63 | +7: iteration 108830/ 173500 | consumed samples: 27860480 | consumed tokens: 57058263040 | elapsed time per iteration (s): 0.42 | learning rate: 7.595E-05 | global batch size: 256 | lm loss: 2.902858E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.297 | TFLOPs: 31.92 | +7: iteration 108840/ 173500 | consumed samples: 27863040 | consumed tokens: 57063505920 | elapsed time per iteration (s): 0.42 | learning rate: 7.594E-05 | global batch size: 256 | lm loss: 2.896827E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.406 | TFLOPs: 31.92 | +7: iteration 108850/ 173500 | consumed samples: 27865600 | consumed tokens: 57068748800 | elapsed time per iteration (s): 0.42 | learning rate: 7.592E-05 | global batch size: 256 | lm loss: 2.902774E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.695 | TFLOPs: 31.88 | +7: iteration 108860/ 173500 | consumed samples: 27868160 | consumed tokens: 57073991680 | elapsed time per iteration (s): 0.42 | learning rate: 7.591E-05 | global batch size: 256 | lm loss: 2.900611E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.840 | TFLOPs: 31.89 | +7: iteration 108870/ 173500 | consumed samples: 27870720 | consumed tokens: 57079234560 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.589E-05 | global batch size: 256 | lm loss: 2.898385E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.242 | TFLOPs: 31.91 | +7: iteration 108880/ 173500 | consumed samples: 27873280 | consumed tokens: 57084477440 | elapsed time per iteration (s): 0.42 | learning rate: 7.588E-05 | global batch size: 256 | lm loss: 2.901246E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.313 | TFLOPs: 31.92 | +7: iteration 108890/ 173500 | consumed samples: 27875840 | consumed tokens: 57089720320 | elapsed time per iteration (s): 0.42 | learning rate: 7.586E-05 | global batch size: 256 | lm loss: 2.919153E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.892 | TFLOPs: 31.90 | +7: iteration 108900/ 173500 | consumed samples: 27878400 | consumed tokens: 57094963200 | elapsed time per iteration (s): 0.42 | learning rate: 7.585E-05 | global batch size: 256 | lm loss: 2.907200E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.404 | TFLOPs: 31.66 | +7: iteration 108910/ 173500 | consumed samples: 27880960 | consumed tokens: 57100206080 | elapsed time per iteration (s): 0.42 | learning rate: 7.583E-05 | global batch size: 256 | lm loss: 2.916216E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.303 | TFLOPs: 31.92 | +7: iteration 108920/ 173500 | consumed samples: 27883520 | consumed tokens: 57105448960 | elapsed time per iteration (s): 0.42 | learning rate: 7.581E-05 | global batch size: 256 | lm loss: 2.908131E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.990 | TFLOPs: 
31.90 | +7: iteration 108930/ 173500 | consumed samples: 27886080 | consumed tokens: 57110691840 | elapsed time per iteration (s): 0.42 | learning rate: 7.580E-05 | global batch size: 256 | lm loss: 2.896947E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 108940/ 173500 | consumed samples: 27888640 | consumed tokens: 57115934720 | elapsed time per iteration (s): 0.42 | learning rate: 7.578E-05 | global batch size: 256 | lm loss: 2.905282E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.272 | TFLOPs: 31.92 | +7: iteration 108950/ 173500 | consumed samples: 27891200 | consumed tokens: 57121177600 | elapsed time per iteration (s): 0.42 | learning rate: 7.577E-05 | global batch size: 256 | lm loss: 2.916044E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.152 | TFLOPs: 31.91 | +7: iteration 108960/ 173500 | consumed samples: 27893760 | consumed tokens: 57126420480 | elapsed time per iteration (s): 0.42 | learning rate: 7.575E-05 | global batch size: 256 | lm loss: 2.915407E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.540 | TFLOPs: 31.93 | +7: iteration 108970/ 173500 | consumed samples: 27896320 | consumed tokens: 57131663360 | elapsed time per iteration (s): 0.42 | learning rate: 7.574E-05 | global batch size: 256 | lm loss: 2.887889E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.125 | TFLOPs: 31.96 | +7: iteration 108980/ 173500 | consumed samples: 27898880 | consumed tokens: 57136906240 | elapsed time per iteration (s): 0.42 | learning rate: 7.572E-05 | global batch size: 256 | lm loss: 2.918997E+00 | grad norm: 0.347 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.700 | TFLOPs: 31.94 | +7: iteration 108990/ 173500 | consumed samples: 27901440 | consumed tokens: 57142149120 | elapsed time per iteration (s): 0.42 | learning rate: 7.571E-05 | global batch size: 256 | lm loss: 2.910583E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.832 | TFLOPs: 31.94 | +7: iteration 109000/ 173500 | consumed samples: 27904000 | consumed tokens: 57147392000 | elapsed time per iteration (s): 0.42 | learning rate: 7.569E-05 | global batch size: 256 | lm loss: 2.911799E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.481 | TFLOPs: 31.93 | +7: iteration 109010/ 173500 | consumed samples: 27906560 | consumed tokens: 57152634880 | elapsed time per iteration (s): 0.42 | learning rate: 7.568E-05 | global batch size: 256 | lm loss: 2.895071E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.477 | TFLOPs: 31.93 | +7: iteration 109020/ 173500 | consumed samples: 27909120 | consumed tokens: 57157877760 | elapsed time per iteration (s): 0.42 | learning rate: 7.566E-05 | global batch size: 256 | lm loss: 2.903943E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.517 | TFLOPs: 31.88 | +7: iteration 109030/ 173500 | consumed samples: 27911680 | consumed tokens: 57163120640 | elapsed time per iteration (s): 0.42 | learning rate: 7.565E-05 | global batch size: 256 | lm loss: 2.906289E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.326 | TFLOPs: 31.92 | +7: iteration 109040/ 173500 | consumed samples: 27914240 | consumed tokens: 57168363520 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.563E-05 | global batch size: 256 | lm loss: 2.906756E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.533 | TFLOPs: 31.88 | +7: iteration 109050/ 173500 | consumed samples: 27916800 | consumed tokens: 57173606400 | elapsed time per iteration (s): 0.42 | learning rate: 7.562E-05 | global batch size: 256 | lm loss: 2.912300E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.693 | TFLOPs: 31.94 | +7: iteration 109060/ 173500 | consumed samples: 27919360 | consumed tokens: 57178849280 | elapsed time per iteration (s): 0.42 | learning rate: 7.560E-05 | global batch size: 256 | lm loss: 2.917688E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.481 | TFLOPs: 31.93 | +7: iteration 109070/ 173500 | consumed samples: 27921920 | consumed tokens: 57184092160 | elapsed time per iteration (s): 0.42 | learning rate: 7.559E-05 | global batch size: 256 | lm loss: 2.912739E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.958 | TFLOPs: 31.95 | +7: iteration 109080/ 173500 | consumed samples: 27924480 | consumed tokens: 57189335040 | elapsed time per iteration (s): 0.42 | learning rate: 7.557E-05 | global batch size: 256 | lm loss: 2.903135E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.200 | TFLOPs: 31.91 | +7: iteration 109090/ 173500 | consumed samples: 27927040 | consumed tokens: 57194577920 | elapsed time per iteration (s): 0.42 | learning rate: 7.556E-05 | global batch size: 256 | lm loss: 2.931354E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.003 | TFLOPs: 
31.95 | +7: iteration 109100/ 173500 | consumed samples: 27929600 | consumed tokens: 57199820800 | elapsed time per iteration (s): 0.42 | learning rate: 7.554E-05 | global batch size: 256 | lm loss: 2.912191E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.795 | TFLOPs: 31.94 | +7: iteration 109110/ 173500 | consumed samples: 27932160 | consumed tokens: 57205063680 | elapsed time per iteration (s): 0.46 | learning rate: 7.553E-05 | global batch size: 256 | lm loss: 2.902622E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.926 | TFLOPs: 29.22 | +7: iteration 109120/ 173500 | consumed samples: 27934720 | consumed tokens: 57210306560 | elapsed time per iteration (s): 0.44 | learning rate: 7.551E-05 | global batch size: 256 | lm loss: 2.912297E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.908 | TFLOPs: 30.43 | +7: iteration 109130/ 173500 | consumed samples: 27937280 | consumed tokens: 57215549440 | elapsed time per iteration (s): 0.44 | learning rate: 7.550E-05 | global batch size: 256 | lm loss: 2.910039E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.280 | TFLOPs: 30.24 | +7: iteration 109140/ 173500 | consumed samples: 27939840 | consumed tokens: 57220792320 | elapsed time per iteration (s): 0.42 | learning rate: 7.548E-05 | global batch size: 256 | lm loss: 2.902820E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.153 | TFLOPs: 32.07 | +7: iteration 109150/ 173500 | consumed samples: 27942400 | consumed tokens: 57226035200 | elapsed time per iteration (s): 0.42 | learning rate: 7.546E-05 | global batch size: 256 | lm loss: 2.911261E+00 | grad norm: 0.373 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.472 | TFLOPs: 32.03 | +7: iteration 109160/ 173500 | consumed samples: 27944960 | consumed tokens: 57231278080 | elapsed time per iteration (s): 0.42 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 2.917119E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.149 | TFLOPs: 32.01 | +7: iteration 109170/ 173500 | consumed samples: 27947520 | consumed tokens: 57236520960 | elapsed time per iteration (s): 0.42 | learning rate: 7.543E-05 | global batch size: 256 | lm loss: 2.900964E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.242 | TFLOPs: 31.97 | +7: iteration 109180/ 173500 | consumed samples: 27950080 | consumed tokens: 57241763840 | elapsed time per iteration (s): 0.42 | learning rate: 7.542E-05 | global batch size: 256 | lm loss: 2.903592E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.497 | TFLOPs: 31.98 | +7: iteration 109190/ 173500 | consumed samples: 27952640 | consumed tokens: 57247006720 | elapsed time per iteration (s): 0.42 | learning rate: 7.540E-05 | global batch size: 256 | lm loss: 2.908829E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.194 | TFLOPs: 31.81 | +7: iteration 109200/ 173500 | consumed samples: 27955200 | consumed tokens: 57252249600 | elapsed time per iteration (s): 0.42 | learning rate: 7.539E-05 | global batch size: 256 | lm loss: 2.899107E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.985 | TFLOPs: 31.64 | +7: iteration 109210/ 173500 | consumed samples: 27957760 | consumed tokens: 57257492480 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.537E-05 | global batch size: 256 | lm loss: 2.898134E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.129 | TFLOPs: 32.01 | +7: iteration 109220/ 173500 | consumed samples: 27960320 | consumed tokens: 57262735360 | elapsed time per iteration (s): 0.42 | learning rate: 7.536E-05 | global batch size: 256 | lm loss: 2.917280E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.386 | TFLOPs: 31.97 | +7: iteration 109230/ 173500 | consumed samples: 27962880 | consumed tokens: 57267978240 | elapsed time per iteration (s): 0.42 | learning rate: 7.534E-05 | global batch size: 256 | lm loss: 2.914215E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.856 | TFLOPs: 32.00 | +7: iteration 109240/ 173500 | consumed samples: 27965440 | consumed tokens: 57273221120 | elapsed time per iteration (s): 0.42 | learning rate: 7.533E-05 | global batch size: 256 | lm loss: 2.891553E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.423 | TFLOPs: 31.98 | +7: iteration 109250/ 173500 | consumed samples: 27968000 | consumed tokens: 57278464000 | elapsed time per iteration (s): 0.42 | learning rate: 7.531E-05 | global batch size: 256 | lm loss: 2.892109E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.014 | TFLOPs: 32.01 | +7: iteration 109260/ 173500 | consumed samples: 27970560 | consumed tokens: 57283706880 | elapsed time per iteration (s): 0.42 | learning rate: 7.530E-05 | global batch size: 256 | lm loss: 2.910041E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.734 | TFLOPs: 
31.99 | +7: iteration 109270/ 173500 | consumed samples: 27973120 | consumed tokens: 57288949760 | elapsed time per iteration (s): 0.42 | learning rate: 7.528E-05 | global batch size: 256 | lm loss: 2.915783E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.942 | TFLOPs: 31.95 | +7: iteration 109280/ 173500 | consumed samples: 27975680 | consumed tokens: 57294192640 | elapsed time per iteration (s): 0.42 | learning rate: 7.527E-05 | global batch size: 256 | lm loss: 2.912195E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.144 | TFLOPs: 31.96 | +7: iteration 109290/ 173500 | consumed samples: 27978240 | consumed tokens: 57299435520 | elapsed time per iteration (s): 0.42 | learning rate: 7.525E-05 | global batch size: 256 | lm loss: 2.904655E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.234 | TFLOPs: 31.97 | +7: iteration 109300/ 173500 | consumed samples: 27980800 | consumed tokens: 57304678400 | elapsed time per iteration (s): 0.42 | learning rate: 7.524E-05 | global batch size: 256 | lm loss: 2.907286E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.440 | TFLOPs: 31.92 | +7: iteration 109310/ 173500 | consumed samples: 27983360 | consumed tokens: 57309921280 | elapsed time per iteration (s): 0.42 | learning rate: 7.522E-05 | global batch size: 256 | lm loss: 2.910794E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.196 | TFLOPs: 31.91 | +7: iteration 109320/ 173500 | consumed samples: 27985920 | consumed tokens: 57315164160 | elapsed time per iteration (s): 0.42 | learning rate: 7.521E-05 | global batch size: 256 | lm loss: 2.912911E+00 | grad norm: 0.364 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.174 | TFLOPs: 31.91 | +7: iteration 109330/ 173500 | consumed samples: 27988480 | consumed tokens: 57320407040 | elapsed time per iteration (s): 0.42 | learning rate: 7.519E-05 | global batch size: 256 | lm loss: 2.916422E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.663 | TFLOPs: 31.88 | +7: iteration 109340/ 173500 | consumed samples: 27991040 | consumed tokens: 57325649920 | elapsed time per iteration (s): 0.42 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 2.895443E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.470 | TFLOPs: 31.93 | +7: iteration 109350/ 173500 | consumed samples: 27993600 | consumed tokens: 57330892800 | elapsed time per iteration (s): 0.42 | learning rate: 7.516E-05 | global batch size: 256 | lm loss: 2.906348E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.893 | TFLOPs: 31.95 | +7: iteration 109360/ 173500 | consumed samples: 27996160 | consumed tokens: 57336135680 | elapsed time per iteration (s): 0.42 | learning rate: 7.515E-05 | global batch size: 256 | lm loss: 2.883896E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.817 | TFLOPs: 31.94 | +7: iteration 109370/ 173500 | consumed samples: 27998720 | consumed tokens: 57341378560 | elapsed time per iteration (s): 0.42 | learning rate: 7.513E-05 | global batch size: 256 | lm loss: 2.908516E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.825 | TFLOPs: 31.94 | +7: iteration 109380/ 173500 | consumed samples: 28001280 | consumed tokens: 57346621440 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.512E-05 | global batch size: 256 | lm loss: 2.905431E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.698 | TFLOPs: 31.94 | +7: iteration 109390/ 173500 | consumed samples: 28003840 | consumed tokens: 57351864320 | elapsed time per iteration (s): 0.42 | learning rate: 7.510E-05 | global batch size: 256 | lm loss: 2.900531E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.540 | TFLOPs: 31.93 | +7: iteration 109400/ 173500 | consumed samples: 28006400 | consumed tokens: 57357107200 | elapsed time per iteration (s): 0.42 | learning rate: 7.509E-05 | global batch size: 256 | lm loss: 2.903111E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.558 | TFLOPs: 31.88 | +7: iteration 109410/ 173500 | consumed samples: 28008960 | consumed tokens: 57362350080 | elapsed time per iteration (s): 0.42 | learning rate: 7.507E-05 | global batch size: 256 | lm loss: 2.917370E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.525 | TFLOPs: 31.93 | +7: iteration 109420/ 173500 | consumed samples: 28011520 | consumed tokens: 57367592960 | elapsed time per iteration (s): 0.42 | learning rate: 7.505E-05 | global batch size: 256 | lm loss: 2.908883E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.949 | TFLOPs: 31.90 | +7: iteration 109430/ 173500 | consumed samples: 28014080 | consumed tokens: 57372835840 | elapsed time per iteration (s): 0.42 | learning rate: 7.504E-05 | global batch size: 256 | lm loss: 2.916162E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.485 | TFLOPs: 
31.93 | +7: iteration 109440/ 173500 | consumed samples: 28016640 | consumed tokens: 57378078720 | elapsed time per iteration (s): 0.42 | learning rate: 7.502E-05 | global batch size: 256 | lm loss: 2.912764E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.892 | TFLOPs: 31.84 | +7: iteration 109450/ 173500 | consumed samples: 28019200 | consumed tokens: 57383321600 | elapsed time per iteration (s): 0.51 | learning rate: 7.501E-05 | global batch size: 256 | lm loss: 2.897873E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 497.848 | TFLOPs: 26.12 | +7: iteration 109460/ 173500 | consumed samples: 28021760 | consumed tokens: 57388564480 | elapsed time per iteration (s): 0.42 | learning rate: 7.499E-05 | global batch size: 256 | lm loss: 2.895469E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.389 | TFLOPs: 32.18 | +7: iteration 109470/ 173500 | consumed samples: 28024320 | consumed tokens: 57393807360 | elapsed time per iteration (s): 0.44 | learning rate: 7.498E-05 | global batch size: 256 | lm loss: 2.917851E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.249 | TFLOPs: 30.81 | +7: iteration 109480/ 173500 | consumed samples: 28026880 | consumed tokens: 57399050240 | elapsed time per iteration (s): 0.42 | learning rate: 7.496E-05 | global batch size: 256 | lm loss: 2.914610E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.875 | TFLOPs: 32.05 | +7: iteration 109490/ 173500 | consumed samples: 28029440 | consumed tokens: 57404293120 | elapsed time per iteration (s): 0.43 | learning rate: 7.495E-05 | global batch size: 256 | lm loss: 2.912835E+00 | grad norm: 0.345 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.254 | TFLOPs: 31.23 | +7: iteration 109500/ 173500 | consumed samples: 28032000 | consumed tokens: 57409536000 | elapsed time per iteration (s): 0.43 | learning rate: 7.493E-05 | global batch size: 256 | lm loss: 2.901411E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.155 | TFLOPs: 31.33 | +7: iteration 109510/ 173500 | consumed samples: 28034560 | consumed tokens: 57414778880 | elapsed time per iteration (s): 0.43 | learning rate: 7.492E-05 | global batch size: 256 | lm loss: 2.911483E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.700 | TFLOPs: 31.52 | +7: iteration 109520/ 173500 | consumed samples: 28037120 | consumed tokens: 57420021760 | elapsed time per iteration (s): 0.42 | learning rate: 7.490E-05 | global batch size: 256 | lm loss: 2.907351E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.054 | TFLOPs: 31.64 | +7: iteration 109530/ 173500 | consumed samples: 28039680 | consumed tokens: 57425264640 | elapsed time per iteration (s): 0.42 | learning rate: 7.489E-05 | global batch size: 256 | lm loss: 2.903320E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.910 | TFLOPs: 31.63 | +7: iteration 109540/ 173500 | consumed samples: 28042240 | consumed tokens: 57430507520 | elapsed time per iteration (s): 0.43 | learning rate: 7.487E-05 | global batch size: 256 | lm loss: 2.911434E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.951 | TFLOPs: 31.27 | +7: iteration 109550/ 173500 | consumed samples: 28044800 | consumed tokens: 57435750400 | elapsed time per 
iteration (s): 0.43 | learning rate: 7.486E-05 | global batch size: 256 | lm loss: 2.911427E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.336 | TFLOPs: 31.39 | +7: iteration 109560/ 173500 | consumed samples: 28047360 | consumed tokens: 57440993280 | elapsed time per iteration (s): 0.42 | learning rate: 7.484E-05 | global batch size: 256 | lm loss: 2.897696E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.208 | TFLOPs: 31.65 | +7: iteration 109570/ 173500 | consumed samples: 28049920 | consumed tokens: 57446236160 | elapsed time per iteration (s): 0.43 | learning rate: 7.483E-05 | global batch size: 256 | lm loss: 2.900447E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.914 | TFLOPs: 31.53 | +7: iteration 109580/ 173500 | consumed samples: 28052480 | consumed tokens: 57451479040 | elapsed time per iteration (s): 0.43 | learning rate: 7.481E-05 | global batch size: 256 | lm loss: 2.914104E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.616 | TFLOPs: 31.15 | +7: iteration 109590/ 173500 | consumed samples: 28055040 | consumed tokens: 57456721920 | elapsed time per iteration (s): 0.42 | learning rate: 7.480E-05 | global batch size: 256 | lm loss: 2.910024E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.198 | TFLOPs: 31.81 | +7: iteration 109600/ 173500 | consumed samples: 28057600 | consumed tokens: 57461964800 | elapsed time per iteration (s): 0.45 | learning rate: 7.478E-05 | global batch size: 256 | lm loss: 2.919764E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.841 | TFLOPs: 
29.74 | +7: iteration 109610/ 173500 | consumed samples: 28060160 | consumed tokens: 57467207680 | elapsed time per iteration (s): 0.43 | learning rate: 7.477E-05 | global batch size: 256 | lm loss: 2.912454E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.615 | TFLOPs: 31.04 | +7: iteration 109620/ 173500 | consumed samples: 28062720 | consumed tokens: 57472450560 | elapsed time per iteration (s): 0.45 | learning rate: 7.475E-05 | global batch size: 256 | lm loss: 2.904606E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.310 | TFLOPs: 29.61 | +7: iteration 109630/ 173500 | consumed samples: 28065280 | consumed tokens: 57477693440 | elapsed time per iteration (s): 0.43 | learning rate: 7.474E-05 | global batch size: 256 | lm loss: 2.911574E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.469 | TFLOPs: 30.93 | +7: iteration 109640/ 173500 | consumed samples: 28067840 | consumed tokens: 57482936320 | elapsed time per iteration (s): 0.42 | learning rate: 7.472E-05 | global batch size: 256 | lm loss: 2.905392E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.136 | TFLOPs: 31.75 | +7: iteration 109650/ 173500 | consumed samples: 28070400 | consumed tokens: 57488179200 | elapsed time per iteration (s): 0.42 | learning rate: 7.471E-05 | global batch size: 256 | lm loss: 2.900508E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.382 | TFLOPs: 31.61 | +7: iteration 109660/ 173500 | consumed samples: 28072960 | consumed tokens: 57493422080 | elapsed time per iteration (s): 0.43 | learning rate: 7.469E-05 | global batch size: 256 | lm loss: 2.902694E+00 | grad norm: 0.339 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.897 | TFLOPs: 31.42 | +7: iteration 109670/ 173500 | consumed samples: 28075520 | consumed tokens: 57498664960 | elapsed time per iteration (s): 0.44 | learning rate: 7.468E-05 | global batch size: 256 | lm loss: 2.905399E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.625 | TFLOPs: 30.83 | +7: iteration 109680/ 173500 | consumed samples: 28078080 | consumed tokens: 57503907840 | elapsed time per iteration (s): 0.43 | learning rate: 7.466E-05 | global batch size: 256 | lm loss: 2.913324E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.531 | TFLOPs: 31.30 | +7: iteration 109690/ 173500 | consumed samples: 28080640 | consumed tokens: 57509150720 | elapsed time per iteration (s): 0.43 | learning rate: 7.465E-05 | global batch size: 256 | lm loss: 2.902247E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.121 | TFLOPs: 31.23 | +7: iteration 109700/ 173500 | consumed samples: 28083200 | consumed tokens: 57514393600 | elapsed time per iteration (s): 0.44 | learning rate: 7.463E-05 | global batch size: 256 | lm loss: 2.893296E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.450 | TFLOPs: 30.51 | +7: iteration 109710/ 173500 | consumed samples: 28085760 | consumed tokens: 57519636480 | elapsed time per iteration (s): 0.43 | learning rate: 7.462E-05 | global batch size: 256 | lm loss: 2.911859E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.293 | TFLOPs: 31.44 | +7: iteration 109720/ 173500 | consumed samples: 28088320 | consumed tokens: 57524879360 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.460E-05 | global batch size: 256 | lm loss: 2.916259E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.418 | TFLOPs: 31.82 | +7: iteration 109730/ 173500 | consumed samples: 28090880 | consumed tokens: 57530122240 | elapsed time per iteration (s): 0.55 | learning rate: 7.459E-05 | global batch size: 256 | lm loss: 2.902785E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 466.899 | TFLOPs: 24.50 | +7: iteration 109740/ 173500 | consumed samples: 28093440 | consumed tokens: 57535365120 | elapsed time per iteration (s): 0.44 | learning rate: 7.457E-05 | global batch size: 256 | lm loss: 2.908116E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.119 | TFLOPs: 30.81 | +7: iteration 109750/ 173500 | consumed samples: 28096000 | consumed tokens: 57540608000 | elapsed time per iteration (s): 0.42 | learning rate: 7.455E-05 | global batch size: 256 | lm loss: 2.908371E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.904 | TFLOPs: 31.95 | +7: iteration 109760/ 173500 | consumed samples: 28098560 | consumed tokens: 57545850880 | elapsed time per iteration (s): 0.43 | learning rate: 7.454E-05 | global batch size: 256 | lm loss: 2.899074E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.090 | TFLOPs: 31.43 | +7: iteration 109770/ 173500 | consumed samples: 28101120 | consumed tokens: 57551093760 | elapsed time per iteration (s): 0.43 | learning rate: 7.452E-05 | global batch size: 256 | lm loss: 2.900552E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.282 | TFLOPs: 
31.34 | +7: iteration 109780/ 173500 | consumed samples: 28103680 | consumed tokens: 57556336640 | elapsed time per iteration (s): 0.47 | learning rate: 7.451E-05 | global batch size: 256 | lm loss: 2.903262E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.770 | TFLOPs: 28.43 | +7: iteration 109790/ 173500 | consumed samples: 28106240 | consumed tokens: 57561579520 | elapsed time per iteration (s): 0.51 | learning rate: 7.449E-05 | global batch size: 256 | lm loss: 2.902287E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.452 | TFLOPs: 26.21 | +7: iteration 109800/ 173500 | consumed samples: 28108800 | consumed tokens: 57566822400 | elapsed time per iteration (s): 0.67 | learning rate: 7.448E-05 | global batch size: 256 | lm loss: 2.888503E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 379.805 | TFLOPs: 19.93 | +7: iteration 109810/ 173500 | consumed samples: 28111360 | consumed tokens: 57572065280 | elapsed time per iteration (s): 0.45 | learning rate: 7.446E-05 | global batch size: 256 | lm loss: 2.904692E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.193 | TFLOPs: 29.65 | +7: iteration 109820/ 173500 | consumed samples: 28113920 | consumed tokens: 57577308160 | elapsed time per iteration (s): 0.44 | learning rate: 7.445E-05 | global batch size: 256 | lm loss: 2.906149E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.116 | TFLOPs: 30.44 | +7: iteration 109830/ 173500 | consumed samples: 28116480 | consumed tokens: 57582551040 | elapsed time per iteration (s): 0.45 | learning rate: 7.443E-05 | global batch size: 256 | lm loss: 2.907537E+00 | grad norm: 0.336 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.364 | TFLOPs: 30.08 | +7: iteration 109840/ 173500 | consumed samples: 28119040 | consumed tokens: 57587793920 | elapsed time per iteration (s): 0.44 | learning rate: 7.442E-05 | global batch size: 256 | lm loss: 2.899629E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.179 | TFLOPs: 30.23 | +7: iteration 109850/ 173500 | consumed samples: 28121600 | consumed tokens: 57593036800 | elapsed time per iteration (s): 0.45 | learning rate: 7.440E-05 | global batch size: 256 | lm loss: 2.911644E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.637 | TFLOPs: 30.10 | +7: iteration 109860/ 173500 | consumed samples: 28124160 | consumed tokens: 57598279680 | elapsed time per iteration (s): 0.47 | learning rate: 7.439E-05 | global batch size: 256 | lm loss: 2.910904E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.861 | TFLOPs: 28.43 | +7: iteration 109870/ 173500 | consumed samples: 28126720 | consumed tokens: 57603522560 | elapsed time per iteration (s): 0.47 | learning rate: 7.437E-05 | global batch size: 256 | lm loss: 2.898664E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.788 | TFLOPs: 28.43 | +7: iteration 109880/ 173500 | consumed samples: 28129280 | consumed tokens: 57608765440 | elapsed time per iteration (s): 0.45 | learning rate: 7.436E-05 | global batch size: 256 | lm loss: 2.918370E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.686 | TFLOPs: 29.73 | +7: iteration 109890/ 173500 | consumed samples: 28131840 | consumed tokens: 57614008320 | elapsed time per 
iteration (s): 0.45 | learning rate: 7.434E-05 | global batch size: 256 | lm loss: 2.920118E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.436 | TFLOPs: 29.93 | +7: iteration 109900/ 173500 | consumed samples: 28134400 | consumed tokens: 57619251200 | elapsed time per iteration (s): 0.48 | learning rate: 7.433E-05 | global batch size: 256 | lm loss: 2.907303E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.030 | TFLOPs: 28.18 | +7: iteration 109910/ 173500 | consumed samples: 28136960 | consumed tokens: 57624494080 | elapsed time per iteration (s): 0.48 | learning rate: 7.431E-05 | global batch size: 256 | lm loss: 2.894104E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.087 | TFLOPs: 27.76 | +7: iteration 109920/ 173500 | consumed samples: 28139520 | consumed tokens: 57629736960 | elapsed time per iteration (s): 0.50 | learning rate: 7.430E-05 | global batch size: 256 | lm loss: 2.912463E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 513.549 | TFLOPs: 26.95 | +7: iteration 109930/ 173500 | consumed samples: 28142080 | consumed tokens: 57634979840 | elapsed time per iteration (s): 0.44 | learning rate: 7.428E-05 | global batch size: 256 | lm loss: 2.888406E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.639 | TFLOPs: 30.83 | +7: iteration 109940/ 173500 | consumed samples: 28144640 | consumed tokens: 57640222720 | elapsed time per iteration (s): 0.44 | learning rate: 7.427E-05 | global batch size: 256 | lm loss: 2.906576E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.871 | TFLOPs: 
30.32 | +7: iteration 109950/ 173500 | consumed samples: 28147200 | consumed tokens: 57645465600 | elapsed time per iteration (s): 0.42 | learning rate: 7.425E-05 | global batch size: 256 | lm loss: 2.907219E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.563 | TFLOPs: 31.93 | +7: iteration 109960/ 173500 | consumed samples: 28149760 | consumed tokens: 57650708480 | elapsed time per iteration (s): 0.43 | learning rate: 7.424E-05 | global batch size: 256 | lm loss: 2.908920E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.093 | TFLOPs: 31.12 | +7: iteration 109970/ 173500 | consumed samples: 28152320 | consumed tokens: 57655951360 | elapsed time per iteration (s): 0.42 | learning rate: 7.422E-05 | global batch size: 256 | lm loss: 2.897279E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.992 | TFLOPs: 31.64 | +7: iteration 109980/ 173500 | consumed samples: 28154880 | consumed tokens: 57661194240 | elapsed time per iteration (s): 0.44 | learning rate: 7.421E-05 | global batch size: 256 | lm loss: 2.906241E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.905 | TFLOPs: 30.74 | +7: iteration 109990/ 173500 | consumed samples: 28157440 | consumed tokens: 57666437120 | elapsed time per iteration (s): 0.43 | learning rate: 7.419E-05 | global batch size: 256 | lm loss: 2.896892E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.945 | TFLOPs: 31.01 | +0: [2023-03-17 12:14:41,549] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=0, lr=[7.417709678812063e-05, 7.417709678812063e-05, 7.417709678812063e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 
110000/ 173500 | consumed samples: 28160000 | consumed tokens: 57671680000 | elapsed time per iteration (s): 0.43 | learning rate: 7.418E-05 | global batch size: 256 | lm loss: 2.900827E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.204 | TFLOPs: 31.39 | +0: steps: 110000 loss: 2.8747 iter time (s): 0.426 samples/sec: 600.384 +7: ------------------------------------------------------------------------------------------------- +7: validation loss at iteration 110000 | lm loss value: 3.328591E+00 | lm loss PPL: 2.789901E+01 | +7: ------------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 110000 to checkpoints_221m91b400m +0: [2023-03-17 12:14:41,772] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step110000 is begin to save! +0: [2023-03-17 12:14:41,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_01-model_00-model_states.pt... +0: [2023-03-17 12:14:41,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_01-model_00-model_states.pt. +0: [2023-03-17 12:14:41,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_03-model_00-model_states.pt... +0: [2023-03-17 12:14:41,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_03-model_00-model_states.pt. +0: [2023-03-17 12:14:41,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_04-model_00-model_states.pt... +0: [2023-03-17 12:14:41,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_04-model_00-model_states.pt. 
+0: [2023-03-17 12:14:41,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_05-model_00-model_states.pt... +0: [2023-03-17 12:14:41,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_05-model_00-model_states.pt. +0: [2023-03-17 12:14:41,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_06-model_00-model_states.pt... +0: [2023-03-17 12:14:42,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_06-model_00-model_states.pt. +0: [2023-03-17 12:14:42,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_07-model_00-model_states.pt... +0: [2023-03-17 12:14:42,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_07-model_00-model_states.pt. +0: [2023-03-17 12:14:42,026] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_08-model_00-model_states.pt... +0: [2023-03-17 12:14:42,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_08-model_00-model_states.pt. +0: [2023-03-17 12:14:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_09-model_00-model_states.pt... +0: [2023-03-17 12:14:42,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_09-model_00-model_states.pt. +0: [2023-03-17 12:14:42,076] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_10-model_00-model_states.pt... +0: [2023-03-17 12:14:42,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_10-model_00-model_states.pt. 
+0: [2023-03-17 12:14:42,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_11-model_00-model_states.pt... +0: [2023-03-17 12:14:42,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_11-model_00-model_states.pt. +0: [2023-03-17 12:14:42,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_12-model_00-model_states.pt... +0: [2023-03-17 12:14:42,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_12-model_00-model_states.pt. +0: [2023-03-17 12:14:42,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_13-model_00-model_states.pt... +0: [2023-03-17 12:14:42,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_13-model_00-model_states.pt. +0: [2023-03-17 12:14:42,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_14-model_00-model_states.pt... +0: [2023-03-17 12:14:42,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_14-model_00-model_states.pt. +0: [2023-03-17 12:14:42,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_15-model_00-model_states.pt... +0: [2023-03-17 12:14:42,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_15-model_00-model_states.pt. +0: [2023-03-17 12:14:42,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_16-model_00-model_states.pt... +0: [2023-03-17 12:14:42,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_16-model_00-model_states.pt. 
+0: [2023-03-17 12:14:42,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_17-model_00-model_states.pt... +0: [2023-03-17 12:14:42,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_17-model_00-model_states.pt. +0: [2023-03-17 12:14:42,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_18-model_00-model_states.pt... +0: [2023-03-17 12:14:42,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_18-model_00-model_states.pt. +0: [2023-03-17 12:14:42,306] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_19-model_00-model_states.pt... +0: [2023-03-17 12:14:42,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_19-model_00-model_states.pt. +0: [2023-03-17 12:14:42,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_20-model_00-model_states.pt... +0: [2023-03-17 12:14:42,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_20-model_00-model_states.pt. +0: [2023-03-17 12:14:42,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/layer_22-model_00-model_states.pt... +0: [2023-03-17 12:14:42,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/layer_22-model_00-model_states.pt. +0: [2023-03-17 12:14:42,362] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m91b400m/global_step110000/mp_rank_00_model_states.pt +0: [2023-03-17 12:14:42,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/mp_rank_00_model_states.pt... 
+0: [2023-03-17 12:14:42,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/mp_rank_00_model_states.pt. +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
+6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +2: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
+5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-17 12:14:42,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +0: [2023-03-17 12:14:42,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+0: [2023-03-17 12:14:42,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-17 12:14:42,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+3: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-17 12:14:42,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +3: [2023-03-17 12:14:42,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-17 12:14:42,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-17 12:14:42,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+7: [2023-03-17 12:14:42,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-17 12:14:42,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +7: [2023-03-17 12:14:42,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 12:14:42,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-17 12:14:42,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +5: [2023-03-17 12:14:42,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +1: [2023-03-17 12:14:42,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +0: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +6: [2023-03-17 12:14:42,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +4: [2023-03-17 12:14:42,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-17 12:14:42,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+2: [2023-03-17 12:14:42,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
+2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-17 12:14:42,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m91b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! +2: [2023-03-17 12:14:42,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
+0: successfully saved checkpoint at iteration 110000 to checkpoints_221m91b400m +7: time (ms) | save-checkpoint: 819.01 +7: iteration 110010/ 173500 | consumed samples: 28162560 | consumed tokens: 57676922880 | elapsed time per iteration (s): 0.52 | learning rate: 7.416E-05 | global batch size: 256 | lm loss: 2.915894E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 492.935 | TFLOPs: 25.86 | +7: iteration 110020/ 173500 | consumed samples: 28165120 | consumed tokens: 57682165760 | elapsed time per iteration (s): 0.45 | learning rate: 7.415E-05 | global batch size: 256 | lm loss: 2.905995E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.339 | TFLOPs: 30.13 | +7: iteration 110030/ 173500 | consumed samples: 28167680 | consumed tokens: 57687408640 | elapsed time per iteration (s): 0.42 | learning rate: 7.413E-05 | global batch size: 256 | lm loss: 2.906077E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.158 | TFLOPs: 31.80 | +7: iteration 110040/ 173500 | consumed samples: 28170240 | consumed tokens: 57692651520 | elapsed time per iteration (s): 0.43 | learning rate: 7.412E-05 | global batch size: 256 | lm loss: 2.880557E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.942 | TFLOPs: 31.53 | +7: iteration 110050/ 173500 | consumed samples: 28172800 | consumed tokens: 57697894400 | elapsed time per iteration (s): 0.43 | learning rate: 7.410E-05 | global batch size: 256 | lm loss: 2.906541E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.095 | TFLOPs: 31.33 | +7: iteration 110060/ 173500 | consumed samples: 28175360 | consumed tokens: 57703137280 | elapsed time per iteration 
(s): 0.46 | learning rate: 7.409E-05 | global batch size: 256 | lm loss: 2.897709E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.920 | TFLOPs: 28.91 | +7: iteration 110070/ 173500 | consumed samples: 28177920 | consumed tokens: 57708380160 | elapsed time per iteration (s): 0.42 | learning rate: 7.407E-05 | global batch size: 256 | lm loss: 2.901401E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.038 | TFLOPs: 31.75 | +7: iteration 110080/ 173500 | consumed samples: 28180480 | consumed tokens: 57713623040 | elapsed time per iteration (s): 0.42 | learning rate: 7.406E-05 | global batch size: 256 | lm loss: 2.895487E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.426 | TFLOPs: 31.92 | +7: iteration 110090/ 173500 | consumed samples: 28183040 | consumed tokens: 57718865920 | elapsed time per iteration (s): 0.43 | learning rate: 7.404E-05 | global batch size: 256 | lm loss: 2.900488E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.198 | TFLOPs: 31.60 | +7: iteration 110100/ 173500 | consumed samples: 28185600 | consumed tokens: 57724108800 | elapsed time per iteration (s): 0.42 | learning rate: 7.403E-05 | global batch size: 256 | lm loss: 2.908331E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.070 | TFLOPs: 31.69 | +7: iteration 110110/ 173500 | consumed samples: 28188160 | consumed tokens: 57729351680 | elapsed time per iteration (s): 0.43 | learning rate: 7.401E-05 | global batch size: 256 | lm loss: 2.893513E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.710 | TFLOPs: 31.26 | +7: 
iteration 110120/ 173500 | consumed samples: 28190720 | consumed tokens: 57734594560 | elapsed time per iteration (s): 0.42 | learning rate: 7.400E-05 | global batch size: 256 | lm loss: 2.894077E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.869 | TFLOPs: 31.84 | +7: iteration 110130/ 173500 | consumed samples: 28193280 | consumed tokens: 57739837440 | elapsed time per iteration (s): 0.43 | learning rate: 7.398E-05 | global batch size: 256 | lm loss: 2.902397E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.118 | TFLOPs: 31.22 | +7: iteration 110140/ 173500 | consumed samples: 28195840 | consumed tokens: 57745080320 | elapsed time per iteration (s): 0.42 | learning rate: 7.397E-05 | global batch size: 256 | lm loss: 2.913558E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.559 | TFLOPs: 31.72 | +7: iteration 110150/ 173500 | consumed samples: 28198400 | consumed tokens: 57750323200 | elapsed time per iteration (s): 0.43 | learning rate: 7.395E-05 | global batch size: 256 | lm loss: 2.908380E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.314 | TFLOPs: 30.97 | +7: iteration 110160/ 173500 | consumed samples: 28200960 | consumed tokens: 57755566080 | elapsed time per iteration (s): 0.43 | learning rate: 7.394E-05 | global batch size: 256 | lm loss: 2.892717E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.122 | TFLOPs: 31.33 | +7: iteration 110170/ 173500 | consumed samples: 28203520 | consumed tokens: 57760808960 | elapsed time per iteration (s): 0.43 | learning rate: 7.392E-05 | global batch size: 256 | lm loss: 2.909546E+00 | grad norm: 0.345 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.595 | TFLOPs: 31.09 | +7: iteration 110180/ 173500 | consumed samples: 28206080 | consumed tokens: 57766051840 | elapsed time per iteration (s): 0.43 | learning rate: 7.391E-05 | global batch size: 256 | lm loss: 2.901565E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.391 | TFLOPs: 31.50 | +7: iteration 110190/ 173500 | consumed samples: 28208640 | consumed tokens: 57771294720 | elapsed time per iteration (s): 0.42 | learning rate: 7.389E-05 | global batch size: 256 | lm loss: 2.904217E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.758 | TFLOPs: 31.78 | +7: iteration 110200/ 173500 | consumed samples: 28211200 | consumed tokens: 57776537600 | elapsed time per iteration (s): 0.43 | learning rate: 7.388E-05 | global batch size: 256 | lm loss: 2.904520E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.426 | TFLOPs: 31.14 | +7: iteration 110210/ 173500 | consumed samples: 28213760 | consumed tokens: 57781780480 | elapsed time per iteration (s): 0.42 | learning rate: 7.386E-05 | global batch size: 256 | lm loss: 2.905200E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.818 | TFLOPs: 31.68 | +7: iteration 110220/ 173500 | consumed samples: 28216320 | consumed tokens: 57787023360 | elapsed time per iteration (s): 0.43 | learning rate: 7.385E-05 | global batch size: 256 | lm loss: 2.919886E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.353 | TFLOPs: 31.45 | +7: iteration 110230/ 173500 | consumed samples: 28218880 | consumed tokens: 57792266240 | elapsed time per iteration (s): 0.43 | 
learning rate: 7.383E-05 | global batch size: 256 | lm loss: 2.910781E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.477 | TFLOPs: 30.93 | +7: iteration 110240/ 173500 | consumed samples: 28221440 | consumed tokens: 57797509120 | elapsed time per iteration (s): 0.43 | learning rate: 7.382E-05 | global batch size: 256 | lm loss: 2.905955E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.227 | TFLOPs: 31.39 | +7: iteration 110250/ 173500 | consumed samples: 28224000 | consumed tokens: 57802752000 | elapsed time per iteration (s): 0.43 | learning rate: 7.380E-05 | global batch size: 256 | lm loss: 2.906373E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.545 | TFLOPs: 31.51 | +7: iteration 110260/ 173500 | consumed samples: 28226560 | consumed tokens: 57807994880 | elapsed time per iteration (s): 0.43 | learning rate: 7.378E-05 | global batch size: 256 | lm loss: 2.907617E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.843 | TFLOPs: 31.42 | +7: iteration 110270/ 173500 | consumed samples: 28229120 | consumed tokens: 57813237760 | elapsed time per iteration (s): 0.43 | learning rate: 7.377E-05 | global batch size: 256 | lm loss: 2.898138E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.322 | TFLOPs: 31.45 | +7: iteration 110280/ 173500 | consumed samples: 28231680 | consumed tokens: 57818480640 | elapsed time per iteration (s): 0.43 | learning rate: 7.375E-05 | global batch size: 256 | lm loss: 2.893497E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.889 | TFLOPs: 31.53 | +7: iteration 
110290/ 173500 | consumed samples: 28234240 | consumed tokens: 57823723520 | elapsed time per iteration (s): 0.42 | learning rate: 7.374E-05 | global batch size: 256 | lm loss: 2.913577E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.832 | TFLOPs: 31.73 | +7: iteration 110300/ 173500 | consumed samples: 28236800 | consumed tokens: 57828966400 | elapsed time per iteration (s): 0.43 | learning rate: 7.372E-05 | global batch size: 256 | lm loss: 2.903708E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.581 | TFLOPs: 31.41 | +7: iteration 110310/ 173500 | consumed samples: 28239360 | consumed tokens: 57834209280 | elapsed time per iteration (s): 0.43 | learning rate: 7.371E-05 | global batch size: 256 | lm loss: 2.897104E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.563 | TFLOPs: 31.25 | +7: iteration 110320/ 173500 | consumed samples: 28241920 | consumed tokens: 57839452160 | elapsed time per iteration (s): 0.43 | learning rate: 7.369E-05 | global batch size: 256 | lm loss: 2.901817E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.700 | TFLOPs: 31.52 | +7: iteration 110330/ 173500 | consumed samples: 28244480 | consumed tokens: 57844695040 | elapsed time per iteration (s): 0.42 | learning rate: 7.368E-05 | global batch size: 256 | lm loss: 2.898385E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.433 | TFLOPs: 32.03 | +7: iteration 110340/ 173500 | consumed samples: 28247040 | consumed tokens: 57849937920 | elapsed time per iteration (s): 0.43 | learning rate: 7.366E-05 | global batch size: 256 | lm loss: 2.900267E+00 | grad norm: 0.348 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.018 | TFLOPs: 31.59 | +7: iteration 110350/ 173500 | consumed samples: 28249600 | consumed tokens: 57855180800 | elapsed time per iteration (s): 0.43 | learning rate: 7.365E-05 | global batch size: 256 | lm loss: 2.911278E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.552 | TFLOPs: 31.30 | +7: iteration 110360/ 173500 | consumed samples: 28252160 | consumed tokens: 57860423680 | elapsed time per iteration (s): 0.43 | learning rate: 7.363E-05 | global batch size: 256 | lm loss: 2.924413E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.791 | TFLOPs: 31.21 | +7: iteration 110370/ 173500 | consumed samples: 28254720 | consumed tokens: 57865666560 | elapsed time per iteration (s): 0.42 | learning rate: 7.362E-05 | global batch size: 256 | lm loss: 2.900200E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.570 | TFLOPs: 31.67 | +7: iteration 110380/ 173500 | consumed samples: 28257280 | consumed tokens: 57870909440 | elapsed time per iteration (s): 0.44 | learning rate: 7.360E-05 | global batch size: 256 | lm loss: 2.898167E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.884 | TFLOPs: 30.37 | +7: iteration 110390/ 173500 | consumed samples: 28259840 | consumed tokens: 57876152320 | elapsed time per iteration (s): 0.43 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 2.904708E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.034 | TFLOPs: 31.33 | +7: iteration 110400/ 173500 | consumed samples: 28262400 | consumed tokens: 57881395200 | elapsed time per iteration (s): 0.42 | learning 
rate: 7.357E-05 | global batch size: 256 | lm loss: 2.906481E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.918 | TFLOPs: 31.79 | +7: iteration 110410/ 173500 | consumed samples: 28264960 | consumed tokens: 57886638080 | elapsed time per iteration (s): 0.43 | learning rate: 7.356E-05 | global batch size: 256 | lm loss: 2.901615E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.990 | TFLOPs: 31.43 | +7: iteration 110420/ 173500 | consumed samples: 28267520 | consumed tokens: 57891880960 | elapsed time per iteration (s): 0.42 | learning rate: 7.354E-05 | global batch size: 256 | lm loss: 2.911427E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.154 | TFLOPs: 31.65 | +7: iteration 110430/ 173500 | consumed samples: 28270080 | consumed tokens: 57897123840 | elapsed time per iteration (s): 0.43 | learning rate: 7.353E-05 | global batch size: 256 | lm loss: 2.906616E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.650 | TFLOPs: 30.99 | +7: iteration 110440/ 173500 | consumed samples: 28272640 | consumed tokens: 57902366720 | elapsed time per iteration (s): 0.43 | learning rate: 7.351E-05 | global batch size: 256 | lm loss: 2.917702E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.868 | TFLOPs: 31.32 | +7: iteration 110450/ 173500 | consumed samples: 28275200 | consumed tokens: 57907609600 | elapsed time per iteration (s): 0.42 | learning rate: 7.350E-05 | global batch size: 256 | lm loss: 2.904248E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.400 | TFLOPs: 31.66 | +7: iteration 110460/ 
173500 | consumed samples: 28277760 | consumed tokens: 57912852480 | elapsed time per iteration (s): 0.42 | learning rate: 7.348E-05 | global batch size: 256 | lm loss: 2.897515E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.949 | TFLOPs: 31.74 | +7: iteration 110470/ 173500 | consumed samples: 28280320 | consumed tokens: 57918095360 | elapsed time per iteration (s): 0.43 | learning rate: 7.347E-05 | global batch size: 256 | lm loss: 2.908273E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.659 | TFLOPs: 31.20 | +7: iteration 110480/ 173500 | consumed samples: 28282880 | consumed tokens: 57923338240 | elapsed time per iteration (s): 0.42 | learning rate: 7.345E-05 | global batch size: 256 | lm loss: 2.905845E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.139 | TFLOPs: 31.70 | +7: iteration 110490/ 173500 | consumed samples: 28285440 | consumed tokens: 57928581120 | elapsed time per iteration (s): 0.43 | learning rate: 7.344E-05 | global batch size: 256 | lm loss: 2.912857E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.874 | TFLOPs: 31.16 | +7: iteration 110500/ 173500 | consumed samples: 28288000 | consumed tokens: 57933824000 | elapsed time per iteration (s): 0.42 | learning rate: 7.342E-05 | global batch size: 256 | lm loss: 2.898337E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.625 | TFLOPs: 31.78 | +7: iteration 110510/ 173500 | consumed samples: 28290560 | consumed tokens: 57939066880 | elapsed time per iteration (s): 0.43 | learning rate: 7.341E-05 | global batch size: 256 | lm loss: 2.904737E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 600.188 | TFLOPs: 31.49 | +7: iteration 110520/ 173500 | consumed samples: 28293120 | consumed tokens: 57944309760 | elapsed time per iteration (s): 0.43 | learning rate: 7.339E-05 | global batch size: 256 | lm loss: 2.906534E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.778 | TFLOPs: 31.31 | +7: iteration 110530/ 173500 | consumed samples: 28295680 | consumed tokens: 57949552640 | elapsed time per iteration (s): 0.42 | learning rate: 7.338E-05 | global batch size: 256 | lm loss: 2.910672E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.903 | TFLOPs: 31.74 | +7: iteration 110540/ 173500 | consumed samples: 28298240 | consumed tokens: 57954795520 | elapsed time per iteration (s): 0.44 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 2.913057E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.180 | TFLOPs: 30.81 | +7: iteration 110550/ 173500 | consumed samples: 28300800 | consumed tokens: 57960038400 | elapsed time per iteration (s): 0.44 | learning rate: 7.335E-05 | global batch size: 256 | lm loss: 2.900435E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.794 | TFLOPs: 30.47 | +7: iteration 110560/ 173500 | consumed samples: 28303360 | consumed tokens: 57965281280 | elapsed time per iteration (s): 0.42 | learning rate: 7.333E-05 | global batch size: 256 | lm loss: 2.924512E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.647 | TFLOPs: 31.93 | +7: iteration 110570/ 173500 | consumed samples: 28305920 | consumed tokens: 57970524160 | elapsed time per iteration (s): 0.42 | learning rate: 
7.332E-05 | global batch size: 256 | lm loss: 2.903915E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.195 | TFLOPs: 31.65 | +7: iteration 110580/ 173500 | consumed samples: 28308480 | consumed tokens: 57975767040 | elapsed time per iteration (s): 0.42 | learning rate: 7.330E-05 | global batch size: 256 | lm loss: 2.915791E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.486 | TFLOPs: 31.77 | +7: iteration 110590/ 173500 | consumed samples: 28311040 | consumed tokens: 57981009920 | elapsed time per iteration (s): 0.43 | learning rate: 7.329E-05 | global batch size: 256 | lm loss: 2.899961E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.226 | TFLOPs: 31.28 | +7: iteration 110600/ 173500 | consumed samples: 28313600 | consumed tokens: 57986252800 | elapsed time per iteration (s): 0.42 | learning rate: 7.327E-05 | global batch size: 256 | lm loss: 2.914728E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.968 | TFLOPs: 31.64 | +7: iteration 110610/ 173500 | consumed samples: 28316160 | consumed tokens: 57991495680 | elapsed time per iteration (s): 0.42 | learning rate: 7.326E-05 | global batch size: 256 | lm loss: 2.902338E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.201 | TFLOPs: 31.81 | +7: iteration 110620/ 173500 | consumed samples: 28318720 | consumed tokens: 57996738560 | elapsed time per iteration (s): 0.42 | learning rate: 7.324E-05 | global batch size: 256 | lm loss: 2.904869E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.921 | TFLOPs: 32.00 | +7: iteration 110630/ 173500 | 
consumed samples: 28321280 | consumed tokens: 58001981440 | elapsed time per iteration (s): 0.42 | learning rate: 7.323E-05 | global batch size: 256 | lm loss: 2.914784E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.238 | TFLOPs: 31.70 | +7: iteration 110640/ 173500 | consumed samples: 28323840 | consumed tokens: 58007224320 | elapsed time per iteration (s): 0.42 | learning rate: 7.321E-05 | global batch size: 256 | lm loss: 2.914557E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.353 | TFLOPs: 31.76 | +7: iteration 110650/ 173500 | consumed samples: 28326400 | consumed tokens: 58012467200 | elapsed time per iteration (s): 0.43 | learning rate: 7.320E-05 | global batch size: 256 | lm loss: 2.900919E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.902 | TFLOPs: 31.21 | +7: iteration 110660/ 173500 | consumed samples: 28328960 | consumed tokens: 58017710080 | elapsed time per iteration (s): 0.42 | learning rate: 7.318E-05 | global batch size: 256 | lm loss: 2.903065E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.899 | TFLOPs: 31.63 | +7: iteration 110670/ 173500 | consumed samples: 28331520 | consumed tokens: 58022952960 | elapsed time per iteration (s): 0.42 | learning rate: 7.317E-05 | global batch size: 256 | lm loss: 2.896385E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.496 | TFLOPs: 31.82 | +7: iteration 110680/ 173500 | consumed samples: 28334080 | consumed tokens: 58028195840 | elapsed time per iteration (s): 0.43 | learning rate: 7.315E-05 | global batch size: 256 | lm loss: 2.901754E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.513 | TFLOPs: 31.56 | +7: iteration 110690/ 173500 | consumed samples: 28336640 | consumed tokens: 58033438720 | elapsed time per iteration (s): 0.43 | learning rate: 7.314E-05 | global batch size: 256 | lm loss: 2.906077E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.274 | TFLOPs: 31.39 | +7: iteration 110700/ 173500 | consumed samples: 28339200 | consumed tokens: 58038681600 | elapsed time per iteration (s): 0.42 | learning rate: 7.312E-05 | global batch size: 256 | lm loss: 2.908828E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.033 | TFLOPs: 31.85 | +7: iteration 110710/ 173500 | consumed samples: 28341760 | consumed tokens: 58043924480 | elapsed time per iteration (s): 0.43 | learning rate: 7.311E-05 | global batch size: 256 | lm loss: 2.914779E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.366 | TFLOPs: 31.55 | +7: iteration 110720/ 173500 | consumed samples: 28344320 | consumed tokens: 58049167360 | elapsed time per iteration (s): 0.42 | learning rate: 7.309E-05 | global batch size: 256 | lm loss: 2.905149E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.119 | TFLOPs: 31.85 | +7: iteration 110730/ 173500 | consumed samples: 28346880 | consumed tokens: 58054410240 | elapsed time per iteration (s): 0.43 | learning rate: 7.308E-05 | global batch size: 256 | lm loss: 2.904387E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.141 | TFLOPs: 31.23 | +7: iteration 110740/ 173500 | consumed samples: 28349440 | consumed tokens: 58059653120 | elapsed time per iteration (s): 0.43 | learning rate: 
7.306E-05 | global batch size: 256 | lm loss: 2.921199E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.768 | TFLOPs: 31.00 | +7: iteration 110750/ 173500 | consumed samples: 28352000 | consumed tokens: 58064896000 | elapsed time per iteration (s): 0.43 | learning rate: 7.305E-05 | global batch size: 256 | lm loss: 2.911372E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.346 | TFLOPs: 31.24 | +7: iteration 110760/ 173500 | consumed samples: 28354560 | consumed tokens: 58070138880 | elapsed time per iteration (s): 0.42 | learning rate: 7.303E-05 | global batch size: 256 | lm loss: 2.899728E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.783 | TFLOPs: 31.84 | +7: iteration 110770/ 173500 | consumed samples: 28357120 | consumed tokens: 58075381760 | elapsed time per iteration (s): 0.43 | learning rate: 7.302E-05 | global batch size: 256 | lm loss: 2.916138E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.066 | TFLOPs: 31.27 | +7: iteration 110780/ 173500 | consumed samples: 28359680 | consumed tokens: 58080624640 | elapsed time per iteration (s): 0.42 | learning rate: 7.300E-05 | global batch size: 256 | lm loss: 2.905674E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.725 | TFLOPs: 31.62 | +7: iteration 110790/ 173500 | consumed samples: 28362240 | consumed tokens: 58085867520 | elapsed time per iteration (s): 0.43 | learning rate: 7.299E-05 | global batch size: 256 | lm loss: 2.912724E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.119 | TFLOPs: 31.02 | +7: iteration 110800/ 173500 | 
consumed samples: 28364800 | consumed tokens: 58091110400 | elapsed time per iteration (s): 0.42 | learning rate: 7.297E-05 | global batch size: 256 | lm loss: 2.912717E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.357 | TFLOPs: 31.76 | +7: iteration 110810/ 173500 | consumed samples: 28367360 | consumed tokens: 58096353280 | elapsed time per iteration (s): 0.43 | learning rate: 7.296E-05 | global batch size: 256 | lm loss: 2.907187E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.091 | TFLOPs: 31.12 | +7: iteration 110820/ 173500 | consumed samples: 28369920 | consumed tokens: 58101596160 | elapsed time per iteration (s): 0.43 | learning rate: 7.294E-05 | global batch size: 256 | lm loss: 2.899346E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.279 | TFLOPs: 31.50 | +7: iteration 110830/ 173500 | consumed samples: 28372480 | consumed tokens: 58106839040 | elapsed time per iteration (s): 0.42 | learning rate: 7.293E-05 | global batch size: 256 | lm loss: 2.911110E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.177 | TFLOPs: 31.75 | +7: iteration 110840/ 173500 | consumed samples: 28375040 | consumed tokens: 58112081920 | elapsed time per iteration (s): 0.42 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 2.913449E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.664 | TFLOPs: 31.83 | +7: iteration 110850/ 173500 | consumed samples: 28377600 | consumed tokens: 58117324800 | elapsed time per iteration (s): 0.43 | learning rate: 7.290E-05 | global batch size: 256 | lm loss: 2.902364E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.883 | TFLOPs: 31.58 | +7: iteration 110860/ 173500 | consumed samples: 28380160 | consumed tokens: 58122567680 | elapsed time per iteration (s): 0.42 | learning rate: 7.288E-05 | global batch size: 256 | lm loss: 2.893888E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.807 | TFLOPs: 31.79 | +7: iteration 110870/ 173500 | consumed samples: 28382720 | consumed tokens: 58127810560 | elapsed time per iteration (s): 0.42 | learning rate: 7.287E-05 | global batch size: 256 | lm loss: 2.912296E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.279 | TFLOPs: 31.65 | +7: iteration 110880/ 173500 | consumed samples: 28385280 | consumed tokens: 58133053440 | elapsed time per iteration (s): 0.44 | learning rate: 7.285E-05 | global batch size: 256 | lm loss: 2.923487E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.969 | TFLOPs: 30.80 | +7: iteration 110890/ 173500 | consumed samples: 28387840 | consumed tokens: 58138296320 | elapsed time per iteration (s): 0.43 | learning rate: 7.284E-05 | global batch size: 256 | lm loss: 2.910011E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.105 | TFLOPs: 30.96 | +7: iteration 110900/ 173500 | consumed samples: 28390400 | consumed tokens: 58143539200 | elapsed time per iteration (s): 0.42 | learning rate: 7.282E-05 | global batch size: 256 | lm loss: 2.914739E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.559 | TFLOPs: 31.62 | +7: iteration 110910/ 173500 | consumed samples: 28392960 | consumed tokens: 58148782080 | elapsed time per iteration (s): 0.43 | learning rate: 
7.281E-05 | global batch size: 256 | lm loss: 2.913197E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.727 | TFLOPs: 31.52 | +7: iteration 110920/ 173500 | consumed samples: 28395520 | consumed tokens: 58154024960 | elapsed time per iteration (s): 0.42 | learning rate: 7.279E-05 | global batch size: 256 | lm loss: 2.915622E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.073 | TFLOPs: 31.75 | +7: iteration 110930/ 173500 | consumed samples: 28398080 | consumed tokens: 58159267840 | elapsed time per iteration (s): 0.43 | learning rate: 7.278E-05 | global batch size: 256 | lm loss: 2.902666E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.279 | TFLOPs: 31.18 | +7: iteration 110940/ 173500 | consumed samples: 28400640 | consumed tokens: 58164510720 | elapsed time per iteration (s): 0.44 | learning rate: 7.276E-05 | global batch size: 256 | lm loss: 2.910199E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.847 | TFLOPs: 30.84 | +7: iteration 110950/ 173500 | consumed samples: 28403200 | consumed tokens: 58169753600 | elapsed time per iteration (s): 0.43 | learning rate: 7.275E-05 | global batch size: 256 | lm loss: 2.907861E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.669 | TFLOPs: 31.20 | +7: iteration 110960/ 173500 | consumed samples: 28405760 | consumed tokens: 58174996480 | elapsed time per iteration (s): 0.43 | learning rate: 7.273E-05 | global batch size: 256 | lm loss: 2.899650E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.068 | TFLOPs: 31.59 | +7: iteration 110970/ 173500 | 
consumed samples: 28408320 | consumed tokens: 58180239360 | elapsed time per iteration (s): 0.42 | learning rate: 7.272E-05 | global batch size: 256 | lm loss: 2.897727E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.653 | TFLOPs: 31.73 | +7: iteration 110980/ 173500 | consumed samples: 28410880 | consumed tokens: 58185482240 | elapsed time per iteration (s): 0.44 | learning rate: 7.270E-05 | global batch size: 256 | lm loss: 2.905954E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.164 | TFLOPs: 30.81 | +7: iteration 110990/ 173500 | consumed samples: 28413440 | consumed tokens: 58190725120 | elapsed time per iteration (s): 0.44 | learning rate: 7.269E-05 | global batch size: 256 | lm loss: 2.902794E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.040 | TFLOPs: 30.75 | +7: iteration 111000/ 173500 | consumed samples: 28416000 | consumed tokens: 58195968000 | elapsed time per iteration (s): 0.44 | learning rate: 7.267E-05 | global batch size: 256 | lm loss: 2.906990E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.590 | TFLOPs: 30.46 | +7: iteration 111010/ 173500 | consumed samples: 28418560 | consumed tokens: 58201210880 | elapsed time per iteration (s): 0.43 | learning rate: 7.266E-05 | global batch size: 256 | lm loss: 2.907867E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.332 | TFLOPs: 31.03 | +7: iteration 111020/ 173500 | consumed samples: 28421120 | consumed tokens: 58206453760 | elapsed time per iteration (s): 0.43 | learning rate: 7.264E-05 | global batch size: 256 | lm loss: 2.904499E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.782 | TFLOPs: 31.47 | +7: iteration 111030/ 173500 | consumed samples: 28423680 | consumed tokens: 58211696640 | elapsed time per iteration (s): 0.43 | learning rate: 7.263E-05 | global batch size: 256 | lm loss: 2.899647E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.604 | TFLOPs: 31.36 | +7: iteration 111040/ 173500 | consumed samples: 28426240 | consumed tokens: 58216939520 | elapsed time per iteration (s): 0.43 | learning rate: 7.261E-05 | global batch size: 256 | lm loss: 2.892880E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.077 | TFLOPs: 31.38 | +7: iteration 111050/ 173500 | consumed samples: 28428800 | consumed tokens: 58222182400 | elapsed time per iteration (s): 0.42 | learning rate: 7.260E-05 | global batch size: 256 | lm loss: 2.901963E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.443 | TFLOPs: 31.82 | +7: iteration 111060/ 173500 | consumed samples: 28431360 | consumed tokens: 58227425280 | elapsed time per iteration (s): 0.42 | learning rate: 7.258E-05 | global batch size: 256 | lm loss: 2.893858E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.649 | TFLOPs: 31.72 | +7: iteration 111070/ 173500 | consumed samples: 28433920 | consumed tokens: 58232668160 | elapsed time per iteration (s): 0.43 | learning rate: 7.257E-05 | global batch size: 256 | lm loss: 2.897958E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.308 | TFLOPs: 31.18 | +7: iteration 111080/ 173500 | consumed samples: 28436480 | consumed tokens: 58237911040 | elapsed time per iteration (s): 0.42 | learning rate: 
7.255E-05 | global batch size: 256 | lm loss: 2.905253E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.382 | TFLOPs: 31.82 | +7: iteration 111090/ 173500 | consumed samples: 28439040 | consumed tokens: 58243153920 | elapsed time per iteration (s): 0.43 | learning rate: 7.254E-05 | global batch size: 256 | lm loss: 2.902887E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | +7: iteration 111100/ 173500 | consumed samples: 28441600 | consumed tokens: 58248396800 | elapsed time per iteration (s): 0.45 | learning rate: 7.252E-05 | global batch size: 256 | lm loss: 2.918387E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.094 | TFLOPs: 30.17 | +7: iteration 111110/ 173500 | consumed samples: 28444160 | consumed tokens: 58253639680 | elapsed time per iteration (s): 0.43 | learning rate: 7.251E-05 | global batch size: 256 | lm loss: 2.908107E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.144 | TFLOPs: 31.54 | +7: iteration 111120/ 173500 | consumed samples: 28446720 | consumed tokens: 58258882560 | elapsed time per iteration (s): 0.42 | learning rate: 7.249E-05 | global batch size: 256 | lm loss: 2.911946E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.818 | TFLOPs: 31.84 | +7: iteration 111130/ 173500 | consumed samples: 28449280 | consumed tokens: 58264125440 | elapsed time per iteration (s): 0.42 | learning rate: 7.248E-05 | global batch size: 256 | lm loss: 2.898241E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.501 | TFLOPs: 31.61 | +7: iteration 111140/ 173500 | 
consumed samples: 28451840 | consumed tokens: 58269368320 | elapsed time per iteration (s): 0.42 | learning rate: 7.246E-05 | global batch size: 256 | lm loss: 2.897319E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.165 | TFLOPs: 31.75 | +7: iteration 111150/ 173500 | consumed samples: 28454400 | consumed tokens: 58274611200 | elapsed time per iteration (s): 0.42 | learning rate: 7.245E-05 | global batch size: 256 | lm loss: 2.905945E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.360 | TFLOPs: 32.02 | +7: iteration 111160/ 173500 | consumed samples: 28456960 | consumed tokens: 58279854080 | elapsed time per iteration (s): 0.42 | learning rate: 7.243E-05 | global batch size: 256 | lm loss: 2.912074E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.202 | TFLOPs: 31.81 | +7: iteration 111170/ 173500 | consumed samples: 28459520 | consumed tokens: 58285096960 | elapsed time per iteration (s): 0.42 | learning rate: 7.242E-05 | global batch size: 256 | lm loss: 2.902479E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.730 | TFLOPs: 31.62 | +7: iteration 111180/ 173500 | consumed samples: 28462080 | consumed tokens: 58290339840 | elapsed time per iteration (s): 0.42 | learning rate: 7.240E-05 | global batch size: 256 | lm loss: 2.897332E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.714 | TFLOPs: 31.78 | +7: iteration 111190/ 173500 | consumed samples: 28464640 | consumed tokens: 58295582720 | elapsed time per iteration (s): 0.42 | learning rate: 7.239E-05 | global batch size: 256 | lm loss: 2.904932E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.030 | TFLOPs: 31.80 | +7: iteration 111200/ 173500 | consumed samples: 28467200 | consumed tokens: 58300825600 | elapsed time per iteration (s): 0.43 | learning rate: 7.237E-05 | global batch size: 256 | lm loss: 2.898142E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.266 | TFLOPs: 31.29 | +7: iteration 111210/ 173500 | consumed samples: 28469760 | consumed tokens: 58306068480 | elapsed time per iteration (s): 0.42 | learning rate: 7.236E-05 | global batch size: 256 | lm loss: 2.896873E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.430 | TFLOPs: 31.71 | +7: iteration 111220/ 173500 | consumed samples: 28472320 | consumed tokens: 58311311360 | elapsed time per iteration (s): 0.43 | learning rate: 7.234E-05 | global batch size: 256 | lm loss: 2.904182E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.872 | TFLOPs: 31.32 | +7: iteration 111230/ 173500 | consumed samples: 28474880 | consumed tokens: 58316554240 | elapsed time per iteration (s): 0.43 | learning rate: 7.233E-05 | global batch size: 256 | lm loss: 2.903077E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.007 | TFLOPs: 31.27 | +7: iteration 111240/ 173500 | consumed samples: 28477440 | consumed tokens: 58321797120 | elapsed time per iteration (s): 0.43 | learning rate: 7.231E-05 | global batch size: 256 | lm loss: 2.906019E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.856 | TFLOPs: 31.32 | +7: iteration 111250/ 173500 | consumed samples: 28480000 | consumed tokens: 58327040000 | elapsed time per iteration (s): 0.43 | learning rate: 
7.230E-05 | global batch size: 256 | lm loss: 2.904294E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.667 | TFLOPs: 31.57 | +7: iteration 111260/ 173500 | consumed samples: 28482560 | consumed tokens: 58332282880 | elapsed time per iteration (s): 0.44 | learning rate: 7.228E-05 | global batch size: 256 | lm loss: 2.900237E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.480 | TFLOPs: 30.77 | +7: iteration 111270/ 173500 | consumed samples: 28485120 | consumed tokens: 58337525760 | elapsed time per iteration (s): 0.43 | learning rate: 7.227E-05 | global batch size: 256 | lm loss: 2.907487E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.730 | TFLOPs: 31.52 | +7: iteration 111280/ 173500 | consumed samples: 28487680 | consumed tokens: 58342768640 | elapsed time per iteration (s): 0.43 | learning rate: 7.225E-05 | global batch size: 256 | lm loss: 2.907305E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.888 | TFLOPs: 31.27 | +7: iteration 111290/ 173500 | consumed samples: 28490240 | consumed tokens: 58348011520 | elapsed time per iteration (s): 0.43 | learning rate: 7.224E-05 | global batch size: 256 | lm loss: 2.906741E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.007 | TFLOPs: 31.38 | +7: iteration 111300/ 173500 | consumed samples: 28492800 | consumed tokens: 58353254400 | elapsed time per iteration (s): 0.43 | learning rate: 7.222E-05 | global batch size: 256 | lm loss: 2.918272E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.969 | TFLOPs: 31.43 | +7: iteration 111310/ 173500 | 
consumed samples: 28495360 | consumed tokens: 58358497280 | elapsed time per iteration (s): 0.42 | learning rate: 7.221E-05 | global batch size: 256 | lm loss: 2.891013E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.907 | TFLOPs: 31.69 | +7: iteration 111320/ 173500 | consumed samples: 28497920 | consumed tokens: 58363740160 | elapsed time per iteration (s): 0.43 | learning rate: 7.219E-05 | global batch size: 256 | lm loss: 2.891405E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.109 | TFLOPs: 31.49 | +7: iteration 111330/ 173500 | consumed samples: 28500480 | consumed tokens: 58368983040 | elapsed time per iteration (s): 0.43 | learning rate: 7.218E-05 | global batch size: 256 | lm loss: 2.900602E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.551 | TFLOPs: 31.30 | +7: iteration 111340/ 173500 | consumed samples: 28503040 | consumed tokens: 58374225920 | elapsed time per iteration (s): 0.42 | learning rate: 7.216E-05 | global batch size: 256 | lm loss: 2.908537E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.428 | TFLOPs: 31.77 | +7: iteration 111350/ 173500 | consumed samples: 28505600 | consumed tokens: 58379468800 | elapsed time per iteration (s): 0.43 | learning rate: 7.215E-05 | global batch size: 256 | lm loss: 2.898038E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.113 | TFLOPs: 31.07 | +7: iteration 111360/ 173500 | consumed samples: 28508160 | consumed tokens: 58384711680 | elapsed time per iteration (s): 0.43 | learning rate: 7.213E-05 | global batch size: 256 | lm loss: 2.894736E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.498 | TFLOPs: 31.45 | +7: iteration 111370/ 173500 | consumed samples: 28510720 | consumed tokens: 58389954560 | elapsed time per iteration (s): 0.42 | learning rate: 7.212E-05 | global batch size: 256 | lm loss: 2.893103E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.218 | TFLOPs: 31.65 | +7: iteration 111380/ 173500 | consumed samples: 28513280 | consumed tokens: 58395197440 | elapsed time per iteration (s): 0.42 | learning rate: 7.210E-05 | global batch size: 256 | lm loss: 2.892875E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.914 | TFLOPs: 31.84 | +7: iteration 111390/ 173500 | consumed samples: 28515840 | consumed tokens: 58400440320 | elapsed time per iteration (s): 0.42 | learning rate: 7.209E-05 | global batch size: 256 | lm loss: 2.904034E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.288 | TFLOPs: 31.81 | +7: iteration 111400/ 173500 | consumed samples: 28518400 | consumed tokens: 58405683200 | elapsed time per iteration (s): 0.42 | learning rate: 7.207E-05 | global batch size: 256 | lm loss: 2.898680E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.105 | TFLOPs: 31.80 | +7: iteration 111410/ 173500 | consumed samples: 28520960 | consumed tokens: 58410926080 | elapsed time per iteration (s): 0.43 | learning rate: 7.206E-05 | global batch size: 256 | lm loss: 2.910724E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.511 | TFLOPs: 30.93 | +7: iteration 111420/ 173500 | consumed samples: 28523520 | consumed tokens: 58416168960 | elapsed time per iteration (s): 0.43 | learning rate: 
7.205E-05 | global batch size: 256 | lm loss: 2.900935E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.789 | TFLOPs: 31.26 | +7: iteration 111430/ 173500 | consumed samples: 28526080 | consumed tokens: 58421411840 | elapsed time per iteration (s): 0.43 | learning rate: 7.203E-05 | global batch size: 256 | lm loss: 2.903819E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.969 | TFLOPs: 31.48 | +7: iteration 111440/ 173500 | consumed samples: 28528640 | consumed tokens: 58426654720 | elapsed time per iteration (s): 0.43 | learning rate: 7.202E-05 | global batch size: 256 | lm loss: 2.897119E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.280 | TFLOPs: 31.34 | +7: iteration 111450/ 173500 | consumed samples: 28531200 | consumed tokens: 58431897600 | elapsed time per iteration (s): 0.42 | learning rate: 7.200E-05 | global batch size: 256 | lm loss: 2.902494E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.124 | TFLOPs: 31.91 | +7: iteration 111460/ 173500 | consumed samples: 28533760 | consumed tokens: 58437140480 | elapsed time per iteration (s): 0.42 | learning rate: 7.199E-05 | global batch size: 256 | lm loss: 2.915295E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.323 | TFLOPs: 31.66 | +7: iteration 111470/ 173500 | consumed samples: 28536320 | consumed tokens: 58442383360 | elapsed time per iteration (s): 0.42 | learning rate: 7.197E-05 | global batch size: 256 | lm loss: 2.907135E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.903 | TFLOPs: 32.00 | +7: iteration 111480/ 173500 | 
consumed samples: 28538880 | consumed tokens: 58447626240 | elapsed time per iteration (s): 0.43 | learning rate: 7.196E-05 | global batch size: 256 | lm loss: 2.895414E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.966 | TFLOPs: 31.16 | +7: iteration 111490/ 173500 | consumed samples: 28541440 | consumed tokens: 58452869120 | elapsed time per iteration (s): 0.43 | learning rate: 7.194E-05 | global batch size: 256 | lm loss: 2.902653E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.197 | TFLOPs: 31.33 | +7: iteration 111500/ 173500 | consumed samples: 28544000 | consumed tokens: 58458112000 | elapsed time per iteration (s): 0.42 | learning rate: 7.193E-05 | global batch size: 256 | lm loss: 2.918007E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.952 | TFLOPs: 32.06 | +7: iteration 111510/ 173500 | consumed samples: 28546560 | consumed tokens: 58463354880 | elapsed time per iteration (s): 0.43 | learning rate: 7.191E-05 | global batch size: 256 | lm loss: 2.900759E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.990 | TFLOPs: 31.53 | +7: iteration 111520/ 173500 | consumed samples: 28549120 | consumed tokens: 58468597760 | elapsed time per iteration (s): 0.42 | learning rate: 7.190E-05 | global batch size: 256 | lm loss: 2.905058E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.354 | TFLOPs: 31.60 | +7: iteration 111530/ 173500 | consumed samples: 28551680 | consumed tokens: 58473840640 | elapsed time per iteration (s): 0.42 | learning rate: 7.188E-05 | global batch size: 256 | lm loss: 2.914910E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.164 | TFLOPs: 31.80 | +7: iteration 111540/ 173500 | consumed samples: 28554240 | consumed tokens: 58479083520 | elapsed time per iteration (s): 0.42 | learning rate: 7.187E-05 | global batch size: 256 | lm loss: 2.914236E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.657 | TFLOPs: 31.62 | +7: iteration 111550/ 173500 | consumed samples: 28556800 | consumed tokens: 58484326400 | elapsed time per iteration (s): 0.42 | learning rate: 7.185E-05 | global batch size: 256 | lm loss: 2.918080E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.399 | TFLOPs: 31.82 | +7: iteration 111560/ 173500 | consumed samples: 28559360 | consumed tokens: 58489569280 | elapsed time per iteration (s): 0.43 | learning rate: 7.184E-05 | global batch size: 256 | lm loss: 2.912857E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.466 | TFLOPs: 31.45 | +7: iteration 111570/ 173500 | consumed samples: 28561920 | consumed tokens: 58494812160 | elapsed time per iteration (s): 0.42 | learning rate: 7.182E-05 | global batch size: 256 | lm loss: 2.895494E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.061 | TFLOPs: 31.75 | +7: iteration 111580/ 173500 | consumed samples: 28564480 | consumed tokens: 58500055040 | elapsed time per iteration (s): 0.43 | learning rate: 7.181E-05 | global batch size: 256 | lm loss: 2.897466E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.627 | TFLOPs: 31.57 | +7: iteration 111590/ 173500 | consumed samples: 28567040 | consumed tokens: 58505297920 | elapsed time per iteration (s): 0.42 | learning rate: 
7.179E-05 | global batch size: 256 | lm loss: 2.918160E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.470 | TFLOPs: 31.72 | +7: iteration 111600/ 173500 | consumed samples: 28569600 | consumed tokens: 58510540800 | elapsed time per iteration (s): 0.42 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 2.903021E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.998 | TFLOPs: 31.80 | +7: iteration 111610/ 173500 | consumed samples: 28572160 | consumed tokens: 58515783680 | elapsed time per iteration (s): 0.42 | learning rate: 7.176E-05 | global batch size: 256 | lm loss: 2.912157E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.771 | TFLOPs: 31.84 | +7: iteration 111620/ 173500 | consumed samples: 28574720 | consumed tokens: 58521026560 | elapsed time per iteration (s): 0.43 | learning rate: 7.175E-05 | global batch size: 256 | lm loss: 2.886113E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.945 | TFLOPs: 31.58 | +7: iteration 111630/ 173500 | consumed samples: 28577280 | consumed tokens: 58526269440 | elapsed time per iteration (s): 0.43 | learning rate: 7.173E-05 | global batch size: 256 | lm loss: 2.893199E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.538 | TFLOPs: 31.51 | +7: iteration 111640/ 173500 | consumed samples: 28579840 | consumed tokens: 58531512320 | elapsed time per iteration (s): 0.42 | learning rate: 7.172E-05 | global batch size: 256 | lm loss: 2.893894E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.914 | TFLOPs: 31.69 | +7: iteration 111650/ 173500 | 
consumed samples: 28582400 | consumed tokens: 58536755200 | elapsed time per iteration (s): 0.43 | learning rate: 7.170E-05 | global batch size: 256 | lm loss: 2.901760E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.224 | TFLOPs: 31.34 | +7: iteration 111660/ 173500 | consumed samples: 28584960 | consumed tokens: 58541998080 | elapsed time per iteration (s): 0.43 | learning rate: 7.169E-05 | global batch size: 256 | lm loss: 2.897108E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.968 | TFLOPs: 31.53 | +7: iteration 111670/ 173500 | consumed samples: 28587520 | consumed tokens: 58547240960 | elapsed time per iteration (s): 0.43 | learning rate: 7.167E-05 | global batch size: 256 | lm loss: 2.905932E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.094 | TFLOPs: 31.54 | +7: iteration 111680/ 173500 | consumed samples: 28590080 | consumed tokens: 58552483840 | elapsed time per iteration (s): 0.42 | learning rate: 7.166E-05 | global batch size: 256 | lm loss: 2.910289E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.648 | TFLOPs: 31.67 | +7: iteration 111690/ 173500 | consumed samples: 28592640 | consumed tokens: 58557726720 | elapsed time per iteration (s): 0.43 | learning rate: 7.164E-05 | global batch size: 256 | lm loss: 2.908521E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.336 | TFLOPs: 31.13 | +7: iteration 111700/ 173500 | consumed samples: 28595200 | consumed tokens: 58562969600 | elapsed time per iteration (s): 0.43 | learning rate: 7.163E-05 | global batch size: 256 | lm loss: 2.886782E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 590.942 | TFLOPs: 31.01 | +7: iteration 111710/ 173500 | consumed samples: 28597760 | consumed tokens: 58568212480 | elapsed time per iteration (s): 0.42 | learning rate: 7.161E-05 | global batch size: 256 | lm loss: 2.903567E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.185 | TFLOPs: 31.70 | +7: iteration 111720/ 173500 | consumed samples: 28600320 | consumed tokens: 58573455360 | elapsed time per iteration (s): 0.43 | learning rate: 7.160E-05 | global batch size: 256 | lm loss: 2.902750E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.226 | TFLOPs: 31.34 | +7: iteration 111730/ 173500 | consumed samples: 28602880 | consumed tokens: 58578698240 | elapsed time per iteration (s): 0.43 | learning rate: 7.158E-05 | global batch size: 256 | lm loss: 2.895828E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.628 | TFLOPs: 31.57 | +7: iteration 111740/ 173500 | consumed samples: 28605440 | consumed tokens: 58583941120 | elapsed time per iteration (s): 0.43 | learning rate: 7.157E-05 | global batch size: 256 | lm loss: 2.882998E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.168 | TFLOPs: 31.49 | +7: iteration 111750/ 173500 | consumed samples: 28608000 | consumed tokens: 58589184000 | elapsed time per iteration (s): 0.43 | learning rate: 7.155E-05 | global batch size: 256 | lm loss: 2.900512E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.694 | TFLOPs: 30.94 | +7: iteration 111760/ 173500 | consumed samples: 28610560 | consumed tokens: 58594426880 | elapsed time per iteration (s): 0.42 | learning rate: 
7.154E-05 | global batch size: 256 | lm loss: 2.898975E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.974 | TFLOPs: 31.90 | +7: iteration 111770/ 173500 | consumed samples: 28613120 | consumed tokens: 58599669760 | elapsed time per iteration (s): 0.42 | learning rate: 7.152E-05 | global batch size: 256 | lm loss: 2.902804E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.817 | TFLOPs: 31.79 | +7: iteration 111780/ 173500 | consumed samples: 28615680 | consumed tokens: 58604912640 | elapsed time per iteration (s): 0.42 | learning rate: 7.151E-05 | global batch size: 256 | lm loss: 2.884883E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.788 | TFLOPs: 31.78 | +7: iteration 111790/ 173500 | consumed samples: 28618240 | consumed tokens: 58610155520 | elapsed time per iteration (s): 0.43 | learning rate: 7.149E-05 | global batch size: 256 | lm loss: 2.903895E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.035 | TFLOPs: 31.33 | +7: iteration 111800/ 173500 | consumed samples: 28620800 | consumed tokens: 58615398400 | elapsed time per iteration (s): 0.42 | learning rate: 7.148E-05 | global batch size: 256 | lm loss: 2.907474E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.960 | TFLOPs: 31.64 | +7: iteration 111810/ 173500 | consumed samples: 28623360 | consumed tokens: 58620641280 | elapsed time per iteration (s): 0.43 | learning rate: 7.146E-05 | global batch size: 256 | lm loss: 2.907557E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.326 | TFLOPs: 31.18 | +7: iteration 111820/ 173500 | 
consumed samples: 28625920 | consumed tokens: 58625884160 | elapsed time per iteration (s): 0.43 | learning rate: 7.145E-05 | global batch size: 256 | lm loss: 2.906231E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.520 | TFLOPs: 31.09 | +7: iteration 111830/ 173500 | consumed samples: 28628480 | consumed tokens: 58631127040 | elapsed time per iteration (s): 0.42 | learning rate: 7.143E-05 | global batch size: 256 | lm loss: 2.913949E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.247 | TFLOPs: 31.86 | +7: iteration 111840/ 173500 | consumed samples: 28631040 | consumed tokens: 58636369920 | elapsed time per iteration (s): 0.43 | learning rate: 7.142E-05 | global batch size: 256 | lm loss: 2.903731E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.878 | TFLOPs: 31.37 | +7: iteration 111850/ 173500 | consumed samples: 28633600 | consumed tokens: 58641612800 | elapsed time per iteration (s): 0.43 | learning rate: 7.140E-05 | global batch size: 256 | lm loss: 2.905762E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.711 | TFLOPs: 30.94 | +7: iteration 111860/ 173500 | consumed samples: 28636160 | consumed tokens: 58646855680 | elapsed time per iteration (s): 0.43 | learning rate: 7.139E-05 | global batch size: 256 | lm loss: 2.890180E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.570 | TFLOPs: 31.46 | +7: iteration 111870/ 173500 | consumed samples: 28638720 | consumed tokens: 58652098560 | elapsed time per iteration (s): 0.42 | learning rate: 7.137E-05 | global batch size: 256 | lm loss: 2.924953E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.373 | TFLOPs: 31.76 | +7: iteration 111880/ 173500 | consumed samples: 28641280 | consumed tokens: 58657341440 | elapsed time per iteration (s): 0.42 | learning rate: 7.136E-05 | global batch size: 256 | lm loss: 2.903558E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.005 | TFLOPs: 31.80 | +7: iteration 111890/ 173500 | consumed samples: 28643840 | consumed tokens: 58662584320 | elapsed time per iteration (s): 0.43 | learning rate: 7.135E-05 | global batch size: 256 | lm loss: 2.912093E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.399 | TFLOPs: 31.29 | +7: iteration 111900/ 173500 | consumed samples: 28646400 | consumed tokens: 58667827200 | elapsed time per iteration (s): 0.42 | learning rate: 7.133E-05 | global batch size: 256 | lm loss: 2.904002E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.396 | TFLOPs: 31.82 | +7: iteration 111910/ 173500 | consumed samples: 28648960 | consumed tokens: 58673070080 | elapsed time per iteration (s): 0.42 | learning rate: 7.132E-05 | global batch size: 256 | lm loss: 2.899074E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.811 | TFLOPs: 31.73 | +7: iteration 111920/ 173500 | consumed samples: 28651520 | consumed tokens: 58678312960 | elapsed time per iteration (s): 0.42 | learning rate: 7.130E-05 | global batch size: 256 | lm loss: 2.917298E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.268 | TFLOPs: 31.76 | +7: iteration 111930/ 173500 | consumed samples: 28654080 | consumed tokens: 58683555840 | elapsed time per iteration (s): 0.42 | learning rate: 
7.129E-05 | global batch size: 256 | lm loss: 2.897639E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.670 | TFLOPs: 31.73 | +7: iteration 111940/ 173500 | consumed samples: 28656640 | consumed tokens: 58688798720 | elapsed time per iteration (s): 0.43 | learning rate: 7.127E-05 | global batch size: 256 | lm loss: 2.904112E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.576 | TFLOPs: 31.41 | +7: iteration 111950/ 173500 | consumed samples: 28659200 | consumed tokens: 58694041600 | elapsed time per iteration (s): 0.42 | learning rate: 7.126E-05 | global batch size: 256 | lm loss: 2.913420E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.393 | TFLOPs: 31.71 | +7: iteration 111960/ 173500 | consumed samples: 28661760 | consumed tokens: 58699284480 | elapsed time per iteration (s): 0.42 | learning rate: 7.124E-05 | global batch size: 256 | lm loss: 2.895237E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.476 | TFLOPs: 31.98 | +7: iteration 111970/ 173500 | consumed samples: 28664320 | consumed tokens: 58704527360 | elapsed time per iteration (s): 0.44 | learning rate: 7.123E-05 | global batch size: 256 | lm loss: 2.899775E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.998 | TFLOPs: 30.80 | +7: iteration 111980/ 173500 | consumed samples: 28666880 | consumed tokens: 58709770240 | elapsed time per iteration (s): 0.42 | learning rate: 7.121E-05 | global batch size: 256 | lm loss: 2.903718E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.667 | TFLOPs: 31.78 | +7: iteration 111990/ 173500 | 
consumed samples: 28669440 | consumed tokens: 58715013120 | elapsed time per iteration (s): 0.43 | learning rate: 7.120E-05 | global batch size: 256 | lm loss: 2.907569E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.776 | TFLOPs: 31.31 | +0: [2023-03-17 12:28:56,400] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=0, lr=[7.118156405567987e-05, 7.118156405567987e-05, 7.118156405567987e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 112000/ 173500 | consumed samples: 28672000 | consumed tokens: 58720256000 | elapsed time per iteration (s): 0.43 | learning rate: 7.118E-05 | global batch size: 256 | lm loss: 2.910889E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.335 | TFLOPs: 31.60 | +0: steps: 112000 loss: 2.9083 iter time (s): 0.425 samples/sec: 602.792 +7: iteration 112010/ 173500 | consumed samples: 28674560 | consumed tokens: 58725498880 | elapsed time per iteration (s): 0.43 | learning rate: 7.117E-05 | global batch size: 256 | lm loss: 2.909208E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.009 | TFLOPs: 31.27 | +7: iteration 112020/ 173500 | consumed samples: 28677120 | consumed tokens: 58730741760 | elapsed time per iteration (s): 0.42 | learning rate: 7.115E-05 | global batch size: 256 | lm loss: 2.915052E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.336 | TFLOPs: 31.71 | +7: iteration 112030/ 173500 | consumed samples: 28679680 | consumed tokens: 58735984640 | elapsed time per iteration (s): 0.43 | learning rate: 7.114E-05 | global batch size: 256 | lm loss: 2.900627E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.065 
| TFLOPs: 31.22 | +7: iteration 112040/ 173500 | consumed samples: 28682240 | consumed tokens: 58741227520 | elapsed time per iteration (s): 0.43 | learning rate: 7.112E-05 | global batch size: 256 | lm loss: 2.901801E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.762 | TFLOPs: 31.05 | +7: iteration 112050/ 173500 | consumed samples: 28684800 | consumed tokens: 58746470400 | elapsed time per iteration (s): 0.43 | learning rate: 7.111E-05 | global batch size: 256 | lm loss: 2.904738E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.108 | TFLOPs: 31.59 | +7: iteration 112060/ 173500 | consumed samples: 28687360 | consumed tokens: 58751713280 | elapsed time per iteration (s): 0.48 | learning rate: 7.109E-05 | global batch size: 256 | lm loss: 2.907403E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.697 | TFLOPs: 27.84 | +7: iteration 112070/ 173500 | consumed samples: 28689920 | consumed tokens: 58756956160 | elapsed time per iteration (s): 0.42 | learning rate: 7.108E-05 | global batch size: 256 | lm loss: 2.901691E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.486 | TFLOPs: 31.98 | +7: iteration 112080/ 173500 | consumed samples: 28692480 | consumed tokens: 58762199040 | elapsed time per iteration (s): 0.42 | learning rate: 7.106E-05 | global batch size: 256 | lm loss: 2.899312E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.776 | TFLOPs: 31.68 | +7: iteration 112090/ 173500 | consumed samples: 28695040 | consumed tokens: 58767441920 | elapsed time per iteration (s): 0.42 | learning rate: 7.105E-05 | global batch size: 256 | lm loss: 2.910714E+00 | grad norm: 
0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.437 | TFLOPs: 31.87 | +7: iteration 112100/ 173500 | consumed samples: 28697600 | consumed tokens: 58772684800 | elapsed time per iteration (s): 0.43 | learning rate: 7.103E-05 | global batch size: 256 | lm loss: 2.915103E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.927 | TFLOPs: 31.53 | +7: iteration 112110/ 173500 | consumed samples: 28700160 | consumed tokens: 58777927680 | elapsed time per iteration (s): 0.42 | learning rate: 7.102E-05 | global batch size: 256 | lm loss: 2.905848E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.161 | TFLOPs: 31.65 | +7: iteration 112120/ 173500 | consumed samples: 28702720 | consumed tokens: 58783170560 | elapsed time per iteration (s): 0.43 | learning rate: 7.100E-05 | global batch size: 256 | lm loss: 2.901043E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.243 | TFLOPs: 31.39 | +7: iteration 112130/ 173500 | consumed samples: 28705280 | consumed tokens: 58788413440 | elapsed time per iteration (s): 0.43 | learning rate: 7.099E-05 | global batch size: 256 | lm loss: 2.901240E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.783 | TFLOPs: 31.21 | +7: iteration 112140/ 173500 | consumed samples: 28707840 | consumed tokens: 58793656320 | elapsed time per iteration (s): 0.43 | learning rate: 7.097E-05 | global batch size: 256 | lm loss: 2.901366E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.705 | TFLOPs: 31.41 | +7: iteration 112150/ 173500 | consumed samples: 28710400 | consumed tokens: 58798899200 | elapsed time 
per iteration (s): 0.42 | learning rate: 7.096E-05 | global batch size: 256 | lm loss: 2.897722E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.513 | TFLOPs: 32.03 | +7: iteration 112160/ 173500 | consumed samples: 28712960 | consumed tokens: 58804142080 | elapsed time per iteration (s): 0.42 | learning rate: 7.094E-05 | global batch size: 256 | lm loss: 2.903321E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.575 | TFLOPs: 31.72 | +7: iteration 112170/ 173500 | consumed samples: 28715520 | consumed tokens: 58809384960 | elapsed time per iteration (s): 0.42 | learning rate: 7.093E-05 | global batch size: 256 | lm loss: 2.896857E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.786 | TFLOPs: 31.99 | +7: iteration 112180/ 173500 | consumed samples: 28718080 | consumed tokens: 58814627840 | elapsed time per iteration (s): 0.42 | learning rate: 7.091E-05 | global batch size: 256 | lm loss: 2.903582E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.908 | TFLOPs: 31.69 | +7: iteration 112190/ 173500 | consumed samples: 28720640 | consumed tokens: 58819870720 | elapsed time per iteration (s): 0.43 | learning rate: 7.090E-05 | global batch size: 256 | lm loss: 2.900186E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.162 | TFLOPs: 31.49 | +7: iteration 112200/ 173500 | consumed samples: 28723200 | consumed tokens: 58825113600 | elapsed time per iteration (s): 0.44 | learning rate: 7.088E-05 | global batch size: 256 | lm loss: 2.913094E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.885 | TFLOPs: 
30.79 | +7: iteration 112210/ 173500 | consumed samples: 28725760 | consumed tokens: 58830356480 | elapsed time per iteration (s): 0.43 | learning rate: 7.087E-05 | global batch size: 256 | lm loss: 2.896560E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.252 | TFLOPs: 30.97 | +7: iteration 112220/ 173500 | consumed samples: 28728320 | consumed tokens: 58835599360 | elapsed time per iteration (s): 0.42 | learning rate: 7.086E-05 | global batch size: 256 | lm loss: 2.890929E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.753 | TFLOPs: 31.73 | +7: iteration 112230/ 173500 | consumed samples: 28730880 | consumed tokens: 58840842240 | elapsed time per iteration (s): 0.43 | learning rate: 7.084E-05 | global batch size: 256 | lm loss: 2.905412E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.321 | TFLOPs: 31.55 | +7: iteration 112240/ 173500 | consumed samples: 28733440 | consumed tokens: 58846085120 | elapsed time per iteration (s): 0.43 | learning rate: 7.083E-05 | global batch size: 256 | lm loss: 2.902679E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.374 | TFLOPs: 31.55 | +7: iteration 112250/ 173500 | consumed samples: 28736000 | consumed tokens: 58851328000 | elapsed time per iteration (s): 0.42 | learning rate: 7.081E-05 | global batch size: 256 | lm loss: 2.911534E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.704 | TFLOPs: 31.62 | +7: iteration 112260/ 173500 | consumed samples: 28738560 | consumed tokens: 58856570880 | elapsed time per iteration (s): 0.42 | learning rate: 7.080E-05 | global batch size: 256 | lm loss: 2.903217E+00 | grad norm: 0.362 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.222 | TFLOPs: 31.70 | +7: iteration 112270/ 173500 | consumed samples: 28741120 | consumed tokens: 58861813760 | elapsed time per iteration (s): 0.43 | learning rate: 7.078E-05 | global batch size: 256 | lm loss: 2.921807E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.174 | TFLOPs: 31.28 | +7: iteration 112280/ 173500 | consumed samples: 28743680 | consumed tokens: 58867056640 | elapsed time per iteration (s): 0.43 | learning rate: 7.077E-05 | global batch size: 256 | lm loss: 2.904488E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.795 | TFLOPs: 31.58 | +7: iteration 112290/ 173500 | consumed samples: 28746240 | consumed tokens: 58872299520 | elapsed time per iteration (s): 0.43 | learning rate: 7.075E-05 | global batch size: 256 | lm loss: 2.888848E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.899 | TFLOPs: 31.58 | +7: iteration 112300/ 173500 | consumed samples: 28748800 | consumed tokens: 58877542400 | elapsed time per iteration (s): 0.42 | learning rate: 7.074E-05 | global batch size: 256 | lm loss: 2.906015E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.546 | TFLOPs: 31.88 | +7: iteration 112310/ 173500 | consumed samples: 28751360 | consumed tokens: 58882785280 | elapsed time per iteration (s): 0.42 | learning rate: 7.072E-05 | global batch size: 256 | lm loss: 2.913111E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.063 | TFLOPs: 31.85 | +7: iteration 112320/ 173500 | consumed samples: 28753920 | consumed tokens: 58888028160 | elapsed time per 
iteration (s): 0.43 | learning rate: 7.071E-05 | global batch size: 256 | lm loss: 2.917867E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.246 | TFLOPs: 31.28 | +7: iteration 112330/ 173500 | consumed samples: 28756480 | consumed tokens: 58893271040 | elapsed time per iteration (s): 0.42 | learning rate: 7.069E-05 | global batch size: 256 | lm loss: 2.893113E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.934 | TFLOPs: 32.00 | +7: iteration 112340/ 173500 | consumed samples: 28759040 | consumed tokens: 58898513920 | elapsed time per iteration (s): 0.43 | learning rate: 7.068E-05 | global batch size: 256 | lm loss: 2.919381E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.123 | TFLOPs: 31.44 | +7: iteration 112350/ 173500 | consumed samples: 28761600 | consumed tokens: 58903756800 | elapsed time per iteration (s): 0.42 | learning rate: 7.066E-05 | global batch size: 256 | lm loss: 2.900911E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.877 | TFLOPs: 31.63 | +7: iteration 112360/ 173500 | consumed samples: 28764160 | consumed tokens: 58908999680 | elapsed time per iteration (s): 0.43 | learning rate: 7.065E-05 | global batch size: 256 | lm loss: 2.904616E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.160 | TFLOPs: 31.59 | +7: iteration 112370/ 173500 | consumed samples: 28766720 | consumed tokens: 58914242560 | elapsed time per iteration (s): 0.47 | learning rate: 7.063E-05 | global batch size: 256 | lm loss: 2.889926E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.136 | TFLOPs: 
28.34 | +7: iteration 112380/ 173500 | consumed samples: 28769280 | consumed tokens: 58919485440 | elapsed time per iteration (s): 0.42 | learning rate: 7.062E-05 | global batch size: 256 | lm loss: 2.907989E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.033 | TFLOPs: 31.85 | +7: iteration 112390/ 173500 | consumed samples: 28771840 | consumed tokens: 58924728320 | elapsed time per iteration (s): 0.42 | learning rate: 7.060E-05 | global batch size: 256 | lm loss: 2.910337E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.126 | TFLOPs: 31.85 | +7: iteration 112400/ 173500 | consumed samples: 28774400 | consumed tokens: 58929971200 | elapsed time per iteration (s): 0.43 | learning rate: 7.059E-05 | global batch size: 256 | lm loss: 2.903156E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.537 | TFLOPs: 31.40 | +7: iteration 112410/ 173500 | consumed samples: 28776960 | consumed tokens: 58935214080 | elapsed time per iteration (s): 0.42 | learning rate: 7.057E-05 | global batch size: 256 | lm loss: 2.902356E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.463 | TFLOPs: 31.61 | +7: iteration 112420/ 173500 | consumed samples: 28779520 | consumed tokens: 58940456960 | elapsed time per iteration (s): 0.42 | learning rate: 7.056E-05 | global batch size: 256 | lm loss: 2.903349E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.244 | TFLOPs: 31.70 | +7: iteration 112430/ 173500 | consumed samples: 28782080 | consumed tokens: 58945699840 | elapsed time per iteration (s): 0.44 | learning rate: 7.054E-05 | global batch size: 256 | lm loss: 2.895032E+00 | grad norm: 0.336 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.628 | TFLOPs: 30.57 | +7: iteration 112440/ 173500 | consumed samples: 28784640 | consumed tokens: 58950942720 | elapsed time per iteration (s): 0.42 | learning rate: 7.053E-05 | global batch size: 256 | lm loss: 2.906768E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.283 | TFLOPs: 31.81 | +7: iteration 112450/ 173500 | consumed samples: 28787200 | consumed tokens: 58956185600 | elapsed time per iteration (s): 0.43 | learning rate: 7.051E-05 | global batch size: 256 | lm loss: 2.908334E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.744 | TFLOPs: 31.26 | +7: iteration 112460/ 173500 | consumed samples: 28789760 | consumed tokens: 58961428480 | elapsed time per iteration (s): 0.42 | learning rate: 7.050E-05 | global batch size: 256 | lm loss: 2.913157E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.462 | TFLOPs: 31.87 | +7: iteration 112470/ 173500 | consumed samples: 28792320 | consumed tokens: 58966671360 | elapsed time per iteration (s): 0.43 | learning rate: 7.049E-05 | global batch size: 256 | lm loss: 2.904385E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.966 | TFLOPs: 31.32 | +7: iteration 112480/ 173500 | consumed samples: 28794880 | consumed tokens: 58971914240 | elapsed time per iteration (s): 0.42 | learning rate: 7.047E-05 | global batch size: 256 | lm loss: 2.889425E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.391 | TFLOPs: 31.92 | +7: iteration 112490/ 173500 | consumed samples: 28797440 | consumed tokens: 58977157120 | elapsed time per 
iteration (s): 0.43 | learning rate: 7.046E-05 | global batch size: 256 | lm loss: 2.915578E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.890 | TFLOPs: 31.53 | +7: iteration 112500/ 173500 | consumed samples: 28800000 | consumed tokens: 58982400000 | elapsed time per iteration (s): 0.42 | learning rate: 7.044E-05 | global batch size: 256 | lm loss: 2.898071E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.533 | TFLOPs: 31.77 | +7: iteration 112510/ 173500 | consumed samples: 28802560 | consumed tokens: 58987642880 | elapsed time per iteration (s): 0.42 | learning rate: 7.043E-05 | global batch size: 256 | lm loss: 2.896797E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.429 | TFLOPs: 31.66 | +7: iteration 112520/ 173500 | consumed samples: 28805120 | consumed tokens: 58992885760 | elapsed time per iteration (s): 0.43 | learning rate: 7.041E-05 | global batch size: 256 | lm loss: 2.907500E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.282 | TFLOPs: 31.34 | +7: iteration 112530/ 173500 | consumed samples: 28807680 | consumed tokens: 58998128640 | elapsed time per iteration (s): 0.42 | learning rate: 7.040E-05 | global batch size: 256 | lm loss: 2.905482E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.629 | TFLOPs: 31.67 | +7: iteration 112540/ 173500 | consumed samples: 28810240 | consumed tokens: 59003371520 | elapsed time per iteration (s): 0.43 | learning rate: 7.038E-05 | global batch size: 256 | lm loss: 2.885785E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.675 | TFLOPs: 
31.10 | +7: iteration 112550/ 173500 | consumed samples: 28812800 | consumed tokens: 59008614400 | elapsed time per iteration (s): 0.42 | learning rate: 7.037E-05 | global batch size: 256 | lm loss: 2.916681E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.642 | TFLOPs: 31.72 | +7: iteration 112560/ 173500 | consumed samples: 28815360 | consumed tokens: 59013857280 | elapsed time per iteration (s): 0.42 | learning rate: 7.035E-05 | global batch size: 256 | lm loss: 2.907921E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.411 | TFLOPs: 31.71 | +7: iteration 112570/ 173500 | consumed samples: 28817920 | consumed tokens: 59019100160 | elapsed time per iteration (s): 0.43 | learning rate: 7.034E-05 | global batch size: 256 | lm loss: 2.891004E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.751 | TFLOPs: 31.26 | +7: iteration 112580/ 173500 | consumed samples: 28820480 | consumed tokens: 59024343040 | elapsed time per iteration (s): 0.42 | learning rate: 7.032E-05 | global batch size: 256 | lm loss: 2.895703E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.168 | TFLOPs: 31.80 | +7: iteration 112590/ 173500 | consumed samples: 28823040 | consumed tokens: 59029585920 | elapsed time per iteration (s): 0.43 | learning rate: 7.031E-05 | global batch size: 256 | lm loss: 2.906520E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.574 | TFLOPs: 31.41 | +7: iteration 112600/ 173500 | consumed samples: 28825600 | consumed tokens: 59034828800 | elapsed time per iteration (s): 0.42 | learning rate: 7.029E-05 | global batch size: 256 | lm loss: 2.901028E+00 | grad norm: 0.364 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.947 | TFLOPs: 31.79 | +7: iteration 112610/ 173500 | consumed samples: 28828160 | consumed tokens: 59040071680 | elapsed time per iteration (s): 0.43 | learning rate: 7.028E-05 | global batch size: 256 | lm loss: 2.896414E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.441 | TFLOPs: 31.56 | +7: iteration 112620/ 173500 | consumed samples: 28830720 | consumed tokens: 59045314560 | elapsed time per iteration (s): 0.42 | learning rate: 7.026E-05 | global batch size: 256 | lm loss: 2.913802E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.524 | TFLOPs: 31.77 | +7: iteration 112630/ 173500 | consumed samples: 28833280 | consumed tokens: 59050557440 | elapsed time per iteration (s): 0.42 | learning rate: 7.025E-05 | global batch size: 256 | lm loss: 2.905558E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.413 | TFLOPs: 31.97 | +7: iteration 112640/ 173500 | consumed samples: 28835840 | consumed tokens: 59055800320 | elapsed time per iteration (s): 0.43 | learning rate: 7.023E-05 | global batch size: 256 | lm loss: 2.905744E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.784 | TFLOPs: 31.57 | +7: iteration 112650/ 173500 | consumed samples: 28838400 | consumed tokens: 59061043200 | elapsed time per iteration (s): 0.43 | learning rate: 7.022E-05 | global batch size: 256 | lm loss: 2.904457E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.294 | TFLOPs: 31.34 | +7: iteration 112660/ 173500 | consumed samples: 28840960 | consumed tokens: 59066286080 | elapsed time per 
iteration (s): 0.42 | learning rate: 7.020E-05 | global batch size: 256 | lm loss: 2.895300E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.010 | TFLOPs: 31.80 | +7: iteration 112670/ 173500 | consumed samples: 28843520 | consumed tokens: 59071528960 | elapsed time per iteration (s): 0.43 | learning rate: 7.019E-05 | global batch size: 256 | lm loss: 2.899326E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.163 | TFLOPs: 31.54 | +7: iteration 112680/ 173500 | consumed samples: 28846080 | consumed tokens: 59076771840 | elapsed time per iteration (s): 0.43 | learning rate: 7.017E-05 | global batch size: 256 | lm loss: 2.908715E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.991 | TFLOPs: 31.27 | +7: iteration 112690/ 173500 | consumed samples: 28848640 | consumed tokens: 59082014720 | elapsed time per iteration (s): 0.43 | learning rate: 7.016E-05 | global batch size: 256 | lm loss: 2.916299E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.457 | TFLOPs: 31.56 | +7: iteration 112700/ 173500 | consumed samples: 28851200 | consumed tokens: 59087257600 | elapsed time per iteration (s): 0.43 | learning rate: 7.015E-05 | global batch size: 256 | lm loss: 2.895913E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.810 | TFLOPs: 31.37 | +7: iteration 112710/ 173500 | consumed samples: 28853760 | consumed tokens: 59092500480 | elapsed time per iteration (s): 0.43 | learning rate: 7.013E-05 | global batch size: 256 | lm loss: 2.895301E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.597 | TFLOPs: 
30.94 | +7: iteration 112720/ 173500 | consumed samples: 28856320 | consumed tokens: 59097743360 | elapsed time per iteration (s): 0.44 | learning rate: 7.012E-05 | global batch size: 256 | lm loss: 2.910810E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.503 | TFLOPs: 30.67 | +7: iteration 112730/ 173500 | consumed samples: 28858880 | consumed tokens: 59102986240 | elapsed time per iteration (s): 0.44 | learning rate: 7.010E-05 | global batch size: 256 | lm loss: 2.899692E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.949 | TFLOPs: 30.80 | +7: iteration 112740/ 173500 | consumed samples: 28861440 | consumed tokens: 59108229120 | elapsed time per iteration (s): 0.43 | learning rate: 7.009E-05 | global batch size: 256 | lm loss: 2.905164E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.124 | TFLOPs: 31.49 | +7: iteration 112750/ 173500 | consumed samples: 28864000 | consumed tokens: 59113472000 | elapsed time per iteration (s): 0.42 | learning rate: 7.007E-05 | global batch size: 256 | lm loss: 2.897419E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.787 | TFLOPs: 31.84 | +7: iteration 112760/ 173500 | consumed samples: 28866560 | consumed tokens: 59118714880 | elapsed time per iteration (s): 0.44 | learning rate: 7.006E-05 | global batch size: 256 | lm loss: 2.886556E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.776 | TFLOPs: 30.79 | +7: iteration 112770/ 173500 | consumed samples: 28869120 | consumed tokens: 59123957760 | elapsed time per iteration (s): 0.43 | learning rate: 7.004E-05 | global batch size: 256 | lm loss: 2.913289E+00 | grad norm: 0.347 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.569 | TFLOPs: 31.30 | +7: iteration 112780/ 173500 | consumed samples: 28871680 | consumed tokens: 59129200640 | elapsed time per iteration (s): 0.43 | learning rate: 7.003E-05 | global batch size: 256 | lm loss: 2.895791E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.410 | TFLOPs: 31.50 | +7: iteration 112790/ 173500 | consumed samples: 28874240 | consumed tokens: 59134443520 | elapsed time per iteration (s): 0.42 | learning rate: 7.001E-05 | global batch size: 256 | lm loss: 2.906003E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.816 | TFLOPs: 31.84 | +7: iteration 112800/ 173500 | consumed samples: 28876800 | consumed tokens: 59139686400 | elapsed time per iteration (s): 0.43 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 2.899814E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.725 | TFLOPs: 31.15 | +7: iteration 112810/ 173500 | consumed samples: 28879360 | consumed tokens: 59144929280 | elapsed time per iteration (s): 0.43 | learning rate: 6.998E-05 | global batch size: 256 | lm loss: 2.894697E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.177 | TFLOPs: 31.54 | +7: iteration 112820/ 173500 | consumed samples: 28881920 | consumed tokens: 59150172160 | elapsed time per iteration (s): 0.43 | learning rate: 6.997E-05 | global batch size: 256 | lm loss: 2.915538E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.202 | TFLOPs: 31.49 | +7: iteration 112830/ 173500 | consumed samples: 28884480 | consumed tokens: 59155415040 | elapsed time per 
iteration (s): 0.43 | learning rate: 6.995E-05 | global batch size: 256 | lm loss: 2.904267E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.531 | TFLOPs: 31.51 | +7: iteration 112840/ 173500 | consumed samples: 28887040 | consumed tokens: 59160657920 | elapsed time per iteration (s): 0.43 | learning rate: 6.994E-05 | global batch size: 256 | lm loss: 2.905105E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.642 | TFLOPs: 30.89 | +7: iteration 112850/ 173500 | consumed samples: 28889600 | consumed tokens: 59165900800 | elapsed time per iteration (s): 0.43 | learning rate: 6.992E-05 | global batch size: 256 | lm loss: 2.900246E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.290 | TFLOPs: 31.02 | +7: iteration 112860/ 173500 | consumed samples: 28892160 | consumed tokens: 59171143680 | elapsed time per iteration (s): 0.43 | learning rate: 6.991E-05 | global batch size: 256 | lm loss: 2.912998E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.181 | TFLOPs: 31.54 | +7: iteration 112870/ 173500 | consumed samples: 28894720 | consumed tokens: 59176386560 | elapsed time per iteration (s): 0.43 | learning rate: 6.989E-05 | global batch size: 256 | lm loss: 2.903304E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.123 | TFLOPs: 30.96 | +7: iteration 112880/ 173500 | consumed samples: 28897280 | consumed tokens: 59181629440 | elapsed time per iteration (s): 0.43 | learning rate: 6.988E-05 | global batch size: 256 | lm loss: 2.911341E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.804 | TFLOPs: 
31.26 | +7: iteration 112890/ 173500 | consumed samples: 28899840 | consumed tokens: 59186872320 | elapsed time per iteration (s): 0.42 | learning rate: 6.987E-05 | global batch size: 256 | lm loss: 2.901189E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.303 | TFLOPs: 31.71 | +7: iteration 112900/ 173500 | consumed samples: 28902400 | consumed tokens: 59192115200 | elapsed time per iteration (s): 0.42 | learning rate: 6.985E-05 | global batch size: 256 | lm loss: 2.905059E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.299 | TFLOPs: 31.65 | +7: iteration 112910/ 173500 | consumed samples: 28904960 | consumed tokens: 59197358080 | elapsed time per iteration (s): 0.43 | learning rate: 6.984E-05 | global batch size: 256 | lm loss: 2.907499E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.250 | TFLOPs: 31.49 | +7: iteration 112920/ 173500 | consumed samples: 28907520 | consumed tokens: 59202600960 | elapsed time per iteration (s): 0.42 | learning rate: 6.982E-05 | global batch size: 256 | lm loss: 2.905363E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.559 | TFLOPs: 32.04 | +7: iteration 112930/ 173500 | consumed samples: 28910080 | consumed tokens: 59207843840 | elapsed time per iteration (s): 0.44 | learning rate: 6.981E-05 | global batch size: 256 | lm loss: 2.900987E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.472 | TFLOPs: 30.67 | +7: iteration 112940/ 173500 | consumed samples: 28912640 | consumed tokens: 59213086720 | elapsed time per iteration (s): 0.43 | learning rate: 6.979E-05 | global batch size: 256 | lm loss: 2.907567E+00 | grad norm: 0.354 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.485 | TFLOPs: 31.40 | +7: iteration 112950/ 173500 | consumed samples: 28915200 | consumed tokens: 59218329600 | elapsed time per iteration (s): 0.43 | learning rate: 6.978E-05 | global batch size: 256 | lm loss: 2.911905E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.241 | TFLOPs: 31.44 | +7: iteration 112960/ 173500 | consumed samples: 28917760 | consumed tokens: 59223572480 | elapsed time per iteration (s): 0.43 | learning rate: 6.976E-05 | global batch size: 256 | lm loss: 2.904489E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.887 | TFLOPs: 31.58 | +7: iteration 112970/ 173500 | consumed samples: 28920320 | consumed tokens: 59228815360 | elapsed time per iteration (s): 0.43 | learning rate: 6.975E-05 | global batch size: 256 | lm loss: 2.899603E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.056 | TFLOPs: 31.12 | +7: iteration 112980/ 173500 | consumed samples: 28922880 | consumed tokens: 59234058240 | elapsed time per iteration (s): 0.43 | learning rate: 6.973E-05 | global batch size: 256 | lm loss: 2.898893E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.996 | TFLOPs: 31.53 | +7: iteration 112990/ 173500 | consumed samples: 28925440 | consumed tokens: 59239301120 | elapsed time per iteration (s): 0.43 | learning rate: 6.972E-05 | global batch size: 256 | lm loss: 2.890698E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.925 | TFLOPs: 31.48 | +7: iteration 113000/ 173500 | consumed samples: 28928000 | consumed tokens: 59244544000 | elapsed time per 
iteration (s): 0.43 | learning rate: 6.970E-05 | global batch size: 256 | lm loss: 2.898100E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.599 | TFLOPs: 31.30 | +7: iteration 113010/ 173500 | consumed samples: 28930560 | consumed tokens: 59249786880 | elapsed time per iteration (s): 0.42 | learning rate: 6.969E-05 | global batch size: 256 | lm loss: 2.917146E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.303 | TFLOPs: 31.86 | +7: iteration 113020/ 173500 | consumed samples: 28933120 | consumed tokens: 59255029760 | elapsed time per iteration (s): 0.42 | learning rate: 6.967E-05 | global batch size: 256 | lm loss: 2.902665E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.759 | TFLOPs: 32.05 | +7: iteration 113030/ 173500 | consumed samples: 28935680 | consumed tokens: 59260272640 | elapsed time per iteration (s): 0.43 | learning rate: 6.966E-05 | global batch size: 256 | lm loss: 2.895774E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.542 | TFLOPs: 31.14 | +7: iteration 113040/ 173500 | consumed samples: 28938240 | consumed tokens: 59265515520 | elapsed time per iteration (s): 0.43 | learning rate: 6.964E-05 | global batch size: 256 | lm loss: 2.897020E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.158 | TFLOPs: 31.54 | +7: iteration 113050/ 173500 | consumed samples: 28940800 | consumed tokens: 59270758400 | elapsed time per iteration (s): 0.42 | learning rate: 6.963E-05 | global batch size: 256 | lm loss: 2.901864E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.299 | TFLOPs: 
31.71 | +7: iteration 113060/ 173500 | consumed samples: 28943360 | consumed tokens: 59276001280 | elapsed time per iteration (s): 0.42 | learning rate: 6.961E-05 | global batch size: 256 | lm loss: 2.894480E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.920 | TFLOPs: 31.74 | +7: iteration 113070/ 173500 | consumed samples: 28945920 | consumed tokens: 59281244160 | elapsed time per iteration (s): 0.42 | learning rate: 6.960E-05 | global batch size: 256 | lm loss: 2.890833E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.314 | TFLOPs: 31.76 | +7: iteration 113080/ 173500 | consumed samples: 28948480 | consumed tokens: 59286487040 | elapsed time per iteration (s): 0.42 | learning rate: 6.959E-05 | global batch size: 256 | lm loss: 2.893175E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.485 | TFLOPs: 31.61 | +7: iteration 113090/ 173500 | consumed samples: 28951040 | consumed tokens: 59291729920 | elapsed time per iteration (s): 0.43 | learning rate: 6.957E-05 | global batch size: 256 | lm loss: 2.896463E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.437 | TFLOPs: 31.50 | +7: iteration 113100/ 173500 | consumed samples: 28953600 | consumed tokens: 59296972800 | elapsed time per iteration (s): 0.43 | learning rate: 6.956E-05 | global batch size: 256 | lm loss: 2.900350E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.655 | TFLOPs: 31.20 | +7: iteration 113110/ 173500 | consumed samples: 28956160 | consumed tokens: 59302215680 | elapsed time per iteration (s): 0.43 | learning rate: 6.954E-05 | global batch size: 256 | lm loss: 2.901360E+00 | grad norm: 0.356 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.243 | TFLOPs: 31.02 | +7: iteration 113120/ 173500 | consumed samples: 28958720 | consumed tokens: 59307458560 | elapsed time per iteration (s): 0.42 | learning rate: 6.953E-05 | global batch size: 256 | lm loss: 2.895248E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.874 | TFLOPs: 31.74 | +7: iteration 113130/ 173500 | consumed samples: 28961280 | consumed tokens: 59312701440 | elapsed time per iteration (s): 0.42 | learning rate: 6.951E-05 | global batch size: 256 | lm loss: 2.922462E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.400 | TFLOPs: 32.03 | +7: iteration 113140/ 173500 | consumed samples: 28963840 | consumed tokens: 59317944320 | elapsed time per iteration (s): 0.43 | learning rate: 6.950E-05 | global batch size: 256 | lm loss: 2.902104E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.079 | TFLOPs: 31.49 | +7: iteration 113150/ 173500 | consumed samples: 28966400 | consumed tokens: 59323187200 | elapsed time per iteration (s): 0.42 | learning rate: 6.948E-05 | global batch size: 256 | lm loss: 2.908341E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.278 | TFLOPs: 31.71 | +7: iteration 113160/ 173500 | consumed samples: 28968960 | consumed tokens: 59328430080 | elapsed time per iteration (s): 0.42 | learning rate: 6.947E-05 | global batch size: 256 | lm loss: 2.906469E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.903 | TFLOPs: 31.69 | +7: iteration 113170/ 173500 | consumed samples: 28971520 | consumed tokens: 59333672960 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.945E-05 | global batch size: 256 | lm loss: 2.914169E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.968 | TFLOPs: 31.64 | +7: iteration 113180/ 173500 | consumed samples: 28974080 | consumed tokens: 59338915840 | elapsed time per iteration (s): 0.42 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 2.904661E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.889 | TFLOPs: 31.84 | +7: iteration 113190/ 173500 | consumed samples: 28976640 | consumed tokens: 59344158720 | elapsed time per iteration (s): 0.43 | learning rate: 6.942E-05 | global batch size: 256 | lm loss: 2.899570E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.888 | TFLOPs: 31.42 | +7: iteration 113200/ 173500 | consumed samples: 28979200 | consumed tokens: 59349401600 | elapsed time per iteration (s): 0.43 | learning rate: 6.941E-05 | global batch size: 256 | lm loss: 2.896889E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.172 | TFLOPs: 31.44 | +7: iteration 113210/ 173500 | consumed samples: 28981760 | consumed tokens: 59354644480 | elapsed time per iteration (s): 0.42 | learning rate: 6.939E-05 | global batch size: 256 | lm loss: 2.901269E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.393 | TFLOPs: 31.61 | +7: iteration 113220/ 173500 | consumed samples: 28984320 | consumed tokens: 59359887360 | elapsed time per iteration (s): 0.43 | learning rate: 6.938E-05 | global batch size: 256 | lm loss: 2.905965E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.488 | TFLOPs: 
31.30 | +7: iteration 113230/ 173500 | consumed samples: 28986880 | consumed tokens: 59365130240 | elapsed time per iteration (s): 0.43 | learning rate: 6.936E-05 | global batch size: 256 | lm loss: 2.906506E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.811 | TFLOPs: 31.42 | +7: iteration 113240/ 173500 | consumed samples: 28989440 | consumed tokens: 59370373120 | elapsed time per iteration (s): 0.44 | learning rate: 6.935E-05 | global batch size: 256 | lm loss: 2.909429E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.306 | TFLOPs: 30.71 | +7: iteration 113250/ 173500 | consumed samples: 28992000 | consumed tokens: 59375616000 | elapsed time per iteration (s): 0.42 | learning rate: 6.934E-05 | global batch size: 256 | lm loss: 2.907854E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.366 | TFLOPs: 31.61 | +7: iteration 113260/ 173500 | consumed samples: 28994560 | consumed tokens: 59380858880 | elapsed time per iteration (s): 0.42 | learning rate: 6.932E-05 | global batch size: 256 | lm loss: 2.902857E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.584 | TFLOPs: 31.62 | +7: iteration 113270/ 173500 | consumed samples: 28997120 | consumed tokens: 59386101760 | elapsed time per iteration (s): 0.42 | learning rate: 6.931E-05 | global batch size: 256 | lm loss: 2.906336E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.748 | TFLOPs: 31.78 | +7: iteration 113280/ 173500 | consumed samples: 28999680 | consumed tokens: 59391344640 | elapsed time per iteration (s): 0.42 | learning rate: 6.929E-05 | global batch size: 256 | lm loss: 2.907241E+00 | grad norm: 0.338 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.999 | TFLOPs: 31.69 | +7: iteration 113290/ 173500 | consumed samples: 29002240 | consumed tokens: 59396587520 | elapsed time per iteration (s): 0.42 | learning rate: 6.928E-05 | global batch size: 256 | lm loss: 2.884912E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.460 | TFLOPs: 31.61 | +7: iteration 113300/ 173500 | consumed samples: 29004800 | consumed tokens: 59401830400 | elapsed time per iteration (s): 0.43 | learning rate: 6.926E-05 | global batch size: 256 | lm loss: 2.903353E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.938 | TFLOPs: 31.37 | +7: iteration 113310/ 173500 | consumed samples: 29007360 | consumed tokens: 59407073280 | elapsed time per iteration (s): 0.42 | learning rate: 6.925E-05 | global batch size: 256 | lm loss: 2.898946E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.222 | TFLOPs: 31.76 | +7: iteration 113320/ 173500 | consumed samples: 29009920 | consumed tokens: 59412316160 | elapsed time per iteration (s): 0.43 | learning rate: 6.923E-05 | global batch size: 256 | lm loss: 2.916528E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.507 | TFLOPs: 31.51 | +7: iteration 113330/ 173500 | consumed samples: 29012480 | consumed tokens: 59417559040 | elapsed time per iteration (s): 0.42 | learning rate: 6.922E-05 | global batch size: 256 | lm loss: 2.906779E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.136 | TFLOPs: 31.86 | +7: iteration 113340/ 173500 | consumed samples: 29015040 | consumed tokens: 59422801920 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.920E-05 | global batch size: 256 | lm loss: 2.900156E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.042 | TFLOPs: 31.85 | +7: iteration 113350/ 173500 | consumed samples: 29017600 | consumed tokens: 59428044800 | elapsed time per iteration (s): 0.43 | learning rate: 6.919E-05 | global batch size: 256 | lm loss: 2.889289E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.733 | TFLOPs: 31.52 | +7: iteration 113360/ 173500 | consumed samples: 29020160 | consumed tokens: 59433287680 | elapsed time per iteration (s): 0.43 | learning rate: 6.917E-05 | global batch size: 256 | lm loss: 2.918471E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.172 | TFLOPs: 31.49 | +7: iteration 113370/ 173500 | consumed samples: 29022720 | consumed tokens: 59438530560 | elapsed time per iteration (s): 0.42 | learning rate: 6.916E-05 | global batch size: 256 | lm loss: 2.889215E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.327 | TFLOPs: 32.02 | +7: iteration 113380/ 173500 | consumed samples: 29025280 | consumed tokens: 59443773440 | elapsed time per iteration (s): 0.43 | learning rate: 6.914E-05 | global batch size: 256 | lm loss: 2.904051E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.564 | TFLOPs: 30.88 | +7: iteration 113390/ 173500 | consumed samples: 29027840 | consumed tokens: 59449016320 | elapsed time per iteration (s): 0.42 | learning rate: 6.913E-05 | global batch size: 256 | lm loss: 2.906700E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.982 | TFLOPs: 
32.06 | +7: iteration 113400/ 173500 | consumed samples: 29030400 | consumed tokens: 59454259200 | elapsed time per iteration (s): 0.42 | learning rate: 6.912E-05 | global batch size: 256 | lm loss: 2.887584E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.887 | TFLOPs: 31.79 | +7: iteration 113410/ 173500 | consumed samples: 29032960 | consumed tokens: 59459502080 | elapsed time per iteration (s): 0.42 | learning rate: 6.910E-05 | global batch size: 256 | lm loss: 2.906020E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.282 | TFLOPs: 32.02 | +7: iteration 113420/ 173500 | consumed samples: 29035520 | consumed tokens: 59464744960 | elapsed time per iteration (s): 0.42 | learning rate: 6.909E-05 | global batch size: 256 | lm loss: 2.899633E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.814 | TFLOPs: 32.00 | +7: iteration 113430/ 173500 | consumed samples: 29038080 | consumed tokens: 59469987840 | elapsed time per iteration (s): 0.42 | learning rate: 6.907E-05 | global batch size: 256 | lm loss: 2.890691E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.309 | TFLOPs: 31.92 | +7: iteration 113440/ 173500 | consumed samples: 29040640 | consumed tokens: 59475230720 | elapsed time per iteration (s): 0.42 | learning rate: 6.906E-05 | global batch size: 256 | lm loss: 2.899705E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.476 | TFLOPs: 31.93 | +7: iteration 113450/ 173500 | consumed samples: 29043200 | consumed tokens: 59480473600 | elapsed time per iteration (s): 0.43 | learning rate: 6.904E-05 | global batch size: 256 | lm loss: 2.903089E+00 | grad norm: 0.389 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.757 | TFLOPs: 31.57 | +7: iteration 113460/ 173500 | consumed samples: 29045760 | consumed tokens: 59485716480 | elapsed time per iteration (s): 0.42 | learning rate: 6.903E-05 | global batch size: 256 | lm loss: 2.910112E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.839 | TFLOPs: 32.00 | +7: iteration 113470/ 173500 | consumed samples: 29048320 | consumed tokens: 59490959360 | elapsed time per iteration (s): 0.43 | learning rate: 6.901E-05 | global batch size: 256 | lm loss: 2.899012E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.561 | TFLOPs: 31.41 | +7: iteration 113480/ 173500 | consumed samples: 29050880 | consumed tokens: 59496202240 | elapsed time per iteration (s): 0.43 | learning rate: 6.900E-05 | global batch size: 256 | lm loss: 2.900113E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.209 | TFLOPs: 31.60 | +7: iteration 113490/ 173500 | consumed samples: 29053440 | consumed tokens: 59501445120 | elapsed time per iteration (s): 0.42 | learning rate: 6.898E-05 | global batch size: 256 | lm loss: 2.903180E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.595 | TFLOPs: 31.83 | +7: iteration 113500/ 173500 | consumed samples: 29056000 | consumed tokens: 59506688000 | elapsed time per iteration (s): 0.42 | learning rate: 6.897E-05 | global batch size: 256 | lm loss: 2.889119E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.454 | TFLOPs: 31.77 | +7: iteration 113510/ 173500 | consumed samples: 29058560 | consumed tokens: 59511930880 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.895E-05 | global batch size: 256 | lm loss: 2.890728E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.631 | TFLOPs: 31.83 | +7: iteration 113520/ 173500 | consumed samples: 29061120 | consumed tokens: 59517173760 | elapsed time per iteration (s): 0.42 | learning rate: 6.894E-05 | global batch size: 256 | lm loss: 2.906707E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.349 | TFLOPs: 31.87 | +7: iteration 113530/ 173500 | consumed samples: 29063680 | consumed tokens: 59522416640 | elapsed time per iteration (s): 0.42 | learning rate: 6.892E-05 | global batch size: 256 | lm loss: 2.903461E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.923 | TFLOPs: 31.95 | +7: iteration 113540/ 173500 | consumed samples: 29066240 | consumed tokens: 59527659520 | elapsed time per iteration (s): 0.43 | learning rate: 6.891E-05 | global batch size: 256 | lm loss: 2.900940E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.908 | TFLOPs: 31.53 | +7: iteration 113550/ 173500 | consumed samples: 29068800 | consumed tokens: 59532902400 | elapsed time per iteration (s): 0.42 | learning rate: 6.890E-05 | global batch size: 256 | lm loss: 2.901263E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.904 | TFLOPs: 31.95 | +7: iteration 113560/ 173500 | consumed samples: 29071360 | consumed tokens: 59538145280 | elapsed time per iteration (s): 0.42 | learning rate: 6.888E-05 | global batch size: 256 | lm loss: 2.896004E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.622 | TFLOPs: 
31.88 | +7: iteration 113570/ 173500 | consumed samples: 29073920 | consumed tokens: 59543388160 | elapsed time per iteration (s): 0.42 | learning rate: 6.887E-05 | global batch size: 256 | lm loss: 2.921457E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.186 | TFLOPs: 31.86 | +7: iteration 113580/ 173500 | consumed samples: 29076480 | consumed tokens: 59548631040 | elapsed time per iteration (s): 0.42 | learning rate: 6.885E-05 | global batch size: 256 | lm loss: 2.896034E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.174 | TFLOPs: 31.80 | +7: iteration 113590/ 173500 | consumed samples: 29079040 | consumed tokens: 59553873920 | elapsed time per iteration (s): 0.42 | learning rate: 6.884E-05 | global batch size: 256 | lm loss: 2.904179E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.018 | TFLOPs: 31.95 | +7: iteration 113600/ 173500 | consumed samples: 29081600 | consumed tokens: 59559116800 | elapsed time per iteration (s): 0.42 | learning rate: 6.882E-05 | global batch size: 256 | lm loss: 2.915833E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.162 | TFLOPs: 31.91 | +7: iteration 113610/ 173500 | consumed samples: 29084160 | consumed tokens: 59564359680 | elapsed time per iteration (s): 0.42 | learning rate: 6.881E-05 | global batch size: 256 | lm loss: 2.896428E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.439 | TFLOPs: 31.92 | +7: iteration 113620/ 173500 | consumed samples: 29086720 | consumed tokens: 59569602560 | elapsed time per iteration (s): 0.42 | learning rate: 6.879E-05 | global batch size: 256 | lm loss: 2.899127E+00 | grad norm: 0.367 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.695 | TFLOPs: 31.67 | +7: iteration 113630/ 173500 | consumed samples: 29089280 | consumed tokens: 59574845440 | elapsed time per iteration (s): 0.42 | learning rate: 6.878E-05 | global batch size: 256 | lm loss: 2.900501E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.762 | TFLOPs: 31.63 | +7: iteration 113640/ 173500 | consumed samples: 29091840 | consumed tokens: 59580088320 | elapsed time per iteration (s): 0.42 | learning rate: 6.876E-05 | global batch size: 256 | lm loss: 2.916090E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.524 | TFLOPs: 31.88 | +7: iteration 113650/ 173500 | consumed samples: 29094400 | consumed tokens: 59585331200 | elapsed time per iteration (s): 0.42 | learning rate: 6.875E-05 | global batch size: 256 | lm loss: 2.896627E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.364 | TFLOPs: 31.87 | +7: iteration 113660/ 173500 | consumed samples: 29096960 | consumed tokens: 59590574080 | elapsed time per iteration (s): 0.42 | learning rate: 6.873E-05 | global batch size: 256 | lm loss: 2.893067E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.615 | TFLOPs: 31.83 | +7: iteration 113670/ 173500 | consumed samples: 29099520 | consumed tokens: 59595816960 | elapsed time per iteration (s): 0.42 | learning rate: 6.872E-05 | global batch size: 256 | lm loss: 2.902935E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.524 | TFLOPs: 31.72 | +7: iteration 113680/ 173500 | consumed samples: 29102080 | consumed tokens: 59601059840 | elapsed time per 
iteration (s): 0.45 | learning rate: 6.871E-05 | global batch size: 256 | lm loss: 2.897002E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.447 | TFLOPs: 30.09 | +7: iteration 113690/ 173500 | consumed samples: 29104640 | consumed tokens: 59606302720 | elapsed time per iteration (s): 0.42 | learning rate: 6.869E-05 | global batch size: 256 | lm loss: 2.902493E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.220 | TFLOPs: 31.75 | +7: iteration 113700/ 173500 | consumed samples: 29107200 | consumed tokens: 59611545600 | elapsed time per iteration (s): 0.43 | learning rate: 6.868E-05 | global batch size: 256 | lm loss: 2.900579E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.269 | TFLOPs: 31.39 | +7: iteration 113710/ 173500 | consumed samples: 29109760 | consumed tokens: 59616788480 | elapsed time per iteration (s): 0.42 | learning rate: 6.866E-05 | global batch size: 256 | lm loss: 2.898792E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.560 | TFLOPs: 31.98 | +7: iteration 113720/ 173500 | consumed samples: 29112320 | consumed tokens: 59622031360 | elapsed time per iteration (s): 0.42 | learning rate: 6.865E-05 | global batch size: 256 | lm loss: 2.912659E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.031 | TFLOPs: 31.95 | +7: iteration 113730/ 173500 | consumed samples: 29114880 | consumed tokens: 59627274240 | elapsed time per iteration (s): 0.42 | learning rate: 6.863E-05 | global batch size: 256 | lm loss: 2.910728E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.429 | TFLOPs: 
31.77 | +7: iteration 113740/ 173500 | consumed samples: 29117440 | consumed tokens: 59632517120 | elapsed time per iteration (s): 0.44 | learning rate: 6.862E-05 | global batch size: 256 | lm loss: 2.915450E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.783 | TFLOPs: 30.32 | +7: iteration 113750/ 173500 | consumed samples: 29120000 | consumed tokens: 59637760000 | elapsed time per iteration (s): 0.42 | learning rate: 6.860E-05 | global batch size: 256 | lm loss: 2.899533E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.778 | TFLOPs: 31.73 | +7: iteration 113760/ 173500 | consumed samples: 29122560 | consumed tokens: 59643002880 | elapsed time per iteration (s): 0.42 | learning rate: 6.859E-05 | global batch size: 256 | lm loss: 2.898158E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.038 | TFLOPs: 31.80 | +7: iteration 113770/ 173500 | consumed samples: 29125120 | consumed tokens: 59648245760 | elapsed time per iteration (s): 0.42 | learning rate: 6.857E-05 | global batch size: 256 | lm loss: 2.904778E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.048 | TFLOPs: 32.01 | +7: iteration 113780/ 173500 | consumed samples: 29127680 | consumed tokens: 59653488640 | elapsed time per iteration (s): 0.43 | learning rate: 6.856E-05 | global batch size: 256 | lm loss: 2.894492E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.406 | TFLOPs: 31.03 | +7: iteration 113790/ 173500 | consumed samples: 29130240 | consumed tokens: 59658731520 | elapsed time per iteration (s): 0.45 | learning rate: 6.854E-05 | global batch size: 256 | lm loss: 2.905082E+00 | grad norm: 0.356 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.340 | TFLOPs: 29.71 | +7: iteration 113800/ 173500 | consumed samples: 29132800 | consumed tokens: 59663974400 | elapsed time per iteration (s): 0.42 | learning rate: 6.853E-05 | global batch size: 256 | lm loss: 2.901527E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.969 | TFLOPs: 31.74 | +7: iteration 113810/ 173500 | consumed samples: 29135360 | consumed tokens: 59669217280 | elapsed time per iteration (s): 0.42 | learning rate: 6.852E-05 | global batch size: 256 | lm loss: 2.896599E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.464 | TFLOPs: 32.03 | +7: iteration 113820/ 173500 | consumed samples: 29137920 | consumed tokens: 59674460160 | elapsed time per iteration (s): 0.43 | learning rate: 6.850E-05 | global batch size: 256 | lm loss: 2.915821E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.248 | TFLOPs: 30.97 | +7: iteration 113830/ 173500 | consumed samples: 29140480 | consumed tokens: 59679703040 | elapsed time per iteration (s): 0.42 | learning rate: 6.849E-05 | global batch size: 256 | lm loss: 2.891161E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.939 | TFLOPs: 32.00 | +7: iteration 113840/ 173500 | consumed samples: 29143040 | consumed tokens: 59684945920 | elapsed time per iteration (s): 0.42 | learning rate: 6.847E-05 | global batch size: 256 | lm loss: 2.898502E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.073 | TFLOPs: 32.01 | +7: iteration 113850/ 173500 | consumed samples: 29145600 | consumed tokens: 59690188800 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.846E-05 | global batch size: 256 | lm loss: 2.896130E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.531 | TFLOPs: 31.77 | +7: iteration 113860/ 173500 | consumed samples: 29148160 | consumed tokens: 59695431680 | elapsed time per iteration (s): 0.42 | learning rate: 6.844E-05 | global batch size: 256 | lm loss: 2.902334E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.650 | TFLOPs: 31.78 | +7: iteration 113870/ 173500 | consumed samples: 29150720 | consumed tokens: 59700674560 | elapsed time per iteration (s): 0.42 | learning rate: 6.843E-05 | global batch size: 256 | lm loss: 2.897390E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.030 | TFLOPs: 32.01 | +7: iteration 113880/ 173500 | consumed samples: 29153280 | consumed tokens: 59705917440 | elapsed time per iteration (s): 0.42 | learning rate: 6.841E-05 | global batch size: 256 | lm loss: 2.891097E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.237 | TFLOPs: 31.81 | +7: iteration 113890/ 173500 | consumed samples: 29155840 | consumed tokens: 59711160320 | elapsed time per iteration (s): 0.42 | learning rate: 6.840E-05 | global batch size: 256 | lm loss: 2.898198E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.608 | TFLOPs: 31.99 | +7: iteration 113900/ 173500 | consumed samples: 29158400 | consumed tokens: 59716403200 | elapsed time per iteration (s): 0.43 | learning rate: 6.838E-05 | global batch size: 256 | lm loss: 2.902104E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.414 | TFLOPs: 
30.93 | +7: iteration 113910/ 173500 | consumed samples: 29160960 | consumed tokens: 59721646080 | elapsed time per iteration (s): 0.42 | learning rate: 6.837E-05 | global batch size: 256 | lm loss: 2.911504E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.949 | TFLOPs: 31.69 | +7: iteration 113920/ 173500 | consumed samples: 29163520 | consumed tokens: 59726888960 | elapsed time per iteration (s): 0.48 | learning rate: 6.835E-05 | global batch size: 256 | lm loss: 2.894237E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.145 | TFLOPs: 28.13 | +7: iteration 113930/ 173500 | consumed samples: 29166080 | consumed tokens: 59732131840 | elapsed time per iteration (s): 0.46 | learning rate: 6.834E-05 | global batch size: 256 | lm loss: 2.896812E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.870 | TFLOPs: 29.22 | +7: iteration 113940/ 173500 | consumed samples: 29168640 | consumed tokens: 59737374720 | elapsed time per iteration (s): 0.46 | learning rate: 6.833E-05 | global batch size: 256 | lm loss: 2.901100E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.364 | TFLOPs: 29.19 | +7: iteration 113950/ 173500 | consumed samples: 29171200 | consumed tokens: 59742617600 | elapsed time per iteration (s): 0.47 | learning rate: 6.831E-05 | global batch size: 256 | lm loss: 2.893895E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.130 | TFLOPs: 28.65 | +7: iteration 113960/ 173500 | consumed samples: 29173760 | consumed tokens: 59747860480 | elapsed time per iteration (s): 0.46 | learning rate: 6.830E-05 | global batch size: 256 | lm loss: 2.905994E+00 | grad norm: 0.357 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.359 | TFLOPs: 29.03 | +7: iteration 113970/ 173500 | consumed samples: 29176320 | consumed tokens: 59753103360 | elapsed time per iteration (s): 0.44 | learning rate: 6.828E-05 | global batch size: 256 | lm loss: 2.899328E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.149 | TFLOPs: 30.86 | +7: iteration 113980/ 173500 | consumed samples: 29178880 | consumed tokens: 59758346240 | elapsed time per iteration (s): 0.45 | learning rate: 6.827E-05 | global batch size: 256 | lm loss: 2.882873E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.126 | TFLOPs: 29.86 | +7: iteration 113990/ 173500 | consumed samples: 29181440 | consumed tokens: 59763589120 | elapsed time per iteration (s): 0.43 | learning rate: 6.825E-05 | global batch size: 256 | lm loss: 2.896073E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.793 | TFLOPs: 31.10 | +0: [2023-03-17 12:43:11,708] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=0, lr=[6.823796836261315e-05, 6.823796836261315e-05, 6.823796836261315e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 114000/ 173500 | consumed samples: 29184000 | consumed tokens: 59768832000 | elapsed time per iteration (s): 0.48 | learning rate: 6.824E-05 | global batch size: 256 | lm loss: 2.896651E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.325 | TFLOPs: 27.93 | +0: steps: 114000 loss: 2.8917 iter time (s): 0.425 samples/sec: 601.946 +7: iteration 114010/ 173500 | consumed samples: 29186560 | consumed tokens: 59774074880 | elapsed time per iteration (s): 0.42 | learning rate: 6.822E-05 | global batch size: 256 
| lm loss: 2.902794E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.703 | TFLOPs: 32.10 | +7: iteration 114020/ 173500 | consumed samples: 29189120 | consumed tokens: 59779317760 | elapsed time per iteration (s): 0.46 | learning rate: 6.821E-05 | global batch size: 256 | lm loss: 2.910549E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.968 | TFLOPs: 29.17 | +7: iteration 114030/ 173500 | consumed samples: 29191680 | consumed tokens: 59784560640 | elapsed time per iteration (s): 0.48 | learning rate: 6.819E-05 | global batch size: 256 | lm loss: 2.891471E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.841 | TFLOPs: 28.27 | +7: iteration 114040/ 173500 | consumed samples: 29194240 | consumed tokens: 59789803520 | elapsed time per iteration (s): 0.45 | learning rate: 6.818E-05 | global batch size: 256 | lm loss: 2.891386E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.226 | TFLOPs: 29.97 | +7: iteration 114050/ 173500 | consumed samples: 29196800 | consumed tokens: 59795046400 | elapsed time per iteration (s): 0.46 | learning rate: 6.817E-05 | global batch size: 256 | lm loss: 2.901368E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.600 | TFLOPs: 29.05 | +7: iteration 114060/ 173500 | consumed samples: 29199360 | consumed tokens: 59800289280 | elapsed time per iteration (s): 0.48 | learning rate: 6.815E-05 | global batch size: 256 | lm loss: 2.895093E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.400 | TFLOPs: 27.88 | +7: iteration 114070/ 173500 | consumed samples: 29201920 | 
consumed tokens: 59805532160 | elapsed time per iteration (s): 0.43 | learning rate: 6.814E-05 | global batch size: 256 | lm loss: 2.908378E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.050 | TFLOPs: 31.17 | +7: iteration 114080/ 173500 | consumed samples: 29204480 | consumed tokens: 59810775040 | elapsed time per iteration (s): 0.46 | learning rate: 6.812E-05 | global batch size: 256 | lm loss: 2.891838E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.656 | TFLOPs: 29.21 | +7: iteration 114090/ 173500 | consumed samples: 29207040 | consumed tokens: 59816017920 | elapsed time per iteration (s): 0.42 | learning rate: 6.811E-05 | global batch size: 256 | lm loss: 2.914474E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.199 | TFLOPs: 32.23 | +7: iteration 114100/ 173500 | consumed samples: 29209600 | consumed tokens: 59821260800 | elapsed time per iteration (s): 0.43 | learning rate: 6.809E-05 | global batch size: 256 | lm loss: 2.908260E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.246 | TFLOPs: 31.23 | +7: iteration 114110/ 173500 | consumed samples: 29212160 | consumed tokens: 59826503680 | elapsed time per iteration (s): 0.42 | learning rate: 6.808E-05 | global batch size: 256 | lm loss: 2.905345E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.860 | TFLOPs: 31.79 | +7: iteration 114120/ 173500 | consumed samples: 29214720 | consumed tokens: 59831746560 | elapsed time per iteration (s): 0.43 | learning rate: 6.806E-05 | global batch size: 256 | lm loss: 2.894005E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.617 | TFLOPs: 31.20 | +7: iteration 114130/ 173500 | consumed samples: 29217280 | consumed tokens: 59836989440 | elapsed time per iteration (s): 0.42 | learning rate: 6.805E-05 | global batch size: 256 | lm loss: 2.905756E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.751 | TFLOPs: 31.84 | +7: iteration 114140/ 173500 | consumed samples: 29219840 | consumed tokens: 59842232320 | elapsed time per iteration (s): 0.42 | learning rate: 6.803E-05 | global batch size: 256 | lm loss: 2.897739E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.616 | TFLOPs: 32.14 | +7: iteration 114150/ 173500 | consumed samples: 29222400 | consumed tokens: 59847475200 | elapsed time per iteration (s): 0.42 | learning rate: 6.802E-05 | global batch size: 256 | lm loss: 2.892830E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.835 | TFLOPs: 31.79 | +7: iteration 114160/ 173500 | consumed samples: 29224960 | consumed tokens: 59852718080 | elapsed time per iteration (s): 0.42 | learning rate: 6.800E-05 | global batch size: 256 | lm loss: 2.913363E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.837 | TFLOPs: 32.10 | +7: iteration 114170/ 173500 | consumed samples: 29227520 | consumed tokens: 59857960960 | elapsed time per iteration (s): 0.42 | learning rate: 6.799E-05 | global batch size: 256 | lm loss: 2.913771E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.071 | TFLOPs: 32.06 | +7: iteration 114180/ 173500 | consumed samples: 29230080 | consumed tokens: 59863203840 | elapsed time per iteration (s): 0.42 | learning rate: 6.798E-05 | global batch size: 
256 | lm loss: 2.890489E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.421 | TFLOPs: 32.03 | +7: iteration 114190/ 173500 | consumed samples: 29232640 | consumed tokens: 59868446720 | elapsed time per iteration (s): 0.42 | learning rate: 6.796E-05 | global batch size: 256 | lm loss: 2.904365E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.613 | TFLOPs: 32.04 | +7: iteration 114200/ 173500 | consumed samples: 29235200 | consumed tokens: 59873689600 | elapsed time per iteration (s): 0.42 | learning rate: 6.795E-05 | global batch size: 256 | lm loss: 2.896279E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.341 | TFLOPs: 32.02 | +7: iteration 114210/ 173500 | consumed samples: 29237760 | consumed tokens: 59878932480 | elapsed time per iteration (s): 0.42 | learning rate: 6.793E-05 | global batch size: 256 | lm loss: 2.900389E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.772 | TFLOPs: 31.63 | +7: iteration 114220/ 173500 | consumed samples: 29240320 | consumed tokens: 59884175360 | elapsed time per iteration (s): 0.44 | learning rate: 6.792E-05 | global batch size: 256 | lm loss: 2.903786E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.730 | TFLOPs: 30.52 | +7: iteration 114230/ 173500 | consumed samples: 29242880 | consumed tokens: 59889418240 | elapsed time per iteration (s): 0.42 | learning rate: 6.790E-05 | global batch size: 256 | lm loss: 2.908267E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.257 | TFLOPs: 32.07 | +7: iteration 114240/ 173500 | consumed samples: 29245440 | 
consumed tokens: 59894661120 | elapsed time per iteration (s): 0.42 | learning rate: 6.789E-05 | global batch size: 256 | lm loss: 2.898058E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.864 | TFLOPs: 32.05 | +7: iteration 114250/ 173500 | consumed samples: 29248000 | consumed tokens: 59899904000 | elapsed time per iteration (s): 0.42 | learning rate: 6.787E-05 | global batch size: 256 | lm loss: 2.920203E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.007 | TFLOPs: 31.85 | +7: iteration 114260/ 173500 | consumed samples: 29250560 | consumed tokens: 59905146880 | elapsed time per iteration (s): 0.42 | learning rate: 6.786E-05 | global batch size: 256 | lm loss: 2.902499E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.060 | TFLOPs: 32.06 | +7: iteration 114270/ 173500 | consumed samples: 29253120 | consumed tokens: 59910389760 | elapsed time per iteration (s): 0.42 | learning rate: 6.784E-05 | global batch size: 256 | lm loss: 2.916694E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.673 | TFLOPs: 32.04 | +7: iteration 114280/ 173500 | consumed samples: 29255680 | consumed tokens: 59915632640 | elapsed time per iteration (s): 0.42 | learning rate: 6.783E-05 | global batch size: 256 | lm loss: 2.893418E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.317 | TFLOPs: 32.02 | +7: iteration 114290/ 173500 | consumed samples: 29258240 | consumed tokens: 59920875520 | elapsed time per iteration (s): 0.42 | learning rate: 6.782E-05 | global batch size: 256 | lm loss: 2.894535E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 610.505 | TFLOPs: 32.03 | +7: iteration 114300/ 173500 | consumed samples: 29260800 | consumed tokens: 59926118400 | elapsed time per iteration (s): 0.42 | learning rate: 6.780E-05 | global batch size: 256 | lm loss: 2.913495E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.059 | TFLOPs: 32.01 | +7: iteration 114310/ 173500 | consumed samples: 29263360 | consumed tokens: 59931361280 | elapsed time per iteration (s): 0.42 | learning rate: 6.779E-05 | global batch size: 256 | lm loss: 2.902484E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.132 | TFLOPs: 32.01 | +7: iteration 114320/ 173500 | consumed samples: 29265920 | consumed tokens: 59936604160 | elapsed time per iteration (s): 0.42 | learning rate: 6.777E-05 | global batch size: 256 | lm loss: 2.914320E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.435 | TFLOPs: 31.82 | +7: iteration 114330/ 173500 | consumed samples: 29268480 | consumed tokens: 59941847040 | elapsed time per iteration (s): 0.42 | learning rate: 6.776E-05 | global batch size: 256 | lm loss: 2.880424E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.106 | TFLOPs: 32.01 | +7: iteration 114340/ 173500 | consumed samples: 29271040 | consumed tokens: 59947089920 | elapsed time per iteration (s): 0.42 | learning rate: 6.774E-05 | global batch size: 256 | lm loss: 2.914888E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.124 | TFLOPs: 32.01 | +7: iteration 114350/ 173500 | consumed samples: 29273600 | consumed tokens: 59952332800 | elapsed time per iteration (s): 0.42 | learning rate: 6.773E-05 | global batch size: 
256 | lm loss: 2.895307E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.870 | TFLOPs: 32.00 | +7: iteration 114360/ 173500 | consumed samples: 29276160 | consumed tokens: 59957575680 | elapsed time per iteration (s): 0.42 | learning rate: 6.771E-05 | global batch size: 256 | lm loss: 2.892870E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.164 | TFLOPs: 32.01 | +7: iteration 114370/ 173500 | consumed samples: 29278720 | consumed tokens: 59962818560 | elapsed time per iteration (s): 0.42 | learning rate: 6.770E-05 | global batch size: 256 | lm loss: 2.893393E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.908 | TFLOPs: 32.00 | +7: iteration 114380/ 173500 | consumed samples: 29281280 | consumed tokens: 59968061440 | elapsed time per iteration (s): 0.42 | learning rate: 6.768E-05 | global batch size: 256 | lm loss: 2.894585E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.659 | TFLOPs: 31.99 | +7: iteration 114390/ 173500 | consumed samples: 29283840 | consumed tokens: 59973304320 | elapsed time per iteration (s): 0.42 | learning rate: 6.767E-05 | global batch size: 256 | lm loss: 2.892189E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.096 | TFLOPs: 32.01 | +7: iteration 114400/ 173500 | consumed samples: 29286400 | consumed tokens: 59978547200 | elapsed time per iteration (s): 0.42 | learning rate: 6.766E-05 | global batch size: 256 | lm loss: 2.895963E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.641 | TFLOPs: 31.99 | +7: iteration 114410/ 173500 | consumed samples: 29288960 | 
consumed tokens: 59983790080 | elapsed time per iteration (s): 0.42 | learning rate: 6.764E-05 | global batch size: 256 | lm loss: 2.897960E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.635 | TFLOPs: 31.99 | +7: iteration 114420/ 173500 | consumed samples: 29291520 | consumed tokens: 59989032960 | elapsed time per iteration (s): 0.42 | learning rate: 6.763E-05 | global batch size: 256 | lm loss: 2.908666E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.679 | TFLOPs: 31.99 | +7: iteration 114430/ 173500 | consumed samples: 29294080 | consumed tokens: 59994275840 | elapsed time per iteration (s): 0.42 | learning rate: 6.761E-05 | global batch size: 256 | lm loss: 2.902948E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.435 | TFLOPs: 31.98 | +7: iteration 114440/ 173500 | consumed samples: 29296640 | consumed tokens: 59999518720 | elapsed time per iteration (s): 0.42 | learning rate: 6.760E-05 | global batch size: 256 | lm loss: 2.886474E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.154 | TFLOPs: 31.70 | +7: iteration 114450/ 173500 | consumed samples: 29299200 | consumed tokens: 60004761600 | elapsed time per iteration (s): 0.42 | learning rate: 6.758E-05 | global batch size: 256 | lm loss: 2.900718E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.067 | TFLOPs: 31.96 | +7: iteration 114460/ 173500 | consumed samples: 29301760 | consumed tokens: 60010004480 | elapsed time per iteration (s): 0.42 | learning rate: 6.757E-05 | global batch size: 256 | lm loss: 2.908147E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.172 | TFLOPs: 31.70 | +7: iteration 114470/ 173500 | consumed samples: 29304320 | consumed tokens: 60015247360 | elapsed time per iteration (s): 0.42 | learning rate: 6.755E-05 | global batch size: 256 | lm loss: 2.902588E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.616 | TFLOPs: 31.99 | +7: iteration 114480/ 173500 | consumed samples: 29306880 | consumed tokens: 60020490240 | elapsed time per iteration (s): 0.42 | learning rate: 6.754E-05 | global batch size: 256 | lm loss: 2.902694E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.825 | TFLOPs: 31.94 | +7: iteration 114490/ 173500 | consumed samples: 29309440 | consumed tokens: 60025733120 | elapsed time per iteration (s): 0.42 | learning rate: 6.753E-05 | global batch size: 256 | lm loss: 2.900293E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.651 | TFLOPs: 31.93 | +7: iteration 114500/ 173500 | consumed samples: 29312000 | consumed tokens: 60030976000 | elapsed time per iteration (s): 0.42 | learning rate: 6.751E-05 | global batch size: 256 | lm loss: 2.915213E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.276 | TFLOPs: 31.92 | +7: iteration 114510/ 173500 | consumed samples: 29314560 | consumed tokens: 60036218880 | elapsed time per iteration (s): 0.42 | learning rate: 6.750E-05 | global batch size: 256 | lm loss: 2.903478E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.286 | TFLOPs: 31.92 | +7: iteration 114520/ 173500 | consumed samples: 29317120 | consumed tokens: 60041461760 | elapsed time per iteration (s): 0.42 | learning rate: 6.748E-05 | global batch size: 
256 | lm loss: 2.907017E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.740 | TFLOPs: 31.68 | +7: iteration 114530/ 173500 | consumed samples: 29319680 | consumed tokens: 60046704640 | elapsed time per iteration (s): 0.42 | learning rate: 6.747E-05 | global batch size: 256 | lm loss: 2.899139E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.464 | TFLOPs: 31.72 | +7: iteration 114540/ 173500 | consumed samples: 29322240 | consumed tokens: 60051947520 | elapsed time per iteration (s): 0.42 | learning rate: 6.745E-05 | global batch size: 256 | lm loss: 2.898171E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.058 | TFLOPs: 31.75 | +7: iteration 114550/ 173500 | consumed samples: 29324800 | consumed tokens: 60057190400 | elapsed time per iteration (s): 0.42 | learning rate: 6.744E-05 | global batch size: 256 | lm loss: 2.893956E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.906 | TFLOPs: 31.95 | +7: iteration 114560/ 173500 | consumed samples: 29327360 | consumed tokens: 60062433280 | elapsed time per iteration (s): 0.42 | learning rate: 6.742E-05 | global batch size: 256 | lm loss: 2.903135E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.658 | TFLOPs: 31.94 | +7: iteration 114570/ 173500 | consumed samples: 29329920 | consumed tokens: 60067676160 | elapsed time per iteration (s): 0.42 | learning rate: 6.741E-05 | global batch size: 256 | lm loss: 2.892246E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.845 | TFLOPs: 31.95 | +7: iteration 114580/ 173500 | consumed samples: 29332480 | 
consumed tokens: 60072919040 | elapsed time per iteration (s): 0.42 | learning rate: 6.739E-05 | global batch size: 256 | lm loss: 2.893936E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.752 | TFLOPs: 31.94 | +7: iteration 114590/ 173500 | consumed samples: 29335040 | consumed tokens: 60078161920 | elapsed time per iteration (s): 0.42 | learning rate: 6.738E-05 | global batch size: 256 | lm loss: 2.908607E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.573 | TFLOPs: 31.93 | +7: iteration 114600/ 173500 | consumed samples: 29337600 | consumed tokens: 60083404800 | elapsed time per iteration (s): 0.42 | learning rate: 6.737E-05 | global batch size: 256 | lm loss: 2.913985E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.838 | TFLOPs: 31.94 | +7: iteration 114610/ 173500 | consumed samples: 29340160 | consumed tokens: 60088647680 | elapsed time per iteration (s): 0.42 | learning rate: 6.735E-05 | global batch size: 256 | lm loss: 2.915942E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.009 | TFLOPs: 31.95 | +7: iteration 114620/ 173500 | consumed samples: 29342720 | consumed tokens: 60093890560 | elapsed time per iteration (s): 0.42 | learning rate: 6.734E-05 | global batch size: 256 | lm loss: 2.897306E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.303 | TFLOPs: 31.76 | +7: iteration 114630/ 173500 | consumed samples: 29345280 | consumed tokens: 60099133440 | elapsed time per iteration (s): 0.42 | learning rate: 6.732E-05 | global batch size: 256 | lm loss: 2.901584E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.079 | TFLOPs: 31.90 | +7: iteration 114640/ 173500 | consumed samples: 29347840 | consumed tokens: 60104376320 | elapsed time per iteration (s): 0.42 | learning rate: 6.731E-05 | global batch size: 256 | lm loss: 2.887248E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.308 | TFLOPs: 31.92 | +7: iteration 114650/ 173500 | consumed samples: 29350400 | consumed tokens: 60109619200 | elapsed time per iteration (s): 0.42 | learning rate: 6.729E-05 | global batch size: 256 | lm loss: 2.897909E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.460 | TFLOPs: 31.92 | +7: iteration 114660/ 173500 | consumed samples: 29352960 | consumed tokens: 60114862080 | elapsed time per iteration (s): 0.42 | learning rate: 6.728E-05 | global batch size: 256 | lm loss: 2.907133E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.177 | TFLOPs: 31.91 | +7: iteration 114670/ 173500 | consumed samples: 29355520 | consumed tokens: 60120104960 | elapsed time per iteration (s): 0.42 | learning rate: 6.726E-05 | global batch size: 256 | lm loss: 2.901346E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.504 | TFLOPs: 31.93 | +7: iteration 114680/ 173500 | consumed samples: 29358080 | consumed tokens: 60125347840 | elapsed time per iteration (s): 0.42 | learning rate: 6.725E-05 | global batch size: 256 | lm loss: 2.897519E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.424 | TFLOPs: 31.92 | +7: iteration 114690/ 173500 | consumed samples: 29360640 | consumed tokens: 60130590720 | elapsed time per iteration (s): 0.42 | learning rate: 6.724E-05 | global batch size: 
256 | lm loss: 2.917815E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.056 | TFLOPs: 31.90 | +7: iteration 114700/ 173500 | consumed samples: 29363200 | consumed tokens: 60135833600 | elapsed time per iteration (s): 0.43 | learning rate: 6.722E-05 | global batch size: 256 | lm loss: 2.889372E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.883 | TFLOPs: 31.06 | +7: iteration 114710/ 173500 | consumed samples: 29365760 | consumed tokens: 60141076480 | elapsed time per iteration (s): 0.42 | learning rate: 6.721E-05 | global batch size: 256 | lm loss: 2.890658E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.455 | TFLOPs: 31.98 | +7: iteration 114720/ 173500 | consumed samples: 29368320 | consumed tokens: 60146319360 | elapsed time per iteration (s): 0.42 | learning rate: 6.719E-05 | global batch size: 256 | lm loss: 2.909509E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.617 | TFLOPs: 31.67 | +7: iteration 114730/ 173500 | consumed samples: 29370880 | consumed tokens: 60151562240 | elapsed time per iteration (s): 0.42 | learning rate: 6.718E-05 | global batch size: 256 | lm loss: 2.899701E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.171 | TFLOPs: 31.96 | +7: iteration 114740/ 173500 | consumed samples: 29373440 | consumed tokens: 60156805120 | elapsed time per iteration (s): 0.42 | learning rate: 6.716E-05 | global batch size: 256 | lm loss: 2.908891E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.526 | TFLOPs: 31.93 | +7: iteration 114750/ 173500 | consumed samples: 29376000 | 
consumed tokens: 60162048000 | elapsed time per iteration (s): 0.42 | learning rate: 6.715E-05 | global batch size: 256 | lm loss: 2.903154E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.907 | TFLOPs: 31.95 | +7: iteration 114760/ 173500 | consumed samples: 29378560 | consumed tokens: 60167290880 | elapsed time per iteration (s): 0.42 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 2.893937E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.338 | TFLOPs: 31.92 | +7: iteration 114770/ 173500 | consumed samples: 29381120 | consumed tokens: 60172533760 | elapsed time per iteration (s): 0.42 | learning rate: 6.712E-05 | global batch size: 256 | lm loss: 2.894030E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.478 | TFLOPs: 31.93 | +7: iteration 114780/ 173500 | consumed samples: 29383680 | consumed tokens: 60177776640 | elapsed time per iteration (s): 0.42 | learning rate: 6.710E-05 | global batch size: 256 | lm loss: 2.908361E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.247 | TFLOPs: 31.91 | +7: iteration 114790/ 173500 | consumed samples: 29386240 | consumed tokens: 60183019520 | elapsed time per iteration (s): 0.42 | learning rate: 6.709E-05 | global batch size: 256 | lm loss: 2.893501E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.276 | TFLOPs: 31.92 | +7: iteration 114800/ 173500 | consumed samples: 29388800 | consumed tokens: 60188262400 | elapsed time per iteration (s): 0.42 | learning rate: 6.708E-05 | global batch size: 256 | lm loss: 2.899132E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 606.133 | TFLOPs: 31.80 | +7: iteration 114810/ 173500 | consumed samples: 29391360 | consumed tokens: 60193505280 | elapsed time per iteration (s): 0.42 | learning rate: 6.706E-05 | global batch size: 256 | lm loss: 2.932025E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.874 | TFLOPs: 31.89 | +7: iteration 114820/ 173500 | consumed samples: 29393920 | consumed tokens: 60198748160 | elapsed time per iteration (s): 0.42 | learning rate: 6.705E-05 | global batch size: 256 | lm loss: 2.899859E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.002 | TFLOPs: 31.90 | +7: iteration 114830/ 173500 | consumed samples: 29396480 | consumed tokens: 60203991040 | elapsed time per iteration (s): 0.42 | learning rate: 6.703E-05 | global batch size: 256 | lm loss: 2.889537E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.830 | TFLOPs: 31.73 | +7: iteration 114840/ 173500 | consumed samples: 29399040 | consumed tokens: 60209233920 | elapsed time per iteration (s): 0.42 | learning rate: 6.702E-05 | global batch size: 256 | lm loss: 2.898771E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.651 | TFLOPs: 31.93 | +7: iteration 114850/ 173500 | consumed samples: 29401600 | consumed tokens: 60214476800 | elapsed time per iteration (s): 0.44 | learning rate: 6.700E-05 | global batch size: 256 | lm loss: 2.906252E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.005 | TFLOPs: 30.69 | +7: iteration 114860/ 173500 | consumed samples: 29404160 | consumed tokens: 60219719680 | elapsed time per iteration (s): 0.48 | learning rate: 6.699E-05 | global batch size: 
256 | lm loss: 2.889829E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.223 | TFLOPs: 27.72 | +7: iteration 114870/ 173500 | consumed samples: 29406720 | consumed tokens: 60224962560 | elapsed time per iteration (s): 0.42 | learning rate: 6.697E-05 | global batch size: 256 | lm loss: 2.893655E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.098 | TFLOPs: 32.06 | +7: iteration 114880/ 173500 | consumed samples: 29409280 | consumed tokens: 60230205440 | elapsed time per iteration (s): 0.42 | learning rate: 6.696E-05 | global batch size: 256 | lm loss: 2.902539E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.866 | TFLOPs: 32.00 | +7: iteration 114890/ 173500 | consumed samples: 29411840 | consumed tokens: 60235448320 | elapsed time per iteration (s): 0.42 | learning rate: 6.695E-05 | global batch size: 256 | lm loss: 2.898893E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.704 | TFLOPs: 31.94 | +7: iteration 114900/ 173500 | consumed samples: 29414400 | consumed tokens: 60240691200 | elapsed time per iteration (s): 0.42 | learning rate: 6.693E-05 | global batch size: 256 | lm loss: 2.903702E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.944 | TFLOPs: 31.74 | +7: iteration 114910/ 173500 | consumed samples: 29416960 | consumed tokens: 60245934080 | elapsed time per iteration (s): 0.42 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 2.896907E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.858 | TFLOPs: 31.95 | +7: iteration 114920/ 173500 | consumed samples: 29419520 | 
consumed tokens: 60251176960 | elapsed time per iteration (s): 0.42 | learning rate: 6.690E-05 | global batch size: 256 | lm loss: 2.921743E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.516 | TFLOPs: 31.98 | +7: iteration 114930/ 173500 | consumed samples: 29422080 | consumed tokens: 60256419840 | elapsed time per iteration (s): 0.42 | learning rate: 6.689E-05 | global batch size: 256 | lm loss: 2.905064E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.614 | TFLOPs: 31.99 | +7: iteration 114940/ 173500 | consumed samples: 29424640 | consumed tokens: 60261662720 | elapsed time per iteration (s): 0.42 | learning rate: 6.687E-05 | global batch size: 256 | lm loss: 2.900700E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.011 | TFLOPs: 31.95 | +7: iteration 114950/ 173500 | consumed samples: 29427200 | consumed tokens: 60266905600 | elapsed time per iteration (s): 0.42 | learning rate: 6.686E-05 | global batch size: 256 | lm loss: 2.915722E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.876 | TFLOPs: 31.74 | +7: iteration 114960/ 173500 | consumed samples: 29429760 | consumed tokens: 60272148480 | elapsed time per iteration (s): 0.42 | learning rate: 6.684E-05 | global batch size: 256 | lm loss: 2.898017E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.303 | TFLOPs: 31.97 | +7: iteration 114970/ 173500 | consumed samples: 29432320 | consumed tokens: 60277391360 | elapsed time per iteration (s): 0.42 | learning rate: 6.683E-05 | global batch size: 256 | lm loss: 2.899833E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.816 | TFLOPs: 31.89 | +7: iteration 114980/ 173500 | consumed samples: 29434880 | consumed tokens: 60282634240 | elapsed time per iteration (s): 0.42 | learning rate: 6.682E-05 | global batch size: 256 | lm loss: 2.901982E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.985 | TFLOPs: 31.95 | +7: iteration 114990/ 173500 | consumed samples: 29437440 | consumed tokens: 60287877120 | elapsed time per iteration (s): 0.42 | learning rate: 6.680E-05 | global batch size: 256 | lm loss: 2.906870E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.992 | TFLOPs: 31.95 | +7: iteration 115000/ 173500 | consumed samples: 29440000 | consumed tokens: 60293120000 | elapsed time per iteration (s): 0.42 | learning rate: 6.679E-05 | global batch size: 256 | lm loss: 2.891368E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.756 | TFLOPs: 31.94 | +7: iteration 115010/ 173500 | consumed samples: 29442560 | consumed tokens: 60298362880 | elapsed time per iteration (s): 0.42 | learning rate: 6.677E-05 | global batch size: 256 | lm loss: 2.896581E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.180 | TFLOPs: 31.75 | +7: iteration 115020/ 173500 | consumed samples: 29445120 | consumed tokens: 60303605760 | elapsed time per iteration (s): 0.42 | learning rate: 6.676E-05 | global batch size: 256 | lm loss: 2.910570E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.757 | TFLOPs: 31.94 | +7: iteration 115030/ 173500 | consumed samples: 29447680 | consumed tokens: 60308848640 | elapsed time per iteration (s): 0.42 | learning rate: 6.674E-05 | global batch size: 
256 | lm loss: 2.891684E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.669 | TFLOPs: 31.94 | +7: iteration 115040/ 173500 | consumed samples: 29450240 | consumed tokens: 60314091520 | elapsed time per iteration (s): 0.42 | learning rate: 6.673E-05 | global batch size: 256 | lm loss: 2.903014E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.762 | TFLOPs: 31.94 | +7: iteration 115050/ 173500 | consumed samples: 29452800 | consumed tokens: 60319334400 | elapsed time per iteration (s): 0.42 | learning rate: 6.671E-05 | global batch size: 256 | lm loss: 2.885203E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.000 | TFLOPs: 31.95 | +7: iteration 115060/ 173500 | consumed samples: 29455360 | consumed tokens: 60324577280 | elapsed time per iteration (s): 0.42 | learning rate: 6.670E-05 | global batch size: 256 | lm loss: 2.900079E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.620 | TFLOPs: 31.67 | +7: iteration 115070/ 173500 | consumed samples: 29457920 | consumed tokens: 60329820160 | elapsed time per iteration (s): 0.42 | learning rate: 6.669E-05 | global batch size: 256 | lm loss: 2.893401E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.827 | TFLOPs: 31.94 | +7: iteration 115080/ 173500 | consumed samples: 29460480 | consumed tokens: 60335063040 | elapsed time per iteration (s): 0.42 | learning rate: 6.667E-05 | global batch size: 256 | lm loss: 2.892998E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.515 | TFLOPs: 31.93 | +7: iteration 115090/ 173500 | consumed samples: 29463040 | 
consumed tokens: 60340305920 | elapsed time per iteration (s): 0.42 | learning rate: 6.666E-05 | global batch size: 256 | lm loss: 2.909433E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.004 | TFLOPs: 31.95 | +7: iteration 115100/ 173500 | consumed samples: 29465600 | consumed tokens: 60345548800 | elapsed time per iteration (s): 0.42 | learning rate: 6.664E-05 | global batch size: 256 | lm loss: 2.890156E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.326 | TFLOPs: 31.92 | +7: iteration 115110/ 173500 | consumed samples: 29468160 | consumed tokens: 60350791680 | elapsed time per iteration (s): 0.42 | learning rate: 6.663E-05 | global batch size: 256 | lm loss: 2.919936E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.695 | TFLOPs: 31.94 | +7: iteration 115120/ 173500 | consumed samples: 29470720 | consumed tokens: 60356034560 | elapsed time per iteration (s): 0.42 | learning rate: 6.661E-05 | global batch size: 256 | lm loss: 2.895233E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.235 | TFLOPs: 31.91 | +7: iteration 115130/ 173500 | consumed samples: 29473280 | consumed tokens: 60361277440 | elapsed time per iteration (s): 0.42 | learning rate: 6.660E-05 | global batch size: 256 | lm loss: 2.878434E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.376 | TFLOPs: 31.92 | +7: iteration 115140/ 173500 | consumed samples: 29475840 | consumed tokens: 60366520320 | elapsed time per iteration (s): 0.42 | learning rate: 6.658E-05 | global batch size: 256 | lm loss: 2.905170E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.304 | TFLOPs: 31.92 | +7: iteration 115150/ 173500 | consumed samples: 29478400 | consumed tokens: 60371763200 | elapsed time per iteration (s): 0.42 | learning rate: 6.657E-05 | global batch size: 256 | lm loss: 2.911667E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.289 | TFLOPs: 31.92 | +7: iteration 115160/ 173500 | consumed samples: 29480960 | consumed tokens: 60377006080 | elapsed time per iteration (s): 0.42 | learning rate: 6.656E-05 | global batch size: 256 | lm loss: 2.893466E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.159 | TFLOPs: 31.91 | +7: iteration 115170/ 173500 | consumed samples: 29483520 | consumed tokens: 60382248960 | elapsed time per iteration (s): 0.42 | learning rate: 6.654E-05 | global batch size: 256 | lm loss: 2.904852E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.240 | TFLOPs: 31.91 | +7: iteration 115180/ 173500 | consumed samples: 29486080 | consumed tokens: 60387491840 | elapsed time per iteration (s): 0.42 | learning rate: 6.653E-05 | global batch size: 256 | lm loss: 2.887522E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.265 | TFLOPs: 31.91 | +7: iteration 115190/ 173500 | consumed samples: 29488640 | consumed tokens: 60392734720 | elapsed time per iteration (s): 0.42 | learning rate: 6.651E-05 | global batch size: 256 | lm loss: 2.903167E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.108 | TFLOPs: 31.91 | +7: iteration 115200/ 173500 | consumed samples: 29491200 | consumed tokens: 60397977600 | elapsed time per iteration (s): 0.42 | learning rate: 6.650E-05 | global batch size: 
256 | lm loss: 2.899953E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.021 | TFLOPs: 31.90 | +7: iteration 115210/ 173500 | consumed samples: 29493760 | consumed tokens: 60403220480 | elapsed time per iteration (s): 0.42 | learning rate: 6.648E-05 | global batch size: 256 | lm loss: 2.901735E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.333 | TFLOPs: 31.92 | +7: iteration 115220/ 173500 | consumed samples: 29496320 | consumed tokens: 60408463360 | elapsed time per iteration (s): 0.42 | learning rate: 6.647E-05 | global batch size: 256 | lm loss: 2.911884E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.885 | TFLOPs: 31.89 | +7: iteration 115230/ 173500 | consumed samples: 29498880 | consumed tokens: 60413706240 | elapsed time per iteration (s): 0.42 | learning rate: 6.646E-05 | global batch size: 256 | lm loss: 2.896356E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.790 | TFLOPs: 31.89 | +7: iteration 115240/ 173500 | consumed samples: 29501440 | consumed tokens: 60418949120 | elapsed time per iteration (s): 0.42 | learning rate: 6.644E-05 | global batch size: 256 | lm loss: 2.901222E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.858 | TFLOPs: 31.89 | +7: iteration 115250/ 173500 | consumed samples: 29504000 | consumed tokens: 60424192000 | elapsed time per iteration (s): 0.42 | learning rate: 6.643E-05 | global batch size: 256 | lm loss: 2.899226E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.533 | TFLOPs: 31.93 | +7: iteration 115260/ 173500 | consumed samples: 29506560 | 
consumed tokens: 60429434880 | elapsed time per iteration (s): 0.42 | learning rate: 6.641E-05 | global batch size: 256 | lm loss: 2.900428E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.117 | TFLOPs: 31.91 | +7: iteration 115270/ 173500 | consumed samples: 29509120 | consumed tokens: 60434677760 | elapsed time per iteration (s): 0.42 | learning rate: 6.640E-05 | global batch size: 256 | lm loss: 2.875681E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.132 | TFLOPs: 31.91 | +7: iteration 115280/ 173500 | consumed samples: 29511680 | consumed tokens: 60439920640 | elapsed time per iteration (s): 0.42 | learning rate: 6.638E-05 | global batch size: 256 | lm loss: 2.893058E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.491 | TFLOPs: 31.93 | +7: iteration 115290/ 173500 | consumed samples: 29514240 | consumed tokens: 60445163520 | elapsed time per iteration (s): 0.42 | learning rate: 6.637E-05 | global batch size: 256 | lm loss: 2.908463E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.499 | TFLOPs: 31.93 | +7: iteration 115300/ 173500 | consumed samples: 29516800 | consumed tokens: 60450406400 | elapsed time per iteration (s): 0.42 | learning rate: 6.635E-05 | global batch size: 256 | lm loss: 2.903556E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.411 | TFLOPs: 31.92 | +7: iteration 115310/ 173500 | consumed samples: 29519360 | consumed tokens: 60455649280 | elapsed time per iteration (s): 0.42 | learning rate: 6.634E-05 | global batch size: 256 | lm loss: 2.907742E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.181 | TFLOPs: 31.91 | +7: iteration 115320/ 173500 | consumed samples: 29521920 | consumed tokens: 60460892160 | elapsed time per iteration (s): 0.42 | learning rate: 6.633E-05 | global batch size: 256 | lm loss: 2.893472E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.390 | TFLOPs: 31.92 | +7: iteration 115330/ 173500 | consumed samples: 29524480 | consumed tokens: 60466135040 | elapsed time per iteration (s): 0.42 | learning rate: 6.631E-05 | global batch size: 256 | lm loss: 2.907999E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.054 | TFLOPs: 31.90 | +7: iteration 115340/ 173500 | consumed samples: 29527040 | consumed tokens: 60471377920 | elapsed time per iteration (s): 0.42 | learning rate: 6.630E-05 | global batch size: 256 | lm loss: 2.895898E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.690 | TFLOPs: 31.94 | +7: iteration 115350/ 173500 | consumed samples: 29529600 | consumed tokens: 60476620800 | elapsed time per iteration (s): 0.42 | learning rate: 6.628E-05 | global batch size: 256 | lm loss: 2.886553E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.666 | TFLOPs: 31.94 | +7: iteration 115360/ 173500 | consumed samples: 29532160 | consumed tokens: 60481863680 | elapsed time per iteration (s): 0.42 | learning rate: 6.627E-05 | global batch size: 256 | lm loss: 2.890768E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.337 | TFLOPs: 31.92 | +7: iteration 115370/ 173500 | consumed samples: 29534720 | consumed tokens: 60487106560 | elapsed time per iteration (s): 0.42 | learning rate: 6.625E-05 | global batch size: 
256 | lm loss: 2.896831E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.252 | TFLOPs: 31.91 | +7: iteration 115380/ 173500 | consumed samples: 29537280 | consumed tokens: 60492349440 | elapsed time per iteration (s): 0.42 | learning rate: 6.624E-05 | global batch size: 256 | lm loss: 2.906842E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.459 | TFLOPs: 31.92 | +7: iteration 115390/ 173500 | consumed samples: 29539840 | consumed tokens: 60497592320 | elapsed time per iteration (s): 0.42 | learning rate: 6.622E-05 | global batch size: 256 | lm loss: 2.900643E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.450 | TFLOPs: 31.92 | +7: iteration 115400/ 173500 | consumed samples: 29542400 | consumed tokens: 60502835200 | elapsed time per iteration (s): 0.42 | learning rate: 6.621E-05 | global batch size: 256 | lm loss: 2.920005E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.159 | TFLOPs: 31.91 | +7: iteration 115410/ 173500 | consumed samples: 29544960 | consumed tokens: 60508078080 | elapsed time per iteration (s): 0.42 | learning rate: 6.620E-05 | global batch size: 256 | lm loss: 2.896509E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.097 | TFLOPs: 31.91 | +7: iteration 115420/ 173500 | consumed samples: 29547520 | consumed tokens: 60513320960 | elapsed time per iteration (s): 0.42 | learning rate: 6.618E-05 | global batch size: 256 | lm loss: 2.896858E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.025 | TFLOPs: 31.90 | +7: iteration 115430/ 173500 | consumed samples: 29550080 | 
consumed tokens: 60518563840 | elapsed time per iteration (s): 0.42 | learning rate: 6.617E-05 | global batch size: 256 | lm loss: 2.907883E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.943 | TFLOPs: 31.90 | +7: iteration 115440/ 173500 | consumed samples: 29552640 | consumed tokens: 60523806720 | elapsed time per iteration (s): 0.42 | learning rate: 6.615E-05 | global batch size: 256 | lm loss: 2.904960E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.892 | TFLOPs: 31.90 | +7: iteration 115450/ 173500 | consumed samples: 29555200 | consumed tokens: 60529049600 | elapsed time per iteration (s): 0.42 | learning rate: 6.614E-05 | global batch size: 256 | lm loss: 2.896785E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.259 | TFLOPs: 31.91 | +7: iteration 115460/ 173500 | consumed samples: 29557760 | consumed tokens: 60534292480 | elapsed time per iteration (s): 0.42 | learning rate: 6.612E-05 | global batch size: 256 | lm loss: 2.899751E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.917 | TFLOPs: 31.90 | +7: iteration 115470/ 173500 | consumed samples: 29560320 | consumed tokens: 60539535360 | elapsed time per iteration (s): 0.42 | learning rate: 6.611E-05 | global batch size: 256 | lm loss: 2.911031E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.119 | TFLOPs: 31.91 | +7: iteration 115480/ 173500 | consumed samples: 29562880 | consumed tokens: 60544778240 | elapsed time per iteration (s): 0.42 | learning rate: 6.610E-05 | global batch size: 256 | lm loss: 2.910178E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.721 | TFLOPs: 31.89 | +7: iteration 115490/ 173500 | consumed samples: 29565440 | consumed tokens: 60550021120 | elapsed time per iteration (s): 0.42 | learning rate: 6.608E-05 | global batch size: 256 | lm loss: 2.895691E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.454 | TFLOPs: 31.87 | +7: iteration 115500/ 173500 | consumed samples: 29568000 | consumed tokens: 60555264000 | elapsed time per iteration (s): 0.42 | learning rate: 6.607E-05 | global batch size: 256 | lm loss: 2.907062E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 115510/ 173500 | consumed samples: 29570560 | consumed tokens: 60560506880 | elapsed time per iteration (s): 0.42 | learning rate: 6.605E-05 | global batch size: 256 | lm loss: 2.895131E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.041 | TFLOPs: 31.90 | +7: iteration 115520/ 173500 | consumed samples: 29573120 | consumed tokens: 60565749760 | elapsed time per iteration (s): 0.42 | learning rate: 6.604E-05 | global batch size: 256 | lm loss: 2.904300E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.037 | TFLOPs: 31.90 | +7: iteration 115530/ 173500 | consumed samples: 29575680 | consumed tokens: 60570992640 | elapsed time per iteration (s): 0.42 | learning rate: 6.602E-05 | global batch size: 256 | lm loss: 2.895073E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.120 | TFLOPs: 31.91 | +7: iteration 115540/ 173500 | consumed samples: 29578240 | consumed tokens: 60576235520 | elapsed time per iteration (s): 0.42 | learning rate: 6.601E-05 | global batch size: 
256 | lm loss: 2.898460E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.512 | TFLOPs: 31.93 | +7: iteration 115550/ 173500 | consumed samples: 29580800 | consumed tokens: 60581478400 | elapsed time per iteration (s): 0.42 | learning rate: 6.599E-05 | global batch size: 256 | lm loss: 2.891958E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.107 | TFLOPs: 31.91 | +7: iteration 115560/ 173500 | consumed samples: 29583360 | consumed tokens: 60586721280 | elapsed time per iteration (s): 0.42 | learning rate: 6.598E-05 | global batch size: 256 | lm loss: 2.912947E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.066 | TFLOPs: 31.90 | +7: iteration 115570/ 173500 | consumed samples: 29585920 | consumed tokens: 60591964160 | elapsed time per iteration (s): 0.42 | learning rate: 6.597E-05 | global batch size: 256 | lm loss: 2.883949E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.951 | TFLOPs: 31.90 | +7: iteration 115580/ 173500 | consumed samples: 29588480 | consumed tokens: 60597207040 | elapsed time per iteration (s): 0.42 | learning rate: 6.595E-05 | global batch size: 256 | lm loss: 2.897786E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.127 | TFLOPs: 31.91 | +7: iteration 115590/ 173500 | consumed samples: 29591040 | consumed tokens: 60602449920 | elapsed time per iteration (s): 0.42 | learning rate: 6.594E-05 | global batch size: 256 | lm loss: 2.916904E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.496 | TFLOPs: 31.93 | +7: iteration 115600/ 173500 | consumed samples: 29593600 | 
consumed tokens: 60607692800 | elapsed time per iteration (s): 0.43 | learning rate: 6.592E-05 | global batch size: 256 | lm loss: 2.902511E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.773 | TFLOPs: 31.15 | +7: iteration 115610/ 173500 | consumed samples: 29596160 | consumed tokens: 60612935680 | elapsed time per iteration (s): 0.42 | learning rate: 6.591E-05 | global batch size: 256 | lm loss: 2.907332E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.340 | TFLOPs: 31.92 | +7: iteration 115620/ 173500 | consumed samples: 29598720 | consumed tokens: 60618178560 | elapsed time per iteration (s): 0.42 | learning rate: 6.589E-05 | global batch size: 256 | lm loss: 2.896195E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.442 | TFLOPs: 31.92 | +7: iteration 115630/ 173500 | consumed samples: 29601280 | consumed tokens: 60623421440 | elapsed time per iteration (s): 0.42 | learning rate: 6.588E-05 | global batch size: 256 | lm loss: 2.906765E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.476 | TFLOPs: 31.93 | +7: iteration 115640/ 173500 | consumed samples: 29603840 | consumed tokens: 60628664320 | elapsed time per iteration (s): 0.42 | learning rate: 6.587E-05 | global batch size: 256 | lm loss: 2.886326E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.111 | TFLOPs: 31.91 | +7: iteration 115650/ 173500 | consumed samples: 29606400 | consumed tokens: 60633907200 | elapsed time per iteration (s): 0.42 | learning rate: 6.585E-05 | global batch size: 256 | lm loss: 2.907176E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.048 | TFLOPs: 31.90 | +7: iteration 115660/ 173500 | consumed samples: 29608960 | consumed tokens: 60639150080 | elapsed time per iteration (s): 0.42 | learning rate: 6.584E-05 | global batch size: 256 | lm loss: 2.910072E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.089 | TFLOPs: 31.91 | +7: iteration 115670/ 173500 | consumed samples: 29611520 | consumed tokens: 60644392960 | elapsed time per iteration (s): 0.42 | learning rate: 6.582E-05 | global batch size: 256 | lm loss: 2.889499E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.328 | TFLOPs: 31.92 | +7: iteration 115680/ 173500 | consumed samples: 29614080 | consumed tokens: 60649635840 | elapsed time per iteration (s): 0.43 | learning rate: 6.581E-05 | global batch size: 256 | lm loss: 2.898113E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.430 | TFLOPs: 31.35 | +7: iteration 115690/ 173500 | consumed samples: 29616640 | consumed tokens: 60654878720 | elapsed time per iteration (s): 0.42 | learning rate: 6.579E-05 | global batch size: 256 | lm loss: 2.903558E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.139 | TFLOPs: 31.96 | +7: iteration 115700/ 173500 | consumed samples: 29619200 | consumed tokens: 60660121600 | elapsed time per iteration (s): 0.42 | learning rate: 6.578E-05 | global batch size: 256 | lm loss: 2.900692E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.274 | TFLOPs: 31.92 | +7: iteration 115710/ 173500 | consumed samples: 29621760 | consumed tokens: 60665364480 | elapsed time per iteration (s): 0.42 | learning rate: 6.577E-05 | global batch size: 
256 | lm loss: 2.896465E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.430 | TFLOPs: 31.92 | +7: iteration 115720/ 173500 | consumed samples: 29624320 | consumed tokens: 60670607360 | elapsed time per iteration (s): 0.42 | learning rate: 6.575E-05 | global batch size: 256 | lm loss: 2.894038E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.701 | TFLOPs: 31.62 | +7: iteration 115730/ 173500 | consumed samples: 29626880 | consumed tokens: 60675850240 | elapsed time per iteration (s): 0.42 | learning rate: 6.574E-05 | global batch size: 256 | lm loss: 2.900018E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.854 | TFLOPs: 31.95 | +7: iteration 115740/ 173500 | consumed samples: 29629440 | consumed tokens: 60681093120 | elapsed time per iteration (s): 0.42 | learning rate: 6.572E-05 | global batch size: 256 | lm loss: 2.896030E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.013 | TFLOPs: 31.90 | +7: iteration 115750/ 173500 | consumed samples: 29632000 | consumed tokens: 60686336000 | elapsed time per iteration (s): 0.42 | learning rate: 6.571E-05 | global batch size: 256 | lm loss: 2.907586E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.583 | TFLOPs: 31.93 | +7: iteration 115760/ 173500 | consumed samples: 29634560 | consumed tokens: 60691578880 | elapsed time per iteration (s): 0.42 | learning rate: 6.569E-05 | global batch size: 256 | lm loss: 2.894142E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.015 | TFLOPs: 31.90 | +7: iteration 115770/ 173500 | consumed samples: 29637120 | 
consumed tokens: 60696821760 | elapsed time per iteration (s): 0.42 | learning rate: 6.568E-05 | global batch size: 256 | lm loss: 2.896739E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.251 | TFLOPs: 31.91 | +7: iteration 115780/ 173500 | consumed samples: 29639680 | consumed tokens: 60702064640 | elapsed time per iteration (s): 0.42 | learning rate: 6.567E-05 | global batch size: 256 | lm loss: 2.895872E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.201 | TFLOPs: 31.91 | +7: iteration 115790/ 173500 | consumed samples: 29642240 | consumed tokens: 60707307520 | elapsed time per iteration (s): 0.42 | learning rate: 6.565E-05 | global batch size: 256 | lm loss: 2.898409E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.837 | TFLOPs: 31.89 | +7: iteration 115800/ 173500 | consumed samples: 29644800 | consumed tokens: 60712550400 | elapsed time per iteration (s): 0.42 | learning rate: 6.564E-05 | global batch size: 256 | lm loss: 2.893822E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.006 | TFLOPs: 31.90 | +7: iteration 115810/ 173500 | consumed samples: 29647360 | consumed tokens: 60717793280 | elapsed time per iteration (s): 0.42 | learning rate: 6.562E-05 | global batch size: 256 | lm loss: 2.908499E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.557 | TFLOPs: 31.93 | +7: iteration 115820/ 173500 | consumed samples: 29649920 | consumed tokens: 60723036160 | elapsed time per iteration (s): 0.42 | learning rate: 6.561E-05 | global batch size: 256 | lm loss: 2.892182E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 608.317 | TFLOPs: 31.92 | +7: iteration 115830/ 173500 | consumed samples: 29652480 | consumed tokens: 60728279040 | elapsed time per iteration (s): 0.42 | learning rate: 6.559E-05 | global batch size: 256 | lm loss: 2.905626E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.533 | TFLOPs: 31.93 | +7: iteration 115840/ 173500 | consumed samples: 29655040 | consumed tokens: 60733521920 | elapsed time per iteration (s): 0.42 | learning rate: 6.558E-05 | global batch size: 256 | lm loss: 2.891400E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.229 | TFLOPs: 31.91 | +7: iteration 115850/ 173500 | consumed samples: 29657600 | consumed tokens: 60738764800 | elapsed time per iteration (s): 0.42 | learning rate: 6.556E-05 | global batch size: 256 | lm loss: 2.907958E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.013 | TFLOPs: 31.90 | +7: iteration 115860/ 173500 | consumed samples: 29660160 | consumed tokens: 60744007680 | elapsed time per iteration (s): 0.42 | learning rate: 6.555E-05 | global batch size: 256 | lm loss: 2.901675E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.947 | TFLOPs: 31.90 | +7: iteration 115870/ 173500 | consumed samples: 29662720 | consumed tokens: 60749250560 | elapsed time per iteration (s): 0.42 | learning rate: 6.554E-05 | global batch size: 256 | lm loss: 2.898108E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.832 | TFLOPs: 31.89 | +7: iteration 115880/ 173500 | consumed samples: 29665280 | consumed tokens: 60754493440 | elapsed time per iteration (s): 0.42 | learning rate: 6.552E-05 | global batch size: 
256 | lm loss: 2.898080E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.074 | TFLOPs: 31.90 | +7: iteration 115890/ 173500 | consumed samples: 29667840 | consumed tokens: 60759736320 | elapsed time per iteration (s): 0.42 | learning rate: 6.551E-05 | global batch size: 256 | lm loss: 2.900303E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.912 | TFLOPs: 31.90 | +7: iteration 115900/ 173500 | consumed samples: 29670400 | consumed tokens: 60764979200 | elapsed time per iteration (s): 0.42 | learning rate: 6.549E-05 | global batch size: 256 | lm loss: 2.893797E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.275 | TFLOPs: 31.92 | +7: iteration 115910/ 173500 | consumed samples: 29672960 | consumed tokens: 60770222080 | elapsed time per iteration (s): 0.42 | learning rate: 6.548E-05 | global batch size: 256 | lm loss: 2.888345E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.309 | TFLOPs: 31.92 | +7: iteration 115920/ 173500 | consumed samples: 29675520 | consumed tokens: 60775464960 | elapsed time per iteration (s): 0.42 | learning rate: 6.546E-05 | global batch size: 256 | lm loss: 2.882504E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.825 | TFLOPs: 31.89 | +7: iteration 115930/ 173500 | consumed samples: 29678080 | consumed tokens: 60780707840 | elapsed time per iteration (s): 0.42 | learning rate: 6.545E-05 | global batch size: 256 | lm loss: 2.903723E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.828 | TFLOPs: 31.89 | +7: iteration 115940/ 173500 | consumed samples: 29680640 | 
consumed tokens: 60785950720 | elapsed time per iteration (s): 0.42 | learning rate: 6.544E-05 | global batch size: 256 | lm loss: 2.901550E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.248 | TFLOPs: 31.91 | +7: iteration 115950/ 173500 | consumed samples: 29683200 | consumed tokens: 60791193600 | elapsed time per iteration (s): 0.42 | learning rate: 6.542E-05 | global batch size: 256 | lm loss: 2.894105E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.662 | TFLOPs: 31.88 | +7: iteration 115960/ 173500 | consumed samples: 29685760 | consumed tokens: 60796436480 | elapsed time per iteration (s): 0.42 | learning rate: 6.541E-05 | global batch size: 256 | lm loss: 2.909731E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.312 | TFLOPs: 31.81 | +7: iteration 115970/ 173500 | consumed samples: 29688320 | consumed tokens: 60801679360 | elapsed time per iteration (s): 0.42 | learning rate: 6.539E-05 | global batch size: 256 | lm loss: 2.909449E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.097 | TFLOPs: 31.85 | +7: iteration 115980/ 173500 | consumed samples: 29690880 | consumed tokens: 60806922240 | elapsed time per iteration (s): 0.42 | learning rate: 6.538E-05 | global batch size: 256 | lm loss: 2.898037E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.565 | TFLOPs: 31.88 | +7: iteration 115990/ 173500 | consumed samples: 29693440 | consumed tokens: 60812165120 | elapsed time per iteration (s): 0.42 | learning rate: 6.536E-05 | global batch size: 256 | lm loss: 2.903166E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.973 | TFLOPs: 31.90 | +0: [2023-03-17 12:57:17,601] [INFO] [logging.py:68:log_dist] [Rank 0] step=116000, skipped=0, lr=[6.535024808618106e-05, 6.535024808618106e-05, 6.535024808618106e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 116000/ 173500 | consumed samples: 29696000 | consumed tokens: 60817408000 | elapsed time per iteration (s): 0.42 | learning rate: 6.535E-05 | global batch size: 256 | lm loss: 2.894146E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.457 | TFLOPs: 31.87 | +0: steps: 116000 loss: 2.9107 iter time (s): 0.421 samples/sec: 608.751 +7: iteration 116010/ 173500 | consumed samples: 29698560 | consumed tokens: 60822650880 | elapsed time per iteration (s): 0.42 | learning rate: 6.534E-05 | global batch size: 256 | lm loss: 2.903650E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.347 | TFLOPs: 31.81 | +7: iteration 116020/ 173500 | consumed samples: 29701120 | consumed tokens: 60827893760 | elapsed time per iteration (s): 0.42 | learning rate: 6.532E-05 | global batch size: 256 | lm loss: 2.913470E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.862 | TFLOPs: 31.89 | +7: iteration 116030/ 173500 | consumed samples: 29703680 | consumed tokens: 60833136640 | elapsed time per iteration (s): 0.42 | learning rate: 6.531E-05 | global batch size: 256 | lm loss: 2.896913E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.163 | TFLOPs: 31.86 | +7: iteration 116040/ 173500 | consumed samples: 29706240 | consumed tokens: 60838379520 | elapsed time per iteration (s): 0.42 | learning rate: 6.529E-05 | global batch size: 256 | lm loss: 2.884194E+00 | grad norm: 0.344 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.159 | TFLOPs: 31.86 | +7: iteration 116050/ 173500 | consumed samples: 29708800 | consumed tokens: 60843622400 | elapsed time per iteration (s): 0.42 | learning rate: 6.528E-05 | global batch size: 256 | lm loss: 2.896627E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.843 | TFLOPs: 31.63 | +7: iteration 116060/ 173500 | consumed samples: 29711360 | consumed tokens: 60848865280 | elapsed time per iteration (s): 0.42 | learning rate: 6.526E-05 | global batch size: 256 | lm loss: 2.895254E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.098 | TFLOPs: 31.91 | +7: iteration 116070/ 173500 | consumed samples: 29713920 | consumed tokens: 60854108160 | elapsed time per iteration (s): 0.42 | learning rate: 6.525E-05 | global batch size: 256 | lm loss: 2.883187E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.020 | TFLOPs: 31.90 | +7: iteration 116080/ 173500 | consumed samples: 29716480 | consumed tokens: 60859351040 | elapsed time per iteration (s): 0.42 | learning rate: 6.524E-05 | global batch size: 256 | lm loss: 2.891102E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.799 | TFLOPs: 31.89 | +7: iteration 116090/ 173500 | consumed samples: 29719040 | consumed tokens: 60864593920 | elapsed time per iteration (s): 0.42 | learning rate: 6.522E-05 | global batch size: 256 | lm loss: 2.894987E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.510 | TFLOPs: 31.88 | +7: iteration 116100/ 173500 | consumed samples: 29721600 | consumed tokens: 60869836800 | elapsed time per iteration (s): 0.42 
| learning rate: 6.521E-05 | global batch size: 256 | lm loss: 2.902132E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.563 | TFLOPs: 31.88 | +7: iteration 116110/ 173500 | consumed samples: 29724160 | consumed tokens: 60875079680 | elapsed time per iteration (s): 0.42 | learning rate: 6.519E-05 | global batch size: 256 | lm loss: 2.891563E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.228 | TFLOPs: 31.91 | +7: iteration 116120/ 173500 | consumed samples: 29726720 | consumed tokens: 60880322560 | elapsed time per iteration (s): 0.42 | learning rate: 6.518E-05 | global batch size: 256 | lm loss: 2.891570E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.855 | TFLOPs: 31.89 | +7: iteration 116130/ 173500 | consumed samples: 29729280 | consumed tokens: 60885565440 | elapsed time per iteration (s): 0.42 | learning rate: 6.516E-05 | global batch size: 256 | lm loss: 2.888029E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.487 | TFLOPs: 31.87 | +7: iteration 116140/ 173500 | consumed samples: 29731840 | consumed tokens: 60890808320 | elapsed time per iteration (s): 0.42 | learning rate: 6.515E-05 | global batch size: 256 | lm loss: 2.900372E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.573 | TFLOPs: 31.88 | +7: iteration 116150/ 173500 | consumed samples: 29734400 | consumed tokens: 60896051200 | elapsed time per iteration (s): 0.42 | learning rate: 6.514E-05 | global batch size: 256 | lm loss: 2.904223E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.488 | TFLOPs: 31.87 | +7: iteration 
116160/ 173500 | consumed samples: 29736960 | consumed tokens: 60901294080 | elapsed time per iteration (s): 0.42 | learning rate: 6.512E-05 | global batch size: 256 | lm loss: 2.899794E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.684 | TFLOPs: 31.88 | +7: iteration 116170/ 173500 | consumed samples: 29739520 | consumed tokens: 60906536960 | elapsed time per iteration (s): 0.42 | learning rate: 6.511E-05 | global batch size: 256 | lm loss: 2.888859E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.753 | TFLOPs: 31.89 | +7: iteration 116180/ 173500 | consumed samples: 29742080 | consumed tokens: 60911779840 | elapsed time per iteration (s): 0.42 | learning rate: 6.509E-05 | global batch size: 256 | lm loss: 2.908982E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.294 | TFLOPs: 31.86 | +7: iteration 116190/ 173500 | consumed samples: 29744640 | consumed tokens: 60917022720 | elapsed time per iteration (s): 0.42 | learning rate: 6.508E-05 | global batch size: 256 | lm loss: 2.904548E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.183 | TFLOPs: 31.86 | +7: iteration 116200/ 173500 | consumed samples: 29747200 | consumed tokens: 60922265600 | elapsed time per iteration (s): 0.42 | learning rate: 6.506E-05 | global batch size: 256 | lm loss: 2.901466E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.959 | TFLOPs: 31.90 | +7: iteration 116210/ 173500 | consumed samples: 29749760 | consumed tokens: 60927508480 | elapsed time per iteration (s): 0.42 | learning rate: 6.505E-05 | global batch size: 256 | lm loss: 2.902151E+00 | grad norm: 0.344 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.537 | TFLOPs: 31.88 | +7: iteration 116220/ 173500 | consumed samples: 29752320 | consumed tokens: 60932751360 | elapsed time per iteration (s): 0.42 | learning rate: 6.504E-05 | global batch size: 256 | lm loss: 2.902401E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.348 | TFLOPs: 31.66 | +7: iteration 116230/ 173500 | consumed samples: 29754880 | consumed tokens: 60937994240 | elapsed time per iteration (s): 0.42 | learning rate: 6.502E-05 | global batch size: 256 | lm loss: 2.895272E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.220 | TFLOPs: 31.86 | +7: iteration 116240/ 173500 | consumed samples: 29757440 | consumed tokens: 60943237120 | elapsed time per iteration (s): 0.44 | learning rate: 6.501E-05 | global batch size: 256 | lm loss: 2.916079E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.571 | TFLOPs: 30.57 | +7: iteration 116250/ 173500 | consumed samples: 29760000 | consumed tokens: 60948480000 | elapsed time per iteration (s): 0.42 | learning rate: 6.499E-05 | global batch size: 256 | lm loss: 2.886736E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.121 | TFLOPs: 31.91 | +7: iteration 116260/ 173500 | consumed samples: 29762560 | consumed tokens: 60953722880 | elapsed time per iteration (s): 0.42 | learning rate: 6.498E-05 | global batch size: 256 | lm loss: 2.894533E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.278 | TFLOPs: 31.92 | +7: iteration 116270/ 173500 | consumed samples: 29765120 | consumed tokens: 60958965760 | elapsed time per iteration (s): 0.42 | learning 
rate: 6.496E-05 | global batch size: 256 | lm loss: 2.905081E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.140 | TFLOPs: 31.91 | +7: iteration 116280/ 173500 | consumed samples: 29767680 | consumed tokens: 60964208640 | elapsed time per iteration (s): 0.42 | learning rate: 6.495E-05 | global batch size: 256 | lm loss: 2.900454E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.999 | TFLOPs: 31.90 | +7: iteration 116290/ 173500 | consumed samples: 29770240 | consumed tokens: 60969451520 | elapsed time per iteration (s): 0.42 | learning rate: 6.494E-05 | global batch size: 256 | lm loss: 2.898081E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.832 | TFLOPs: 31.89 | +7: iteration 116300/ 173500 | consumed samples: 29772800 | consumed tokens: 60974694400 | elapsed time per iteration (s): 0.42 | learning rate: 6.492E-05 | global batch size: 256 | lm loss: 2.896457E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.617 | TFLOPs: 31.88 | +7: iteration 116310/ 173500 | consumed samples: 29775360 | consumed tokens: 60979937280 | elapsed time per iteration (s): 0.42 | learning rate: 6.491E-05 | global batch size: 256 | lm loss: 2.904836E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.601 | TFLOPs: 31.88 | +7: iteration 116320/ 173500 | consumed samples: 29777920 | consumed tokens: 60985180160 | elapsed time per iteration (s): 0.42 | learning rate: 6.489E-05 | global batch size: 256 | lm loss: 2.900247E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.330 | TFLOPs: 31.87 | +7: iteration 116330/ 
173500 | consumed samples: 29780480 | consumed tokens: 60990423040 | elapsed time per iteration (s): 0.42 | learning rate: 6.488E-05 | global batch size: 256 | lm loss: 2.901865E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.417 | TFLOPs: 31.87 | +7: iteration 116340/ 173500 | consumed samples: 29783040 | consumed tokens: 60995665920 | elapsed time per iteration (s): 0.42 | learning rate: 6.487E-05 | global batch size: 256 | lm loss: 2.903829E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.659 | TFLOPs: 31.88 | +7: iteration 116350/ 173500 | consumed samples: 29785600 | consumed tokens: 61000908800 | elapsed time per iteration (s): 0.42 | learning rate: 6.485E-05 | global batch size: 256 | lm loss: 2.891071E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.036 | TFLOPs: 31.85 | +7: iteration 116360/ 173500 | consumed samples: 29788160 | consumed tokens: 61006151680 | elapsed time per iteration (s): 0.42 | learning rate: 6.484E-05 | global batch size: 256 | lm loss: 2.889939E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 116370/ 173500 | consumed samples: 29790720 | consumed tokens: 61011394560 | elapsed time per iteration (s): 0.42 | learning rate: 6.482E-05 | global batch size: 256 | lm loss: 2.894639E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.632 | TFLOPs: 31.88 | +7: iteration 116380/ 173500 | consumed samples: 29793280 | consumed tokens: 61016637440 | elapsed time per iteration (s): 0.42 | learning rate: 6.481E-05 | global batch size: 256 | lm loss: 2.888433E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.587 | TFLOPs: 31.83 | +7: iteration 116390/ 173500 | consumed samples: 29795840 | consumed tokens: 61021880320 | elapsed time per iteration (s): 0.42 | learning rate: 6.479E-05 | global batch size: 256 | lm loss: 2.890979E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.055 | TFLOPs: 31.85 | +7: iteration 116400/ 173500 | consumed samples: 29798400 | consumed tokens: 61027123200 | elapsed time per iteration (s): 0.42 | learning rate: 6.478E-05 | global batch size: 256 | lm loss: 2.893432E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.939 | TFLOPs: 31.85 | +7: iteration 116410/ 173500 | consumed samples: 29800960 | consumed tokens: 61032366080 | elapsed time per iteration (s): 0.42 | learning rate: 6.477E-05 | global batch size: 256 | lm loss: 2.888989E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.155 | TFLOPs: 31.86 | +7: iteration 116420/ 173500 | consumed samples: 29803520 | consumed tokens: 61037608960 | elapsed time per iteration (s): 0.42 | learning rate: 6.475E-05 | global batch size: 256 | lm loss: 2.902227E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.453 | TFLOPs: 31.87 | +7: iteration 116430/ 173500 | consumed samples: 29806080 | consumed tokens: 61042851840 | elapsed time per iteration (s): 0.42 | learning rate: 6.474E-05 | global batch size: 256 | lm loss: 2.892621E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.711 | TFLOPs: 31.89 | +7: iteration 116440/ 173500 | consumed samples: 29808640 | consumed tokens: 61048094720 | elapsed time per iteration (s): 0.42 | learning rate: 
6.472E-05 | global batch size: 256 | lm loss: 2.889059E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.961 | TFLOPs: 31.85 | +7: iteration 116450/ 173500 | consumed samples: 29811200 | consumed tokens: 61053337600 | elapsed time per iteration (s): 0.42 | learning rate: 6.471E-05 | global batch size: 256 | lm loss: 2.896825E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.351 | TFLOPs: 31.87 | +7: iteration 116460/ 173500 | consumed samples: 29813760 | consumed tokens: 61058580480 | elapsed time per iteration (s): 0.42 | learning rate: 6.469E-05 | global batch size: 256 | lm loss: 2.895565E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.083 | TFLOPs: 31.85 | +7: iteration 116470/ 173500 | consumed samples: 29816320 | consumed tokens: 61063823360 | elapsed time per iteration (s): 0.42 | learning rate: 6.468E-05 | global batch size: 256 | lm loss: 2.891023E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.067 | TFLOPs: 31.85 | +7: iteration 116480/ 173500 | consumed samples: 29818880 | consumed tokens: 61069066240 | elapsed time per iteration (s): 0.42 | learning rate: 6.467E-05 | global batch size: 256 | lm loss: 2.889916E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.360 | TFLOPs: 31.87 | +7: iteration 116490/ 173500 | consumed samples: 29821440 | consumed tokens: 61074309120 | elapsed time per iteration (s): 0.42 | learning rate: 6.465E-05 | global batch size: 256 | lm loss: 2.898780E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.111 | TFLOPs: 31.85 | +7: iteration 116500/ 173500 | 
consumed samples: 29824000 | consumed tokens: 61079552000 | elapsed time per iteration (s): 0.42 | learning rate: 6.464E-05 | global batch size: 256 | lm loss: 2.888293E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.007 | TFLOPs: 31.85 | +7: iteration 116510/ 173500 | consumed samples: 29826560 | consumed tokens: 61084794880 | elapsed time per iteration (s): 0.42 | learning rate: 6.462E-05 | global batch size: 256 | lm loss: 2.893353E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.854 | TFLOPs: 31.84 | +7: iteration 116520/ 173500 | consumed samples: 29829120 | consumed tokens: 61090037760 | elapsed time per iteration (s): 0.42 | learning rate: 6.461E-05 | global batch size: 256 | lm loss: 2.886370E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.861 | TFLOPs: 31.89 | +7: iteration 116530/ 173500 | consumed samples: 29831680 | consumed tokens: 61095280640 | elapsed time per iteration (s): 0.42 | learning rate: 6.459E-05 | global batch size: 256 | lm loss: 2.892942E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.342 | TFLOPs: 31.87 | +7: iteration 116540/ 173500 | consumed samples: 29834240 | consumed tokens: 61100523520 | elapsed time per iteration (s): 0.42 | learning rate: 6.458E-05 | global batch size: 256 | lm loss: 2.894538E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.523 | TFLOPs: 31.88 | +7: iteration 116550/ 173500 | consumed samples: 29836800 | consumed tokens: 61105766400 | elapsed time per iteration (s): 0.42 | learning rate: 6.457E-05 | global batch size: 256 | lm loss: 2.896029E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.326 | TFLOPs: 31.87 | +7: iteration 116560/ 173500 | consumed samples: 29839360 | consumed tokens: 61111009280 | elapsed time per iteration (s): 0.42 | learning rate: 6.455E-05 | global batch size: 256 | lm loss: 2.909040E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.373 | TFLOPs: 31.87 | +7: iteration 116570/ 173500 | consumed samples: 29841920 | consumed tokens: 61116252160 | elapsed time per iteration (s): 0.42 | learning rate: 6.454E-05 | global batch size: 256 | lm loss: 2.881943E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.237 | TFLOPs: 31.86 | +7: iteration 116580/ 173500 | consumed samples: 29844480 | consumed tokens: 61121495040 | elapsed time per iteration (s): 0.42 | learning rate: 6.452E-05 | global batch size: 256 | lm loss: 2.885148E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.522 | TFLOPs: 31.88 | +7: iteration 116590/ 173500 | consumed samples: 29847040 | consumed tokens: 61126737920 | elapsed time per iteration (s): 0.42 | learning rate: 6.451E-05 | global batch size: 256 | lm loss: 2.901074E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.487 | TFLOPs: 31.87 | +7: iteration 116600/ 173500 | consumed samples: 29849600 | consumed tokens: 61131980800 | elapsed time per iteration (s): 0.42 | learning rate: 6.450E-05 | global batch size: 256 | lm loss: 2.907137E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.927 | TFLOPs: 31.69 | +7: iteration 116610/ 173500 | consumed samples: 29852160 | consumed tokens: 61137223680 | elapsed time per iteration (s): 0.42 | learning rate: 
6.448E-05 | global batch size: 256 | lm loss: 2.898640E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.406 | TFLOPs: 31.87 | +7: iteration 116620/ 173500 | consumed samples: 29854720 | consumed tokens: 61142466560 | elapsed time per iteration (s): 0.42 | learning rate: 6.447E-05 | global batch size: 256 | lm loss: 2.907212E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.076 | TFLOPs: 31.85 | +7: iteration 116630/ 173500 | consumed samples: 29857280 | consumed tokens: 61147709440 | elapsed time per iteration (s): 0.43 | learning rate: 6.445E-05 | global batch size: 256 | lm loss: 2.895919E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.474 | TFLOPs: 31.51 | +7: iteration 116640/ 173500 | consumed samples: 29859840 | consumed tokens: 61152952320 | elapsed time per iteration (s): 0.43 | learning rate: 6.444E-05 | global batch size: 256 | lm loss: 2.898624E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.110 | TFLOPs: 31.33 | +7: iteration 116650/ 173500 | consumed samples: 29862400 | consumed tokens: 61158195200 | elapsed time per iteration (s): 0.42 | learning rate: 6.442E-05 | global batch size: 256 | lm loss: 2.902676E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.468 | TFLOPs: 31.61 | +7: iteration 116660/ 173500 | consumed samples: 29864960 | consumed tokens: 61163438080 | elapsed time per iteration (s): 0.42 | learning rate: 6.441E-05 | global batch size: 256 | lm loss: 2.903268E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.673 | TFLOPs: 31.88 | +7: iteration 116670/ 173500 | 
consumed samples: 29867520 | consumed tokens: 61168680960 | elapsed time per iteration (s): 0.42 | learning rate: 6.440E-05 | global batch size: 256 | lm loss: 2.892274E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.041 | TFLOPs: 31.90 | +7: iteration 116680/ 173500 | consumed samples: 29870080 | consumed tokens: 61173923840 | elapsed time per iteration (s): 0.42 | learning rate: 6.438E-05 | global batch size: 256 | lm loss: 2.906479E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.803 | TFLOPs: 31.89 | +7: iteration 116690/ 173500 | consumed samples: 29872640 | consumed tokens: 61179166720 | elapsed time per iteration (s): 0.42 | learning rate: 6.437E-05 | global batch size: 256 | lm loss: 2.911018E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.441 | TFLOPs: 31.87 | +7: iteration 116700/ 173500 | consumed samples: 29875200 | consumed tokens: 61184409600 | elapsed time per iteration (s): 0.42 | learning rate: 6.435E-05 | global batch size: 256 | lm loss: 2.893943E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.815 | TFLOPs: 31.89 | +7: iteration 116710/ 173500 | consumed samples: 29877760 | consumed tokens: 61189652480 | elapsed time per iteration (s): 0.42 | learning rate: 6.434E-05 | global batch size: 256 | lm loss: 2.914187E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.697 | TFLOPs: 31.88 | +7: iteration 116720/ 173500 | consumed samples: 29880320 | consumed tokens: 61194895360 | elapsed time per iteration (s): 0.42 | learning rate: 6.433E-05 | global batch size: 256 | lm loss: 2.905283E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.844 | TFLOPs: 31.89 | +7: iteration 116730/ 173500 | consumed samples: 29882880 | consumed tokens: 61200138240 | elapsed time per iteration (s): 0.42 | learning rate: 6.431E-05 | global batch size: 256 | lm loss: 2.899850E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.005 | TFLOPs: 31.90 | +7: iteration 116740/ 173500 | consumed samples: 29885440 | consumed tokens: 61205381120 | elapsed time per iteration (s): 0.42 | learning rate: 6.430E-05 | global batch size: 256 | lm loss: 2.902924E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.051 | TFLOPs: 31.90 | +7: iteration 116750/ 173500 | consumed samples: 29888000 | consumed tokens: 61210624000 | elapsed time per iteration (s): 0.42 | learning rate: 6.428E-05 | global batch size: 256 | lm loss: 2.886929E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.631 | TFLOPs: 31.88 | +7: iteration 116760/ 173500 | consumed samples: 29890560 | consumed tokens: 61215866880 | elapsed time per iteration (s): 0.42 | learning rate: 6.427E-05 | global batch size: 256 | lm loss: 2.891435E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.748 | TFLOPs: 31.89 | +7: iteration 116770/ 173500 | consumed samples: 29893120 | consumed tokens: 61221109760 | elapsed time per iteration (s): 0.42 | learning rate: 6.425E-05 | global batch size: 256 | lm loss: 2.896809E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.639 | TFLOPs: 31.88 | +7: iteration 116780/ 173500 | consumed samples: 29895680 | consumed tokens: 61226352640 | elapsed time per iteration (s): 0.42 | learning rate: 
6.424E-05 | global batch size: 256 | lm loss: 2.896789E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.427 | TFLOPs: 31.87 | +7: iteration 116790/ 173500 | consumed samples: 29898240 | consumed tokens: 61231595520 | elapsed time per iteration (s): 0.42 | learning rate: 6.423E-05 | global batch size: 256 | lm loss: 2.895401E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.303 | TFLOPs: 31.86 | +7: iteration 116800/ 173500 | consumed samples: 29900800 | consumed tokens: 61236838400 | elapsed time per iteration (s): 0.42 | learning rate: 6.421E-05 | global batch size: 256 | lm loss: 2.906292E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.769 | TFLOPs: 31.89 | +7: iteration 116810/ 173500 | consumed samples: 29903360 | consumed tokens: 61242081280 | elapsed time per iteration (s): 0.42 | learning rate: 6.420E-05 | global batch size: 256 | lm loss: 2.892261E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.069 | TFLOPs: 31.85 | +7: iteration 116820/ 173500 | consumed samples: 29905920 | consumed tokens: 61247324160 | elapsed time per iteration (s): 0.42 | learning rate: 6.418E-05 | global batch size: 256 | lm loss: 2.906465E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.699 | TFLOPs: 31.88 | +7: iteration 116830/ 173500 | consumed samples: 29908480 | consumed tokens: 61252567040 | elapsed time per iteration (s): 0.42 | learning rate: 6.417E-05 | global batch size: 256 | lm loss: 2.895588E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.335 | TFLOPs: 31.92 | +7: iteration 116840/ 173500 | 
consumed samples: 29911040 | consumed tokens: 61257809920 | elapsed time per iteration (s): 0.42 | learning rate: 6.415E-05 | global batch size: 256 | lm loss: 2.893018E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.403 | TFLOPs: 31.87 | +7: iteration 116850/ 173500 | consumed samples: 29913600 | consumed tokens: 61263052800 | elapsed time per iteration (s): 0.42 | learning rate: 6.414E-05 | global batch size: 256 | lm loss: 2.891151E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.963 | TFLOPs: 31.90 | +7: iteration 116860/ 173500 | consumed samples: 29916160 | consumed tokens: 61268295680 | elapsed time per iteration (s): 0.42 | learning rate: 6.413E-05 | global batch size: 256 | lm loss: 2.916430E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.388 | TFLOPs: 31.87 | +7: iteration 116870/ 173500 | consumed samples: 29918720 | consumed tokens: 61273538560 | elapsed time per iteration (s): 0.42 | learning rate: 6.411E-05 | global batch size: 256 | lm loss: 2.889029E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.873 | TFLOPs: 31.89 | +7: iteration 116880/ 173500 | consumed samples: 29921280 | consumed tokens: 61278781440 | elapsed time per iteration (s): 0.42 | learning rate: 6.410E-05 | global batch size: 256 | lm loss: 2.898101E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.229 | TFLOPs: 31.86 | +7: iteration 116890/ 173500 | consumed samples: 29923840 | consumed tokens: 61284024320 | elapsed time per iteration (s): 0.42 | learning rate: 6.408E-05 | global batch size: 256 | lm loss: 2.896383E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.496 | TFLOPs: 31.87 | +7: iteration 116900/ 173500 | consumed samples: 29926400 | consumed tokens: 61289267200 | elapsed time per iteration (s): 0.42 | learning rate: 6.407E-05 | global batch size: 256 | lm loss: 2.895155E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.802 | TFLOPs: 31.89 | +7: iteration 116910/ 173500 | consumed samples: 29928960 | consumed tokens: 61294510080 | elapsed time per iteration (s): 0.42 | learning rate: 6.406E-05 | global batch size: 256 | lm loss: 2.893324E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.518 | TFLOPs: 31.88 | +7: iteration 116920/ 173500 | consumed samples: 29931520 | consumed tokens: 61299752960 | elapsed time per iteration (s): 0.42 | learning rate: 6.404E-05 | global batch size: 256 | lm loss: 2.886831E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.235 | TFLOPs: 31.86 | +7: iteration 116930/ 173500 | consumed samples: 29934080 | consumed tokens: 61304995840 | elapsed time per iteration (s): 0.42 | learning rate: 6.403E-05 | global batch size: 256 | lm loss: 2.907363E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.277 | TFLOPs: 31.86 | +7: iteration 116940/ 173500 | consumed samples: 29936640 | consumed tokens: 61310238720 | elapsed time per iteration (s): 0.42 | learning rate: 6.401E-05 | global batch size: 256 | lm loss: 2.903920E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.492 | TFLOPs: 31.87 | +7: iteration 116950/ 173500 | consumed samples: 29939200 | consumed tokens: 61315481600 | elapsed time per iteration (s): 0.42 | learning rate: 
6.400E-05 | global batch size: 256 | lm loss: 2.893044E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.538 | TFLOPs: 31.88 | +7: iteration 116960/ 173500 | consumed samples: 29941760 | consumed tokens: 61320724480 | elapsed time per iteration (s): 0.42 | learning rate: 6.399E-05 | global batch size: 256 | lm loss: 2.905861E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.624 | TFLOPs: 31.88 | +7: iteration 116970/ 173500 | consumed samples: 29944320 | consumed tokens: 61325967360 | elapsed time per iteration (s): 0.42 | learning rate: 6.397E-05 | global batch size: 256 | lm loss: 2.897588E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.565 | TFLOPs: 31.88 | +7: iteration 116980/ 173500 | consumed samples: 29946880 | consumed tokens: 61331210240 | elapsed time per iteration (s): 0.42 | learning rate: 6.396E-05 | global batch size: 256 | lm loss: 2.915728E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.911 | TFLOPs: 31.84 | +7: iteration 116990/ 173500 | consumed samples: 29949440 | consumed tokens: 61336453120 | elapsed time per iteration (s): 0.42 | learning rate: 6.394E-05 | global batch size: 256 | lm loss: 2.889453E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.044 | TFLOPs: 31.85 | +7: iteration 117000/ 173500 | consumed samples: 29952000 | consumed tokens: 61341696000 | elapsed time per iteration (s): 0.42 | learning rate: 6.393E-05 | global batch size: 256 | lm loss: 2.907049E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.288 | TFLOPs: 31.86 | +7: iteration 117010/ 173500 | 
consumed samples: 29954560 | consumed tokens: 61346938880 | elapsed time per iteration (s): 0.42 | learning rate: 6.391E-05 | global batch size: 256 | lm loss: 2.916292E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.230 | TFLOPs: 31.86 | +7: iteration 117020/ 173500 | consumed samples: 29957120 | consumed tokens: 61352181760 | elapsed time per iteration (s): 0.42 | learning rate: 6.390E-05 | global batch size: 256 | lm loss: 2.892652E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.359 | TFLOPs: 31.87 | +7: iteration 117030/ 173500 | consumed samples: 29959680 | consumed tokens: 61357424640 | elapsed time per iteration (s): 0.42 | learning rate: 6.389E-05 | global batch size: 256 | lm loss: 2.899797E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.676 | TFLOPs: 31.83 | +7: iteration 117040/ 173500 | consumed samples: 29962240 | consumed tokens: 61362667520 | elapsed time per iteration (s): 0.42 | learning rate: 6.387E-05 | global batch size: 256 | lm loss: 2.884637E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.245 | TFLOPs: 31.86 | +7: iteration 117050/ 173500 | consumed samples: 29964800 | consumed tokens: 61367910400 | elapsed time per iteration (s): 0.42 | learning rate: 6.386E-05 | global batch size: 256 | lm loss: 2.901416E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.245 | TFLOPs: 31.86 | +7: iteration 117060/ 173500 | consumed samples: 29967360 | consumed tokens: 61373153280 | elapsed time per iteration (s): 0.42 | learning rate: 6.384E-05 | global batch size: 256 | lm loss: 2.900575E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.016 | TFLOPs: 31.85 | +7: iteration 117070/ 173500 | consumed samples: 29969920 | consumed tokens: 61378396160 | elapsed time per iteration (s): 0.42 | learning rate: 6.383E-05 | global batch size: 256 | lm loss: 2.906392E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.291 | TFLOPs: 31.86 | +7: iteration 117080/ 173500 | consumed samples: 29972480 | consumed tokens: 61383639040 | elapsed time per iteration (s): 0.42 | learning rate: 6.382E-05 | global batch size: 256 | lm loss: 2.886246E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.596 | TFLOPs: 31.88 | +7: iteration 117090/ 173500 | consumed samples: 29975040 | consumed tokens: 61388881920 | elapsed time per iteration (s): 0.42 | learning rate: 6.380E-05 | global batch size: 256 | lm loss: 2.882148E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.067 | TFLOPs: 31.85 | +7: iteration 117100/ 173500 | consumed samples: 29977600 | consumed tokens: 61394124800 | elapsed time per iteration (s): 0.42 | learning rate: 6.379E-05 | global batch size: 256 | lm loss: 2.887949E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.802 | TFLOPs: 31.84 | +7: iteration 117110/ 173500 | consumed samples: 29980160 | consumed tokens: 61399367680 | elapsed time per iteration (s): 0.42 | learning rate: 6.377E-05 | global batch size: 256 | lm loss: 2.893271E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.359 | TFLOPs: 31.60 | +7: iteration 117120/ 173500 | consumed samples: 29982720 | consumed tokens: 61404610560 | elapsed time per iteration (s): 0.42 | learning rate: 
6.376E-05 | global batch size: 256 | lm loss: 2.910117E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.455 | TFLOPs: 31.87 | +7: iteration 117130/ 173500 | consumed samples: 29985280 | consumed tokens: 61409853440 | elapsed time per iteration (s): 0.42 | learning rate: 6.374E-05 | global batch size: 256 | lm loss: 2.894241E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.231 | TFLOPs: 31.81 | +7: iteration 117140/ 173500 | consumed samples: 29987840 | consumed tokens: 61415096320 | elapsed time per iteration (s): 0.42 | learning rate: 6.373E-05 | global batch size: 256 | lm loss: 2.895706E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.336 | TFLOPs: 31.87 | +7: iteration 117150/ 173500 | consumed samples: 29990400 | consumed tokens: 61420339200 | elapsed time per iteration (s): 0.42 | learning rate: 6.372E-05 | global batch size: 256 | lm loss: 2.910976E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.226 | TFLOPs: 31.86 | +7: iteration 117160/ 173500 | consumed samples: 29992960 | consumed tokens: 61425582080 | elapsed time per iteration (s): 0.42 | learning rate: 6.370E-05 | global batch size: 256 | lm loss: 2.891061E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.807 | TFLOPs: 31.84 | +7: iteration 117170/ 173500 | consumed samples: 29995520 | consumed tokens: 61430824960 | elapsed time per iteration (s): 0.42 | learning rate: 6.369E-05 | global batch size: 256 | lm loss: 2.900823E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.949 | TFLOPs: 31.85 | +7: iteration 117180/ 173500 | 
consumed samples: 29998080 | consumed tokens: 61436067840 | elapsed time per iteration (s): 0.42 | learning rate: 6.367E-05 | global batch size: 256 | lm loss: 2.894352E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.972 | TFLOPs: 31.85 | +7: iteration 117190/ 173500 | consumed samples: 30000640 | consumed tokens: 61441310720 | elapsed time per iteration (s): 0.42 | learning rate: 6.366E-05 | global batch size: 256 | lm loss: 2.888927E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.218 | TFLOPs: 31.86 | +7: iteration 117200/ 173500 | consumed samples: 30003200 | consumed tokens: 61446553600 | elapsed time per iteration (s): 0.49 | learning rate: 6.365E-05 | global batch size: 256 | lm loss: 2.893769E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 518.499 | TFLOPs: 27.20 | +7: iteration 117210/ 173500 | consumed samples: 30005760 | consumed tokens: 61451796480 | elapsed time per iteration (s): 0.42 | learning rate: 6.363E-05 | global batch size: 256 | lm loss: 2.899291E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.953 | TFLOPs: 32.00 | +7: iteration 117220/ 173500 | consumed samples: 30008320 | consumed tokens: 61457039360 | elapsed time per iteration (s): 0.42 | learning rate: 6.362E-05 | global batch size: 256 | lm loss: 2.890662E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.199 | TFLOPs: 31.96 | +7: iteration 117230/ 173500 | consumed samples: 30010880 | consumed tokens: 61462282240 | elapsed time per iteration (s): 0.42 | learning rate: 6.360E-05 | global batch size: 256 | lm loss: 2.909466E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 608.741 | TFLOPs: 31.94 | +7: iteration 117240/ 173500 | consumed samples: 30013440 | consumed tokens: 61467525120 | elapsed time per iteration (s): 0.42 | learning rate: 6.359E-05 | global batch size: 256 | lm loss: 2.901869E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.510 | TFLOPs: 31.93 | +7: iteration 117250/ 173500 | consumed samples: 30016000 | consumed tokens: 61472768000 | elapsed time per iteration (s): 0.42 | learning rate: 6.358E-05 | global batch size: 256 | lm loss: 2.887602E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.232 | TFLOPs: 31.91 | +7: iteration 117260/ 173500 | consumed samples: 30018560 | consumed tokens: 61478010880 | elapsed time per iteration (s): 0.42 | learning rate: 6.356E-05 | global batch size: 256 | lm loss: 2.895747E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.160 | TFLOPs: 31.91 | +7: iteration 117270/ 173500 | consumed samples: 30021120 | consumed tokens: 61483253760 | elapsed time per iteration (s): 0.42 | learning rate: 6.355E-05 | global batch size: 256 | lm loss: 2.906660E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.166 | TFLOPs: 31.91 | +7: iteration 117280/ 173500 | consumed samples: 30023680 | consumed tokens: 61488496640 | elapsed time per iteration (s): 0.42 | learning rate: 6.353E-05 | global batch size: 256 | lm loss: 2.897284E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.945 | TFLOPs: 31.90 | +7: iteration 117290/ 173500 | consumed samples: 30026240 | consumed tokens: 61493739520 | elapsed time per iteration (s): 0.42 | learning rate: 
6.352E-05 | global batch size: 256 | lm loss: 2.892362E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.822 | TFLOPs: 31.89 | +7: iteration 117300/ 173500 | consumed samples: 30028800 | consumed tokens: 61498982400 | elapsed time per iteration (s): 0.42 | learning rate: 6.351E-05 | global batch size: 256 | lm loss: 2.897141E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.693 | TFLOPs: 31.88 | +7: iteration 117310/ 173500 | consumed samples: 30031360 | consumed tokens: 61504225280 | elapsed time per iteration (s): 0.42 | learning rate: 6.349E-05 | global batch size: 256 | lm loss: 2.898180E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.594 | TFLOPs: 31.88 | +7: iteration 117320/ 173500 | consumed samples: 30033920 | consumed tokens: 61509468160 | elapsed time per iteration (s): 0.42 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 2.901409E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.629 | TFLOPs: 31.88 | +7: iteration 117330/ 173500 | consumed samples: 30036480 | consumed tokens: 61514711040 | elapsed time per iteration (s): 0.42 | learning rate: 6.346E-05 | global batch size: 256 | lm loss: 2.907243E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.589 | TFLOPs: 31.88 | +7: iteration 117340/ 173500 | consumed samples: 30039040 | consumed tokens: 61519953920 | elapsed time per iteration (s): 0.42 | learning rate: 6.345E-05 | global batch size: 256 | lm loss: 2.882664E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.605 | TFLOPs: 31.78 | +7: iteration 117350/ 173500 | 
consumed samples: 30041600 | consumed tokens: 61525196800 | elapsed time per iteration (s): 0.42 | learning rate: 6.343E-05 | global batch size: 256 | lm loss: 2.911741E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.729 | TFLOPs: 31.89 | +7: iteration 117360/ 173500 | consumed samples: 30044160 | consumed tokens: 61530439680 | elapsed time per iteration (s): 0.42 | learning rate: 6.342E-05 | global batch size: 256 | lm loss: 2.901496E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.284 | TFLOPs: 31.86 | +7: iteration 117370/ 173500 | consumed samples: 30046720 | consumed tokens: 61535682560 | elapsed time per iteration (s): 0.42 | learning rate: 6.341E-05 | global batch size: 256 | lm loss: 2.903910E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.491 | TFLOPs: 31.87 | +7: iteration 117380/ 173500 | consumed samples: 30049280 | consumed tokens: 61540925440 | elapsed time per iteration (s): 0.42 | learning rate: 6.339E-05 | global batch size: 256 | lm loss: 2.904627E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.011 | TFLOPs: 31.85 | +7: iteration 117390/ 173500 | consumed samples: 30051840 | consumed tokens: 61546168320 | elapsed time per iteration (s): 0.42 | learning rate: 6.338E-05 | global batch size: 256 | lm loss: 2.893505E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.007 | TFLOPs: 31.85 | +7: iteration 117400/ 173500 | consumed samples: 30054400 | consumed tokens: 61551411200 | elapsed time per iteration (s): 0.42 | learning rate: 6.336E-05 | global batch size: 256 | lm loss: 2.901734E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.453 | TFLOPs: 31.87 | +7: iteration 117410/ 173500 | consumed samples: 30056960 | consumed tokens: 61556654080 | elapsed time per iteration (s): 0.42 | learning rate: 6.335E-05 | global batch size: 256 | lm loss: 2.899990E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.488 | TFLOPs: 31.87 | +7: iteration 117420/ 173500 | consumed samples: 30059520 | consumed tokens: 61561896960 | elapsed time per iteration (s): 0.42 | learning rate: 6.334E-05 | global batch size: 256 | lm loss: 2.891469E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.652 | TFLOPs: 31.88 | +7: iteration 117430/ 173500 | consumed samples: 30062080 | consumed tokens: 61567139840 | elapsed time per iteration (s): 0.42 | learning rate: 6.332E-05 | global batch size: 256 | lm loss: 2.888670E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.357 | TFLOPs: 31.87 | +7: iteration 117440/ 173500 | consumed samples: 30064640 | consumed tokens: 61572382720 | elapsed time per iteration (s): 0.42 | learning rate: 6.331E-05 | global batch size: 256 | lm loss: 2.889044E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.135 | TFLOPs: 31.86 | +7: iteration 117450/ 173500 | consumed samples: 30067200 | consumed tokens: 61577625600 | elapsed time per iteration (s): 0.42 | learning rate: 6.329E-05 | global batch size: 256 | lm loss: 2.905428E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.576 | TFLOPs: 31.88 | +7: iteration 117460/ 173500 | consumed samples: 30069760 | consumed tokens: 61582868480 | elapsed time per iteration (s): 0.42 | learning rate: 
6.328E-05 | global batch size: 256 | lm loss: 2.897897E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.575 | TFLOPs: 31.88 | +7: iteration 117470/ 173500 | consumed samples: 30072320 | consumed tokens: 61588111360 | elapsed time per iteration (s): 0.42 | learning rate: 6.327E-05 | global batch size: 256 | lm loss: 2.893673E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.819 | TFLOPs: 31.89 | +7: iteration 117480/ 173500 | consumed samples: 30074880 | consumed tokens: 61593354240 | elapsed time per iteration (s): 0.43 | learning rate: 6.325E-05 | global batch size: 256 | lm loss: 2.893962E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.059 | TFLOPs: 31.12 | +7: iteration 117490/ 173500 | consumed samples: 30077440 | consumed tokens: 61598597120 | elapsed time per iteration (s): 0.42 | learning rate: 6.324E-05 | global batch size: 256 | lm loss: 2.897072E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.047 | TFLOPs: 31.90 | +7: iteration 117500/ 173500 | consumed samples: 30080000 | consumed tokens: 61603840000 | elapsed time per iteration (s): 0.42 | learning rate: 6.322E-05 | global batch size: 256 | lm loss: 2.891191E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.104 | TFLOPs: 31.91 | +7: iteration 117510/ 173500 | consumed samples: 30082560 | consumed tokens: 61609082880 | elapsed time per iteration (s): 0.42 | learning rate: 6.321E-05 | global batch size: 256 | lm loss: 2.889719E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.499 | TFLOPs: 31.87 | +7: iteration 117520/ 173500 | 
consumed samples: 30085120 | consumed tokens: 61614325760 | elapsed time per iteration (s): 0.42 | learning rate: 6.320E-05 | global batch size: 256 | lm loss: 2.896737E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.535 | TFLOPs: 31.88 | +7: iteration 117530/ 173500 | consumed samples: 30087680 | consumed tokens: 61619568640 | elapsed time per iteration (s): 0.42 | learning rate: 6.318E-05 | global batch size: 256 | lm loss: 2.906053E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.177 | TFLOPs: 31.86 | +7: iteration 117540/ 173500 | consumed samples: 30090240 | consumed tokens: 61624811520 | elapsed time per iteration (s): 0.42 | learning rate: 6.317E-05 | global batch size: 256 | lm loss: 2.900296E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.482 | TFLOPs: 31.87 | +7: iteration 117550/ 173500 | consumed samples: 30092800 | consumed tokens: 61630054400 | elapsed time per iteration (s): 0.42 | learning rate: 6.315E-05 | global batch size: 256 | lm loss: 2.899267E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.205 | TFLOPs: 31.86 | +7: iteration 117560/ 173500 | consumed samples: 30095360 | consumed tokens: 61635297280 | elapsed time per iteration (s): 0.42 | learning rate: 6.314E-05 | global batch size: 256 | lm loss: 2.895943E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.434 | TFLOPs: 31.87 | +7: iteration 117570/ 173500 | consumed samples: 30097920 | consumed tokens: 61640540160 | elapsed time per iteration (s): 0.42 | learning rate: 6.313E-05 | global batch size: 256 | lm loss: 2.906242E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.547 | TFLOPs: 31.88 | +7: iteration 117580/ 173500 | consumed samples: 30100480 | consumed tokens: 61645783040 | elapsed time per iteration (s): 0.42 | learning rate: 6.311E-05 | global batch size: 256 | lm loss: 2.895391E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.722 | TFLOPs: 31.68 | +7: iteration 117590/ 173500 | consumed samples: 30103040 | consumed tokens: 61651025920 | elapsed time per iteration (s): 0.42 | learning rate: 6.310E-05 | global batch size: 256 | lm loss: 2.899555E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.534 | TFLOPs: 31.88 | +7: iteration 117600/ 173500 | consumed samples: 30105600 | consumed tokens: 61656268800 | elapsed time per iteration (s): 0.42 | learning rate: 6.308E-05 | global batch size: 256 | lm loss: 2.882000E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.602 | TFLOPs: 31.88 | +7: iteration 117610/ 173500 | consumed samples: 30108160 | consumed tokens: 61661511680 | elapsed time per iteration (s): 0.42 | learning rate: 6.307E-05 | global batch size: 256 | lm loss: 2.889659E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.018 | TFLOPs: 31.85 | +7: iteration 117620/ 173500 | consumed samples: 30110720 | consumed tokens: 61666754560 | elapsed time per iteration (s): 0.42 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 2.890350E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.084 | TFLOPs: 31.85 | +7: iteration 117630/ 173500 | consumed samples: 30113280 | consumed tokens: 61671997440 | elapsed time per iteration (s): 0.42 | learning rate: 
6.304E-05 | global batch size: 256 | lm loss: 2.881650E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.152 | TFLOPs: 31.86 | +7: iteration 117640/ 173500 | consumed samples: 30115840 | consumed tokens: 61677240320 | elapsed time per iteration (s): 0.42 | learning rate: 6.303E-05 | global batch size: 256 | lm loss: 2.882916E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.260 | TFLOPs: 31.86 | +7: iteration 117650/ 173500 | consumed samples: 30118400 | consumed tokens: 61682483200 | elapsed time per iteration (s): 0.42 | learning rate: 6.301E-05 | global batch size: 256 | lm loss: 2.908543E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.786 | TFLOPs: 31.84 | +7: iteration 117660/ 173500 | consumed samples: 30120960 | consumed tokens: 61687726080 | elapsed time per iteration (s): 0.42 | learning rate: 6.300E-05 | global batch size: 256 | lm loss: 2.899755E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.366 | TFLOPs: 31.87 | +7: iteration 117670/ 173500 | consumed samples: 30123520 | consumed tokens: 61692968960 | elapsed time per iteration (s): 0.42 | learning rate: 6.298E-05 | global batch size: 256 | lm loss: 2.893120E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.324 | TFLOPs: 31.87 | +7: iteration 117680/ 173500 | consumed samples: 30126080 | consumed tokens: 61698211840 | elapsed time per iteration (s): 0.42 | learning rate: 6.297E-05 | global batch size: 256 | lm loss: 2.885583E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.490 | TFLOPs: 31.87 | +7: iteration 117690/ 173500 | 
consumed samples: 30128640 | consumed tokens: 61703454720 | elapsed time per iteration (s): 0.42 | learning rate: 6.296E-05 | global batch size: 256 | lm loss: 2.901434E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.062 | TFLOPs: 31.85 | +7: iteration 117700/ 173500 | consumed samples: 30131200 | consumed tokens: 61708697600 | elapsed time per iteration (s): 0.42 | learning rate: 6.294E-05 | global batch size: 256 | lm loss: 2.892291E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.922 | TFLOPs: 31.84 | +7: iteration 117710/ 173500 | consumed samples: 30133760 | consumed tokens: 61713940480 | elapsed time per iteration (s): 0.42 | learning rate: 6.293E-05 | global batch size: 256 | lm loss: 2.893402E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.826 | TFLOPs: 31.89 | +7: iteration 117720/ 173500 | consumed samples: 30136320 | consumed tokens: 61719183360 | elapsed time per iteration (s): 0.42 | learning rate: 6.291E-05 | global batch size: 256 | lm loss: 2.892688E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.473 | TFLOPs: 31.87 | +7: iteration 117730/ 173500 | consumed samples: 30138880 | consumed tokens: 61724426240 | elapsed time per iteration (s): 0.42 | learning rate: 6.290E-05 | global batch size: 256 | lm loss: 2.890242E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.401 | TFLOPs: 31.87 | +7: iteration 117740/ 173500 | consumed samples: 30141440 | consumed tokens: 61729669120 | elapsed time per iteration (s): 0.42 | learning rate: 6.289E-05 | global batch size: 256 | lm loss: 2.889069E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 607.174 | TFLOPs: 31.86 | +7: iteration 117750/ 173500 | consumed samples: 30144000 | consumed tokens: 61734912000 | elapsed time per iteration (s): 0.42 | learning rate: 6.287E-05 | global batch size: 256 | lm loss: 2.893223E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.944 | TFLOPs: 31.85 | +7: iteration 117760/ 173500 | consumed samples: 30146560 | consumed tokens: 61740154880 | elapsed time per iteration (s): 0.42 | learning rate: 6.286E-05 | global batch size: 256 | lm loss: 2.894345E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.579 | TFLOPs: 31.88 | +7: iteration 117770/ 173500 | consumed samples: 30149120 | consumed tokens: 61745397760 | elapsed time per iteration (s): 0.42 | learning rate: 6.284E-05 | global batch size: 256 | lm loss: 2.893780E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.264 | TFLOPs: 31.86 | +7: iteration 117780/ 173500 | consumed samples: 30151680 | consumed tokens: 61750640640 | elapsed time per iteration (s): 0.42 | learning rate: 6.283E-05 | global batch size: 256 | lm loss: 2.903854E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.403 | TFLOPs: 31.87 | +7: iteration 117790/ 173500 | consumed samples: 30154240 | consumed tokens: 61755883520 | elapsed time per iteration (s): 0.42 | learning rate: 6.282E-05 | global batch size: 256 | lm loss: 2.891002E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.573 | TFLOPs: 31.88 | +7: iteration 117800/ 173500 | consumed samples: 30156800 | consumed tokens: 61761126400 | elapsed time per iteration (s): 0.43 | learning rate: 
6.280E-05 | global batch size: 256 | lm loss: 2.904921E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.720 | TFLOPs: 31.31 | +7: iteration 117810/ 173500 | consumed samples: 30159360 | consumed tokens: 61766369280 | elapsed time per iteration (s): 0.43 | learning rate: 6.279E-05 | global batch size: 256 | lm loss: 2.886996E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.288 | TFLOPs: 31.60 | diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..781483dabc0897ad51fa8b4221f1f9209cb8d5b6 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9f4a74f537dae9ee51ff97554c1ec80deed59ebf582fa92a736492ef63f1cb4 +size 41353495 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f76f298255dab608d88d2e8a4781a3cc1d3a695 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:323f7622aea36c0c68906456fe905d3e1443ccb53ca8852c7cff177173fa9339 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d5afcd52f6ccda3dc58fce5d9596b0aec642f6b --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0137cbe8f4c75e8d30d584378acfd83d78af1b9915deb9a36f9c837667079220 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6e9d60c3bb1b95b8207fea262e65cb972dc25a35 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:990f5629e0a53d383ee2fad44e87bd34022061ef34b0e384675a5ae45d5d2b3a +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a221968bcdaeb31dcf0b856f05ef3e6eeb97579 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94f8fd4b5b55638c6adfc322f75d3625797c1a14063f9db15e5e99172f679689 +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f4aed1eb059a86a647e472266fb7f100649cc391 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c22aa968c768771910b2d845929f85e77936ac0e0094aacf20947e619aeab63 +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..39045b32eacbe39e29cce176ec544bea0334fc37 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:584460ce269b7eb9234bf917d2ab1bfe492f8ae4b041f0f83e1fc819f5115e3e +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d04ee9ff25c19d08f7210e6ba84bb98002d170e5 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3eea8925248b35830043251e0cfae734fd46008b8ac78a64fa991fd4325f10ac +size 41353442 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d703bc8687ad88c04bd0e426cf317e79b5c83252 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3797cd6c43285ab80fd8b64da51a1f1bd1ef171dbe81ea2dd4fd44fd595e806c +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bd181a93431f51e45b666101861809395bd14b57 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:020b579994c634f119ab59ff897e84893993cb03fad8fb0b5ecc7b431eb69129 +size 41353570 diff --git 
a/221m32b400m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c08b13f349eea3b58d78c397532a2f9abff06034 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b544b4c7e56ad2806b0c41e09ce864cab46f3e46369b758b1b261b6ef830c97 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f7e06de5536eca7442d076562b4359339a388cd --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83caca326e6e69d3febe8d30ac230ab82f7ba514b21a6025a4142a934f7384f8 +size 41353559 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2fe73ed3968eec1173a75b48f44ecaa271d76c99 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa0168e1227cd1e2207e0ef08a71ec66651ced759f632c871ae77dfb15b316bf +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cac852f7525ed57e1a101d0fdbcd2fdf5ecded42 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:d458ec7056f2f3ec6580a7a6d8351197dc3f31d94172a2f8e9ab7793d6ecaa89 +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1899fbb729ac406b25f72cabca5c8d87bf03345f --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a621ad7305d6597116ea72cfbbb73541e0da8ab18c321c68fde5976920394caa +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d2b49874c303970f28a7f44b3da932bfa6362ee9 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:899768d4970c1152520fb5bb02f8ea4227de76f33e1727c0afe76473e6c67c8c +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..074c86dd5d4f47208416ef729f5a39345690e400 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10e605a1e7c8678356d6bef2ea23b71c6c674d15f6e3065499cea842f8edaefc +size 41353698 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e75401f7fe1edddcd3fe552a11b754bd5dea5b25 --- 
/dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65cd151350241e93cbb750f835d089a7d65f8e0b9c5555bcc9af68f38cb9e91f +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..23ae705232e882472c0506b967b5440568d5a5b0 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2488e471a7b859430a84c333624f17dd8e2110c6b35652c1a8cb38aac76fb73c +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0b370c439074808e179e349f8568510905e0046 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:128d672a87c842c230ca9a1e089bc85054632fe02428d061a71d1e5d02957f2e +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5889820b6dc9b0d191e371561ed17d73eabf321 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35d84ed94f2ac76d3b481be4171fde294a86f8824363100ab85ef71b8d5a3177 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..2c613f64ecf8f576c111fb53218f6efc96953694 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c67b67fd5fffe94d586c1e22bcfd23a735e4f798859ff8bbc6a5839dfe3a116f +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b13018aa707da96210a4ec37d0dfdd6c30d5e76 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3537ffe486edb15dd2bef7feabf703283f62b265f0c6336c9e0ee129760708aa +size 41353495 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba110a575737299ca522bd9dbc7e89f27749d0ca --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b9fc1b46ddb38720d40d6a378e0319a272cbf235764967685640b9bd8c68b8d +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..27623688ca89d02dffccb8694eb80a50c846439a --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d55b5a74dd79c7cfcb671ee2879d4535ee7560b6ed1c37e212e21b4a3b9c6e45 +size 41353506 diff --git 
a/221m32b400m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b9d09c0ecec12afa838a514d6c7d72a686a636cf --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:71346087d891bd9b8892ac49619d138815a098f51632057dcbd9f360136f9439 +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..002815e30a72eb6bd3b72dd890d0c0c680a25506 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5c7eaae16b0e7fc99b045cfd1052212c26d052947bf1912ed4d7bea8d763065 +size 41353698 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bbd33b866e7deb429c6c66626ddc4df1c892209c --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82b3367cc951ed4934ce4e19c2f79084f80b424b6aaa8f2a62635761d72ee47a +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bdb89e4881b40725f381f333fffb2ade618bdd7 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b6891a637253510f2fa2122137bea4952321a66453e33796f6a9b511212f5f5f +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13efd75a14965683de0fc4a8b5bbaa1058a9bf55 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82e0b76870d43d91046ff420a0a4e6f100d4ef735f440aa2a77816d8d488f741 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1cf57e5cab880dcdc42d3a70bdd10fad7e0cbd94 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e714cd61a823ce9d696bbf9b9bc0edc410b0501f5e3a6a9c504a0190520012f2 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b31f23206cb7cd22dc9a6b8e8a0c3e27a0352a59 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8391f06c8a47db104fb51e890c007f04322896efc1df51a561bbbcc2c5c4966b +size 41353698 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..432392d7443514ae9abd568b47cae79f05d24195 --- 
/dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51dd5e9509a50525e66b5102e611f9e4399ad86e3c751e7d648b9e5f877b0e15 +size 41353442 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..40c77f3cb47be197923b93ac9580554345cb1b0d --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f5a59555be9239e0b3f345de89a4e5435e51150565d361f6aa7651586bebe10 +size 41353495 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6f0ea2ac5cb96f4721a50abba350fbb9cf3ebb5 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca358854d5ad8b745abfe2873d1550db37f24255adf89c400707050221280a93 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3f519942a7feddf401a5973c0e6ee84f266db020 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02ba03ac532916102fa61d872eabf43357367c065cbacf499b5fcf399cdfaa0a +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..f86eefc876d9c4d094288f52f7a42854acebcf5a --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daf5bc8be98847089fb25444d35a610c73834a69715650c82046d91bcc7ece34 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..facd572235d585a13b37f8a572f690239d691cf6 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69545d0b97c93d06abf9e81f41ff8fe90f1541ad47980db83f47bea266be9d65 +size 41353698 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d4cffa071558e39fc874030f30513bb5f362ac71 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8f135cbd092c4b7c7cf5698d7d698a5a879c02af0b89a6fa34e3baa7250059a +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7c7cae7f3aade44ce56525614e525da2d71d027b --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db4e7974620526f568f62ce7553ee818baa28ee74d7b62ba3382f6ee6e84d426 +size 41353634 diff --git 
a/221m32b400m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7d81948661b2b00b04b2249a268a94cdbc20df1e --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05ef48608b8330ede046ff14d8e64290ab2cd43688f7561f4efebbac78b295f1 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1aa56b238d467c5d516954bbc5d97d16ece34163 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9786aa255a906299a77c80b49bb1f246209c80d5555d52c8e990743bcff8a9a1 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..81a65f931a26a0b858a4b9d6d19742b800931e48 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60da763ea2beb52e50d4b45a7f41a02252cd748b1ce9c2e095c357263ea5a448 +size 41353378 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8caf34c500c52d46a9f8e8cfe45e3d4cac18e0e5 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:f0fbb074f5ca3ffb3464dd97ec1b16530da8c89dc7db23a06a3323a609d95655 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..97e1330a1a85b87dc69cdcf4e8c8d11d9b4d27a3 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21565b47e401034a98402ab4e154de0502977f228fdc37cf12be0da597e3ca9e +size 41353559 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..19d8dc2fcad5cdacda742399f7910705cbd60f15 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f30b12378cd70b51c0616e3aa7bc11d4ee42f8d6faf59b6ed4e5c0df933417d5 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c6c852c69fb356184cd461fee54314ab606df94e --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5aee2128651ff9e80825f6a6cc2e5c651af1f3355927040f89259cbcb992696 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c0858d8b54916eb47e863d6d28ffc95201b3adc2 --- 
/dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3475a802de2069896dbacc8f463603e363aea78b74f04e5845a469d75d48782 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2cdb7920f432812cc2c8284b93b4528bfec9bbd9 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dba19d0c72bfa2cf9b51bfd2fd7178a108dba7e585805ca900d7bb7cbe0644f1 +size 41353442 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..58f9152a8ac1efe342fc36a3529da11d5a032e23 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:070384fcfcdcd2127b74cd6ee1e17c1a0421904076d42260ade03763cf1bf5e4 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..382bc4d3708b15932c44be85b5c0bbce34999d06 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:53a5830777869e9093eea5bccf5d37d3f79b4d8f6c2588e264501eec9523e9b4 +size 41353378 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..2a0b5aaac4c83e968132bca4bd1a778418e51302 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e74034ef153f9aaa4ffbaa3ce579560e8fd7da54357deaf3cbd96b46c0333a67 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0d2703b56e1c30fef3ac8828e0772b6d1c6f01d --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:62cd174352181c9d15c9e998dfb16987307360228dec47d4d3086043e3c0d4e4 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c5c566eb87ce87b794e93f2589802f93eb87d2a --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e173b3daa70591da260c2a028deea4121d925b386be3e6be4e758db3f3c31384 +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b140328f2cf01505e00f30d4fd50b79cc7a14cbc --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:192d5e0f78df04019e0c6f55bba8a4300ad11ed8c05fe797895df6d14d8c0964 +size 41353570 diff --git 
a/221m32b400m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4fb3671f205beab5c88add303905dc6e130fd793 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43e057f019ba6f292ccd87b460f87809110dfd1d4db75ae7bfe01cd63b611fd1 +size 41353495 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d13fc87fd3f0f0aec640cae3d92f8ddf12e97c8e --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:275dfab6337ed0b2daec2f2b808f964ed3abc532407fb2751620e4b8e1d3611e +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..64d594feccd123ea6dd3bc60e145c60c6bc01082 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ebf66cf557e5011a5bb9008de97ec7d4686b037bfb25b9c336859fe87b0dedcb +size 41353634 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d38d284390df0fccdfe149a6687156044df30a2 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:1084a8b33147c5e8fcecb654747fd257656166ae06da9e4038f42e2f6c14d121 +size 41353570 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7dc24ae6f87d4b0a9d63835c18344ee4ed58f237 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fbecfff40a8a4ef02b7e57a2defb5239788e2ee112bca4438169ce7baa90b01 +size 41353506 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..12fd9464c05ab2588f9edc5f6950021598bb6bd5 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ada575e5e115a8980f565764df7d29bb8d2e1ea5efca063c793a2fe52397942d +size 41353431 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ab38b27fb0e772b9fd2ee9ce0535f16e3ea02c9c --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe1c2b1732374fcb647b239802fdeee2c6fae1d2039be45d8d1eb4ac01d02ad6 +size 41353495 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0a24ba8d0a9669251b0346f3099423d871866a0 --- /dev/null +++ 
b/221m32b400m/global_step60336/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94c5d6d9fd9d74ef4573fee7f82806ec363dd43371518a4eb4ceb713eb186e16 +size 41353559 diff --git a/221m32b400m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/221m32b400m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..023e8d2fb920798c9adfabda070dd6a10fc63d50 --- /dev/null +++ b/221m32b400m/global_step60336/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c878e72fcc97039925fb4d4aa24fce62281298d074ba98c35f635855950117ef +size 41353495 diff --git a/221m32b400m/global_step60336/layer_01-model_00-model_states.pt b/221m32b400m/global_step60336/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbcdd0fa07e55a49cac099a8f14a339f90ec62c0 --- /dev/null +++ b/221m32b400m/global_step60336/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ace1b0477bc8de3f55a08a6034d06109ac9be45fad474d1cc1b53f44159be60 +size 93816067 diff --git a/221m32b400m/global_step60336/layer_03-model_00-model_states.pt b/221m32b400m/global_step60336/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e3e29540cc29214b51457f5c3580f14fb6afe528 --- /dev/null +++ b/221m32b400m/global_step60336/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c7554d8042b25ac560840d0d9a9c6a318df9fad72507588f175ebfb629b506b +size 19295235 diff --git a/221m32b400m/global_step60336/layer_04-model_00-model_states.pt b/221m32b400m/global_step60336/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b18cb334ad66623de1aeb41cd20d161ed3cdafd --- /dev/null +++ 
b/221m32b400m/global_step60336/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d3641e8e38323b915906768e36ac62b27fa3f803f7216193442b3f6cf2722b3 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_05-model_00-model_states.pt b/221m32b400m/global_step60336/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1c9ff4967bde2a2d11893eb5b84e91f85f5a9043 --- /dev/null +++ b/221m32b400m/global_step60336/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:013882e04afc08a75016ee7ec2ada97e88099c9f20acccb3211b7d8a764bdd60 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_06-model_00-model_states.pt b/221m32b400m/global_step60336/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c3cf73ed4e7f19b455824b7f474ed60fc532dd86 --- /dev/null +++ b/221m32b400m/global_step60336/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e71a1fc9f3fbb804521c099a028348ebf655fdc452da40d645332e558096d9d5 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_07-model_00-model_states.pt b/221m32b400m/global_step60336/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4e7c605d31440d36db338a06d5d8f161371ace8 --- /dev/null +++ b/221m32b400m/global_step60336/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed53693bd62da2000d5e49cf88a0d6f2d9b38fd141cd67b3f257e7d767d132f1 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_08-model_00-model_states.pt b/221m32b400m/global_step60336/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3da375b3746a04ffc247802dd7020bfbb1971f3d --- /dev/null +++ 
b/221m32b400m/global_step60336/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce63e23ac0a766179444069459900d6f0ffaaf93e0bd94ff85a1ed6a3e27ed3d +size 19295235 diff --git a/221m32b400m/global_step60336/layer_09-model_00-model_states.pt b/221m32b400m/global_step60336/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf8393714afc031ef536c762336b75ef5f4b8969 --- /dev/null +++ b/221m32b400m/global_step60336/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8a54d43e3a9fff9fc4a17042d777fd3e3518e8e738fc3c2df9d1f8112c093df +size 19295235 diff --git a/221m32b400m/global_step60336/layer_10-model_00-model_states.pt b/221m32b400m/global_step60336/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1dfc01d18f2bc418095fb34553d7707a5835484c --- /dev/null +++ b/221m32b400m/global_step60336/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d490e26b9f5b106702e33ddea27b400d9159081f3d7042a042ec22a4ab29d0a +size 19295235 diff --git a/221m32b400m/global_step60336/layer_11-model_00-model_states.pt b/221m32b400m/global_step60336/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c63a170674abbe98bc84d4ba647de9a16e05b50 --- /dev/null +++ b/221m32b400m/global_step60336/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b1d025e30b072362f9654ecdfc0f872139362b70604403e0736bcc149a695aa +size 19295235 diff --git a/221m32b400m/global_step60336/layer_12-model_00-model_states.pt b/221m32b400m/global_step60336/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76ff8cd2a093f09031993fdccf29a5c78ea7af92 --- /dev/null +++ 
b/221m32b400m/global_step60336/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa28431c6ae7e8f5623a5ba8b54dc31754df1e00fcd5397707af3d107090c6fe +size 19295235 diff --git a/221m32b400m/global_step60336/layer_13-model_00-model_states.pt b/221m32b400m/global_step60336/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4cabb3c18bca795ddb17149913ad80868daea168 --- /dev/null +++ b/221m32b400m/global_step60336/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:42f0f629367d59cc53193cad151c70cc9471c57eb8bb586d024f936340c4bf0d +size 19295235 diff --git a/221m32b400m/global_step60336/layer_14-model_00-model_states.pt b/221m32b400m/global_step60336/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..feca3351a38345c6412019bb6e73d54392c8aaa5 --- /dev/null +++ b/221m32b400m/global_step60336/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ff4e2ee5ac7a4eaf60e046d688a345dd17c9279b43305b169d371160470ca42 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_15-model_00-model_states.pt b/221m32b400m/global_step60336/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ae75e2444d5ddb06ff003054d38a84e09e3b062 --- /dev/null +++ b/221m32b400m/global_step60336/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba77c30f91efc795fb3a3fc1127e26ad487513910a4a57bded4349151379017d +size 19295235 diff --git a/221m32b400m/global_step60336/layer_16-model_00-model_states.pt b/221m32b400m/global_step60336/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..15817fa1632e056f78bdeded1701778dc4767d67 --- /dev/null +++ 
b/221m32b400m/global_step60336/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba1bb70497169d8e3397792843f08ba213c50fe9d9bbd97b29aace80e1acb551 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_17-model_00-model_states.pt b/221m32b400m/global_step60336/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba3be3d4934ecea2cc0a6efc9e85abf150293249 --- /dev/null +++ b/221m32b400m/global_step60336/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b072ec5c469581016d95305713cdac75e62bcb06f7b49d32579a22bc90e8094d +size 19295235 diff --git a/221m32b400m/global_step60336/layer_18-model_00-model_states.pt b/221m32b400m/global_step60336/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6ee9fd39d47dfa842dd7cc97f6bac89cb543a81 --- /dev/null +++ b/221m32b400m/global_step60336/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc0749a4bfefd2b9283f207b46dee2686402e6695e28e1c8259d7704048d7458 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_19-model_00-model_states.pt b/221m32b400m/global_step60336/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f6313aada16cd8c55fd2e2f61ae3e589c9e1ce35 --- /dev/null +++ b/221m32b400m/global_step60336/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02d3ba3e2aa057d4780fb7d762ced5d794d5625a9e4bb0bfbc86f5bed881e66d +size 19295235 diff --git a/221m32b400m/global_step60336/layer_20-model_00-model_states.pt b/221m32b400m/global_step60336/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a9b195d163dd8489c3e1e9a5d549c164b6370957 --- /dev/null +++ 
b/221m32b400m/global_step60336/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:111728080bc733cf6f466e2c6934f606a5def58996941d55f6f101b54c2e6f31 +size 19295235 diff --git a/221m32b400m/global_step60336/layer_22-model_00-model_states.pt b/221m32b400m/global_step60336/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73acb507fca1e5c30fbf4a82bb78a3ec0d4529f0 --- /dev/null +++ b/221m32b400m/global_step60336/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39cc7fdf53d4246996a52bcc657682e4f8c6a10b59201a36618bf7d74862b9cb +size 4803 diff --git a/221m32b400m/global_step60336/mp_rank_00_model_states.pt b/221m32b400m/global_step60336/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5f41ee98ca8d26493324e6ab46093c7c1078ff2f --- /dev/null +++ b/221m32b400m/global_step60336/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d3c0d4891cfaeb4a513ba2cb3282c89abc384a46f4e0b77b06b9d3958864bae +size 37747 diff --git a/221m32b400m/sbatch_221m32b400m.sh b/221m32b400m/sbatch_221m32b400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..9a348bd7d92b15b1d597e26bc95bf1ce177cb34e --- /dev/null +++ b/221m32b400m/sbatch_221m32b400m.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH 
--exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=221m32b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 31633480000 +# -> Samples: 15446035 +TRAIN_SAMPLES=15_446_035 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 154_460 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + 
--num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/221m32b400m/sbatch_221m32b400mval.sh b/221m32b400m/sbatch_221m32b400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..98d5af85538994c2be7c5a4991fab48435ce6792 --- /dev/null +++
b/221m32b400m/sbatch_221m32b400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=221m32b400mval +VARIANT_CKPT=221m32b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} 
+KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 60400000000 +# -> Samples: 29492188 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --override-lr-scheduler \ + --reset-progress \ + --no-load-optim \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ +
Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/221m32b400m/tensorboard_221m32b400m/events.out.tfevents.1678972911.nid006140.6491.0 b/221m32b400m/tensorboard_221m32b400m/events.out.tfevents.1678972911.nid006140.6491.0 new file mode 100644 index 0000000000000000000000000000000000000000..f6c45a512ad65ba4e8e6fcd8e00d696f72b67948 --- /dev/null +++ b/221m32b400m/tensorboard_221m32b400m/events.out.tfevents.1678972911.nid006140.6491.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6d0b02e980caf631e10ae486ec2aef9d1ebce952f0c132f7ad690e07d6f6ef4 +size 107848496 diff --git a/221m32b400m/tensorboard_221m32b400mval/events.out.tfevents.1679000409.nid007230.98327.0 b/221m32b400m/tensorboard_221m32b400mval/events.out.tfevents.1679000409.nid007230.98327.0 new file mode 100644 index 0000000000000000000000000000000000000000..ca4cb7f47a15eb533995aa693a3ab26875ec8371 --- /dev/null +++ b/221m32b400m/tensorboard_221m32b400mval/events.out.tfevents.1679000409.nid007230.98327.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bad5da7ac5232bec3dc74d3c4fc815272faaff20451062f616dd5e8242d2e1e6 +size 980 diff --git a/221m60b400m/3326636.err b/221m60b400m/3326636.err new file mode 100644 index 0000000000000000000000000000000000000000..d754b5b53f0583b54b4dba99558155cd0f54ca04 --- /dev/null +++ b/221m60b400m/3326636.err @@ -0,0 +1,1121 @@ +3: 2023-03-16 23:09:13.946765: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural 
Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946756: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946778: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-16 23:09:13.947168: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:13.947182: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 23:09:13.947160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: 2023-03-16 23:09:13.947062: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:13.947071: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:13.947076: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: 2023-03-16 23:09:13.947026: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947035: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-16 23:09:13.947044: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947240: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:13.947163: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946809: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946821: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 23:09:13.947211: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:13.947222: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947158: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947169: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: 2023-03-16 23:09:13.946837: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:09:13.947398: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947407: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947411: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-16 23:09:13.947251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:13.947252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-16 23:09:13.947245: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: 2023-03-16 23:09:13.947267: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:13.947057: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946841: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 23:09:13.947283: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:09:13.947159: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947178: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-16 23:09:13.946891: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:13.947182: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-16 23:09:13.947218: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-16 23:09:13.947206: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:13.947267: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:13.947264: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-16 23:09:13.947246: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947263: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-16 23:09:13.947281: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:13.947263: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947311: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947303: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947461: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-16 23:09:13.947463: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947451: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-16 23:09:13.947259: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-16 23:09:13.947298: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947327: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-16 23:09:13.947405: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-16 23:09:13.947491: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948019: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 23:09:13.948043: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948055: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948064: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948067: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:13.948071: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 23:09:13.948096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 23:09:26.200364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 23:09:26.200226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 23:09:26.200412: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-16 23:09:26.200846: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:09:26.200428: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200271: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200409: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.200437: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200295: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200403: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200870: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:09:26.200490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200409: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.200483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200305: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200478: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.200489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 23:09:26.200508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200313: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.200426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200886: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.200501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.200508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 23:09:26.200541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 23:09:26.200421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.200511: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.201017: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:09:26.200652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200318: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.200573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200894: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:26.200899: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:26.200907: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 23:09:26.200911: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.201243: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:26.200678: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:26.200919: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 23:09:26.200535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 23:09:26.201259: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:26.200536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 23:09:26.200590: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:26.201230: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.201271: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.200984: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.201044: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.200705: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200524: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 2023-03-16 23:09:26.201255: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201264: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201271: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201285: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201290: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-16 23:09:26.200572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 23:09:26.201282: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:26.201291: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:26.200555: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 23:09:26.200607: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:26.201017: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201298: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 23:09:26.201305: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:26.201302: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-16 23:09:26.201306: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:26.200699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 2023-03-16 23:09:26.201316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:26.200566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 2023-03-16 23:09:26.200610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.201075: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.200717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 2023-03-16 23:09:26.201017: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-16 23:09:26.200620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:26.201047: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.200578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:26.200723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200568: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.200572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.200637: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.200610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:26.201089: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:26.201096: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:26.201105: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:26.201115: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:09:26.200745: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.200571: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.201308: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.201041: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-16 23:09:26.201119: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:26.201071: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201327: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201338: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201348: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:26.200606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 23:09:26.200743: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-16 23:09:26.201082: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201366: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201366: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:26.200633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 2023-03-16 23:09:26.201371: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-16 23:09:26.201374: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-16 23:09:26.201054: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201510: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201516: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.200591: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 2023-03-16 23:09:26.201525: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-16 23:09:26.200640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:26.201072: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:26.201091: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:26.201108: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-16 23:09:26.201110: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-16 23:09:26.201540: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201552: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201553: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201557: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:26.201111: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:26.201115: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:26.201144: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-16 23:09:26.201158: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-16 23:09:26.201568: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-16 23:09:53.974390: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974490: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.974684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.982929: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.982956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.982976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.982987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.982999: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.983084: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983545: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 23:09:53.983561: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.983567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 23:09:53.983924: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983938: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.983942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983843: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 23:09:53.983963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983876: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.983862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983900: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983887: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.984020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:09:53.984077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983937: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.983927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.983948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-16 23:09:53.984104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.984022: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.984047: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:09:53.984105: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.984125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.984134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.984149: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.984068: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.984145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 2023-03-16 23:09:53.984079: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.984085: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.984101: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.984284: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-16 23:09:53.984171: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.984241: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.984287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989779: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989789: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-16 23:09:53.989798: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989799: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989799: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989801: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989801: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989804: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989807: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-16 23:09:53.989807: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:09:53.990871: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.991192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 
23:09:53.990919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.990947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990967: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990953: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990972: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990951: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-16 23:09:53.990974: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990970: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:09:53.990872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:09:53.990875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:09:53.990941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.990975: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-16 23:09:53.990888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990891: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990891: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-16 23:09:53.990895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-16 23:09:53.990895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:09:53.990943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990956: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.990966: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:53.990967: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:09:53.990935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-16 23:09:53.990945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-16 23:09:53.990970: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:53.990971: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:53.990972: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-16 23:09:53.990940: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.990941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-16 23:09:53.990957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-16 23:09:53.990965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-16 23:09:53.990973: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:53.990975: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-16 23:09:53.990975: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.990941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.990944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-16 23:09:53.990944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990991: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990992: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990967: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990968: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 23:09:53.991204: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: 2023-03-16 23:09:53.990945: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.990947: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-16 23:09:53.990950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-16 23:09:53.990993: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-16 23:09:53.990969: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 23:09:53.991212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:09:53.991054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 23:09:53.991212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991211: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991216: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.991067: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991217: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991220: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 23:09:53.991219: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+2: 2023-03-16 23:09:53.991071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-16 23:09:53.991089: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:09:53.990440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990455: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:09:53.990489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990502: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 23:09:53.990504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990511: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990504: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990522: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 23:09:53.990531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990532: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990535: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990538: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 23:09:53.990539: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
+0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +7: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank 
instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: +3: +3: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +2: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +1: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: +4: +4: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +7: Loading extension module utils... 
+5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +3: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +6: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +4: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +3: +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... 
+4: +4: Loading extension module utils...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...Loading extension module utils... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: +5: Loading extension module utils...Loading extension module utils... +5: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... 
+7: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +7: +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils...Loading extension module utils... +7: +7: Loading extension module utils... +7: Loading extension module utils... +7: +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... 
+6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/221m60b400m/3326636.out b/221m60b400m/3326636.out new file mode 100644 index 0000000000000000000000000000000000000000..e92fa74164e2c284f8ba8ccc06536aa3857f5926 --- /dev/null +++ b/221m60b400m/3326636.out @@ -0,0 +1,6435 @@ +Model parameters: d_model 896 ffw_size 3584 kv_size 64 n_heads 14 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 896 --num-attention-heads 14 --kv-channels 64 --ffn-hidden-size 3584 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt 
--clip-grad 1.0 --kill-switch-path kill-switch-221m60b400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --override-lr-scheduler --reset-progress --no-load-optim --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --tensorboard-dir tensorboard_221m60b400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_221m60b400m --load checkpoints_221m60b400m --train-weighted-split-paths-path train400m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3326636.json --zero-stage 0 +START 3326636: Thu 16 Mar 2023 11:08:16 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 46.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 
0.0W 0% 0% +1: 2 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 39.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 45.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 36.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 
================================================================================ +5: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 46.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 49.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 41.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +1: Launching on nid006940 (1/8), master nid006939 port 9999, GPUs 8, CUDA: True +0: Launching on nid006939 (0/8), master nid006939 port 9999, GPUs 8, CUDA: True +5: Launching on nid006944 (5/8), master nid006939 port 9999, GPUs 8, CUDA: True +7: Launching on nid006946 (7/8), master nid006939 port 9999, GPUs 8, CUDA: True +6: Launching on nid006945 (6/8), master nid006939 port 9999, GPUs 8, CUDA: True +4: Launching on nid006943 (4/8), master nid006939 port 9999, GPUs 8, CUDA: True +2: Launching on nid006941 (2/8), master nid006939 port 9999, 
GPUs 8, CUDA: True +3: Launching on nid006942 (3/8), master nid006939 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 
0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3326636.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 3584 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ 
False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 896 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-221m60b400mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_221m60b400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... 
True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 14 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 
2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_221m60b400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 
0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_221m60b400mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 
2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... 
torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 23:11:00,581] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +7: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.109 seconds +0: > compiling and loading fused kernels ... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ scaled_masked_softmax_hip.o scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 27.734 seconds +0: time to initialize megatron (seconds): 3.106 +0: [after megatron is initialized] datetime: 2023-03-16 23:11:31 +0: building GPT model ... 
+0: [2023-03-16 23:11:31,248] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 23:11:31,249] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 23:11:31,249] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.99 GB, percent = 6.8% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, 
model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-16 23:11:33,249] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: 
ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 23:11:33,562] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 23:11:33,563] [INFO] [utils.py:828:see_memory_usage] MA 0.42 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:33,563] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.01 GB, percent = 6.8% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. +0: [2023-03-16 23:11:33,565] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 23:11:46,958] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 23:11:46,959] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 23:11:46,959] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 23:11:46,974] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 23:11:46,974] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 23:11:47,097] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 23:11:47,098] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:47,098] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.69 GB, percent = 6.9% +0: ninja: no work to do. 
+0: Time to load utils op: 0.10437774658203125 seconds +0: Time to load utils op: 0.20695066452026367 seconds +0: Time to load utils op: 0.19944548606872559 seconds +0: Time to load utils op: 0.20377635955810547 seconds +0: Time to load utils op: 0.20337414741516113 seconds +0: Time to load utils op: 0.20380663871765137 seconds +0: Time to load utils op: 0.20372509956359863 seconds +0: Time to load utils op: 0.20330047607421875 seconds +5: Time to load utils op: 0.21251988410949707 seconds +5: Time to load utils op: 0.21260738372802734 seconds +5: Time to load utils op: 0.21234130859375 secondsTime to load utils op: 0.2125687599182129 seconds +5: +5: Time to load utils op: 0.21256041526794434 seconds +5: Time to load utils op: 0.21272897720336914 secondsTime to load utils op: 0.21256184577941895 seconds +5: +5: Time to load utils op: 0.2126140594482422 seconds +3: Time to load utils op: 0.21471667289733887 seconds +7: Time to load utils op: 0.21426796913146973 seconds +7: Time to load utils op: 0.21423912048339844 seconds +7: Time to load utils op: 0.21614384651184082 seconds +7: Time to load utils op: 0.2140364646911621 seconds +3: Time to load utils op: 0.21473431587219238 seconds +7: Time to load utils op: 0.21438908576965332 secondsTime to load utils op: 0.21398377418518066 secondsTime to load utils op: 0.21529102325439453 seconds +7: +7: +3: Time to load utils op: 0.21476435661315918 secondsTime to load utils op: 0.21476268768310547 seconds +3: +3: Time to load utils op: 0.2147672176361084 seconds +7: Time to load utils op: 0.2138979434967041 seconds +3: Time to load utils op: 0.21478772163391113 secondsTime to load utils op: 0.21477055549621582 seconds +3: +3: Time to load utils op: 0.21478605270385742 seconds +2: Time to load utils op: 0.21573877334594727 seconds +2: Time to load utils op: 0.21574664115905762 seconds +2: Time to load utils op: 0.21576285362243652 seconds +2: Time to load utils op: 0.2156219482421875 secondsTime to load utils op: 
0.21579337120056152 seconds +2: +2: Time to load utils op: 0.2158041000366211 seconds +2: Time to load utils op: 0.21582531929016113 secondsTime to load utils op: 0.2158033847808838 seconds +2: +1: Time to load utils op: 0.2170250415802002 secondsTime to load utils op: 0.21697545051574707 seconds +1: +1: Time to load utils op: 0.2170426845550537 seconds +1: Time to load utils op: 0.21704673767089844 seconds +1: Time to load utils op: 0.2170698642730713 seconds +1: Time to load utils op: 0.2170555591583252 secondsTime to load utils op: 0.2170701026916504 seconds +1: +1: Time to load utils op: 0.21707487106323242 seconds +4: Time to load utils op: 0.2169337272644043 secondsTime to load utils op: 0.21693706512451172 seconds +4: +4: Time to load utils op: 0.21694469451904297 secondsTime to load utils op: 0.21693873405456543 secondsTime to load utils op: 0.21694445610046387 seconds +4: +4: Time to load utils op: 0.2169477939605713 seconds +4: +4: Time to load utils op: 0.21696186065673828 secondsTime to load utils op: 0.21695828437805176 seconds +4: +6: Time to load utils op: 0.21587204933166504 seconds +6: Time to load utils op: 0.21588516235351562 seconds +6: Time to load utils op: 0.21591401100158691 seconds +6: Time to load utils op: 0.21593046188354492 seconds +6: Time to load utils op: 0.21593546867370605 secondsTime to load utils op: 0.21593594551086426 seconds +6: +6: Time to load utils op: 0.21595239639282227 secondsTime to load utils op: 0.21594953536987305 seconds +6: +0: Time to load utils op: 0.0005793571472167969 seconds +0: Time to load utils op: 0.0005526542663574219 seconds +0: Time to load utils op: 0.0005867481231689453 seconds +0: Time to load utils op: 0.00048542022705078125 seconds +0: Time to load utils op: 0.0004334449768066406 seconds +0: Time to load utils op: 0.0006213188171386719 seconds +0: Time to load utils op: 0.000514984130859375 seconds +3: Time to load utils op: 0.0009109973907470703 seconds +3: Time to load utils op: 
0.0009741783142089844 seconds +3: Time to load utils op: 0.0011031627655029297 seconds +3: Time to load utils op: 0.0010974407196044922 seconds +3: Time to load utils op: 0.0010766983032226562 seconds +3: Time to load utils op: 0.0010564327239990234 seconds +3: Time to load utils op: 0.0011200904846191406 seconds +3: Time to load utils op: 0.0011425018310546875 seconds +4: Time to load utils op: 0.00077056884765625 seconds +4: Time to load utils op: 0.0005548000335693359 seconds +4: Time to load utils op: 0.0005183219909667969 seconds +4: Time to load utils op: 0.0005269050598144531 seconds +4: Time to load utils op: 0.0005211830139160156 secondsTime to load utils op: 0.0005002021789550781 seconds +4: +4: Time to load utils op: 0.0005133152008056641 seconds +4: Time to load utils op: 0.0005145072937011719 seconds +5: Time to load utils op: 0.0006632804870605469 seconds +5: Time to load utils op: 0.0009334087371826172 seconds +5: Time to load utils op: 0.000978231430053711 seconds +5: Time to load utils op: 0.0012238025665283203 seconds +5: Time to load utils op: 0.0012671947479248047 secondsTime to load utils op: 0.0011963844299316406 seconds +5: +5: Time to load utils op: 0.001070261001586914 seconds +5: Time to load utils op: 0.0013055801391601562 seconds +1: Time to load utils op: 0.0010101795196533203 seconds +1: Time to load utils op: 0.0012102127075195312 seconds +1: Time to load utils op: 0.0013539791107177734 secondsTime to load utils op: 0.001369476318359375 seconds +1: +1: Time to load utils op: 0.0013539791107177734 seconds +1: Time to load utils op: 0.0013768672943115234 secondsTime to load utils op: 0.0013554096221923828 seconds +1: +1: Time to load utils op: 0.0014202594757080078 seconds +2: Time to load utils op: 0.0012111663818359375 seconds +7: Time to load utils op: 0.00044345855712890625 seconds +2: Time to load utils op: 0.0014030933380126953 seconds +7: Time to load utils op: 0.00043845176696777344 seconds +7: Time to load utils op: 
0.0003917217254638672 seconds +7: Time to load utils op: 0.00046515464782714844 secondsTime to load utils op: 0.0005207061767578125 secondsTime to load utils op: 0.0005207061767578125 secondsTime to load utils op: 0.0005252361297607422 seconds +7: +7: +7: +2: Time to load utils op: 0.0015230178833007812 seconds +2: Time to load utils op: 0.0015320777893066406 seconds +2: Time to load utils op: 0.0014891624450683594 seconds +2: Time to load utils op: 0.0015683174133300781 seconds +2: Time to load utils op: 0.001481771469116211 seconds +2: Time to load utils op: 0.0015902519226074219 seconds +6: Time to load utils op: 0.0005409717559814453 seconds +6: Time to load utils op: 0.0007066726684570312 seconds +6: Time to load utils op: 0.0010154247283935547 seconds +6: Time to load utils op: 0.0007796287536621094 seconds +6: Time to load utils op: 0.0009081363677978516 seconds +6: Time to load utils op: 0.0009043216705322266 secondsTime to load utils op: 0.0009772777557373047 seconds +6: +6: Time to load utils op: 0.000946044921875 seconds +7: Time to load utils op: 0.0003955364227294922 seconds +0: [2023-03-16 23:11:47,330] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 23:11:47,331] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.41 GB CA 0.46 GB Max_CA 0 GB +0: [2023-03-16 23:11:47,331] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,457] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 23:11:47,458] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: [2023-03-16 23:11:47,458] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,564] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 23:11:47,565] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB +0: 
[2023-03-16 23:11:47,565] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,673] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 23:11:47,673] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,674] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,778] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 23:11:47,778] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,778] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,885] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 23:11:47,885] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,886] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:47,989] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 23:11:47,990] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:47,990] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:48,099] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 23:11:48,100] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB +0: [2023-03-16 23:11:48,100] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:48,203] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 23:11:48,204] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB 
+0: [2023-03-16 23:11:48,204] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 34.84 GB, percent = 6.9% +0: [2023-03-16 23:11:48,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 23:11:48,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 23:11:48,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 23:11:48,204] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 23:11:48,205] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +6: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 23:11:48,206] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 23:11:48,207] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 23:11:48,207] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00040435791015625 seconds +0: [2023-03-16 23:11:48,208] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 23:11:48,230] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=220527104 (220.527M) TOTAL_PARAMS=220527104 (220.527M) UNIQUE_PARAMS=220527104 (220.527M) +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+0: [2023-03-16 23:11:48,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+6: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +7: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +4: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... 
+1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +2: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. 
+0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/mp_rank_00_model_states.pt. +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +3: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +4: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +1: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +2: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +7: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +0: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +5: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt... +6: [2023-03-16 23:11:48,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +3: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +4: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +5: [2023-03-16 23:11:48,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +2: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +1: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +6: [2023-03-16 23:11:48,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +7: [2023-03-16 23:11:48,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_01-model_00-model_states.pt. +0: [2023-03-16 23:11:48,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +0: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +2: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt... +7: [2023-03-16 23:11:48,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +1: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+5: [2023-03-16 23:11:48,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +0: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +5: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +4: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +3: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +7: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +6: [2023-03-16 23:11:48,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_03-model_00-model_states.pt. +2: [2023-03-16 23:11:48,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +7: [2023-03-16 23:11:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +3: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +5: [2023-03-16 23:11:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +4: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +6: [2023-03-16 23:11:48,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt... +0: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +7: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +3: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +5: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +0: [2023-03-16 23:11:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +1: [2023-03-16 23:11:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_04-model_00-model_states.pt. +2: [2023-03-16 23:11:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +4: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +3: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +0: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +6: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +7: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +2: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt... +1: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +2: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +0: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +1: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +4: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +6: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +5: [2023-03-16 23:11:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +7: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_05-model_00-model_states.pt. +3: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +3: [2023-03-16 23:11:48,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +7: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... +6: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt... 
+6: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +7: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+7: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+6: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:48,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +1: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +4: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +6: [2023-03-16 23:11:48,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +2: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
+2: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +3: [2023-03-16 23:11:48,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +0: [2023-03-16 23:11:48,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_06-model_00-model_states.pt. +5: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:11:48,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:48,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:48,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +1: [2023-03-16 23:11:49,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +7: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +6: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +2: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +0: [2023-03-16 23:11:49,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt... +3: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +6: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +3: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +7: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +1: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +5: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +2: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +0: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_07-model_00-model_states.pt. +4: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +1: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +7: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +7: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +3: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt... +3: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +6: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +5: [2023-03-16 23:11:49,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +0: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +1: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +2: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_08-model_00-model_states.pt. +4: [2023-03-16 23:11:49,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +7: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +7: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +2: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +6: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +5: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +4: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt... +3: [2023-03-16 23:11:49,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +5: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +1: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +2: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +4: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +3: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_09-model_00-model_states.pt. +0: [2023-03-16 23:11:49,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +0: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +4: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +6: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt... +3: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +6: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +4: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +3: [2023-03-16 23:11:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +2: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +1: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +7: [2023-03-16 23:11:49,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +5: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_10-model_00-model_states.pt. +0: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +0: [2023-03-16 23:11:49,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +2: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt... +3: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +3: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +7: [2023-03-16 23:11:49,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +2: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +4: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +5: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +1: [2023-03-16 23:11:49,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_11-model_00-model_states.pt. +0: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +7: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +1: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +3: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +6: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +2: [2023-03-16 23:11:49,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt... +0: [2023-03-16 23:11:49,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +0: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +3: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +6: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +2: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +5: [2023-03-16 23:11:49,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +4: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_12-model_00-model_states.pt. +7: [2023-03-16 23:11:49,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +2: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +5: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +3: [2023-03-16 23:11:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +6: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +7: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +0: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt... +1: [2023-03-16 23:11:49,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +1: [2023-03-16 23:11:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +4: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +0: [2023-03-16 23:11:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +5: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +6: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +7: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +3: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. +2: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_13-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +3: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +2: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +6: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +0: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +4: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +1: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt... +7: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +2: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +4: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +6: [2023-03-16 23:11:49,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +5: [2023-03-16 23:11:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +0: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +7: [2023-03-16 23:11:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_14-model_00-model_states.pt. +3: [2023-03-16 23:11:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +4: [2023-03-16 23:11:49,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +2: [2023-03-16 23:11:49,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +6: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +5: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +3: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +1: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt... +7: [2023-03-16 23:11:49,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+7: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +1: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +2: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +6: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +5: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +7: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +0: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +3: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+5: [2023-03-16 23:11:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_15-model_00-model_states.pt. +4: [2023-03-16 23:11:49,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +1: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +5: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +0: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +3: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +6: [2023-03-16 23:11:49,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt... +7: [2023-03-16 23:11:49,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+4: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+3: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +1: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +5: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+2: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +2: [2023-03-16 23:11:49,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +3: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+4: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +4: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. 
+5: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +7: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +6: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_16-model_00-model_states.pt. +0: [2023-03-16 23:11:49,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:11:49,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:49,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:49,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 23:11:49,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:49,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:49,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +7: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +1: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +3: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +5: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +7: [2023-03-16 23:11:50,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +5: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +1: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +6: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +3: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +0: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +2: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +0: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +2: [2023-03-16 23:11:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +6: [2023-03-16 23:11:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt... +4: [2023-03-16 23:11:50,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_17-model_00-model_states.pt. +4: [2023-03-16 23:11:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+4: [2023-03-16 23:11:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +3: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +1: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +5: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +6: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt... +7: [2023-03-16 23:11:50,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +6: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +2: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +5: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +1: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +7: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +4: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +3: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_18-model_00-model_states.pt. +0: [2023-03-16 23:11:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +7: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +0: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... 
+4: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +4: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt... +3: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +3: [2023-03-16 23:11:50,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +7: [2023-03-16 23:11:50,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +0: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +5: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +4: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +1: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+7: [2023-03-16 23:11:50,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_19-model_00-model_states.pt. +2: [2023-03-16 23:11:50,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+4: [2023-03-16 23:11:50,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +1: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +4: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +0: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +3: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +6: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +2: [2023-03-16 23:11:50,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt... +7: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +7: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +0: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+6: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +4: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+5: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+3: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +3: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +6: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +5: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +1: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: > overriding learning rate value to 0.0002 +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: > overriding minimum learning rate value to 2e-05 +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+0: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +4: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +0: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +6: [2023-03-16 23:11:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+6: [2023-03-16 23:11:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +6: [2023-03-16 23:11:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
+3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-16 23:11:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
+5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-16 23:11:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 23:11:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-16 23:11:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
+2: [2023-03-16 23:11:50,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_20-model_00-model_states.pt. +2: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... 
+2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt... +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. +2: [2023-03-16 23:11:50,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/layer_22-model_00-model_states.pt. 
+2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
+6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-16 23:11:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +7: [2023-03-16 23:11:50,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,501] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-16 23:11:50,503] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +0: [2023-03-16 23:11:50,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 23:11:50,524] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-03-16 23:11:50,526] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +5: [2023-03-16 23:11:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,526] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +5: [2023-03-16 23:11:50,528] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +3: [2023-03-16 23:11:50,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:11:50,531] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +3: [2023-03-16 23:11:50,533] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +6: [2023-03-16 23:11:50,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,534] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +2: [2023-03-16 23:11:50,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:11:50,533] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +2: [2023-03-16 23:11:50,534] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +6: [2023-03-16 23:11:50,536] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +4: [2023-03-16 23:11:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,538] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +4: [2023-03-16 23:11:50,538] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +7: [2023-03-16 23:11:50,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +4: [2023-03-16 23:11:50,540] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +1: [2023-03-16 23:11:50,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:11:50,542] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +1: [2023-03-16 23:11:50,544] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +0: [2023-03-16 23:11:50,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 23:11:50,561] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +0: [2023-03-16 23:11:50,563] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +7: [2023-03-16 23:11:50,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,579] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +5: [2023-03-16 23:11:50,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,579] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +7: [2023-03-16 23:11:50,581] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +5: [2023-03-16 23:11:50,581] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +2: [2023-03-16 23:11:50,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:11:50,585] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +2: [2023-03-16 23:11:50,587] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +3: [2023-03-16 23:11:50,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 23:11:50,587] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-16 23:11:50,589] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +0: [2023-03-16 23:11:50,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,595] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +0: [2023-03-16 23:11:50,597] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +3: [2023-03-16 23:11:50,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:11:50,607] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +0: [2023-03-16 23:11:50,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,609] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +3: [2023-03-16 23:11:50,609] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +0: [2023-03-16 23:11:50,611] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +1: [2023-03-16 23:11:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:11:50,613] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +4: [2023-03-16 23:11:50,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,615] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +1: [2023-03-16 23:11:50,615] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +7: [2023-03-16 23:11:50,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,617] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +7: [2023-03-16 23:11:50,617] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +7: [2023-03-16 23:11:50,619] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +1: [2023-03-16 23:11:50,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:11:50,620] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +2: [2023-03-16 23:11:50,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:11:50,621] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +1: [2023-03-16 23:11:50,622] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +2: [2023-03-16 23:11:50,623] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +6: [2023-03-16 23:11:50,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,626] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +5: [2023-03-16 23:11:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,626] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +6: [2023-03-16 23:11:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,628] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-16 23:11:50,628] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +5: [2023-03-16 23:11:50,628] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +5: [2023-03-16 23:11:50,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-16 23:11:50,628] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +2: [2023-03-16 23:11:50,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,626] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +4: [2023-03-16 23:11:50,628] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-03-16 23:11:50,627] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +2: [2023-03-16 23:11:50,629] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +6: [2023-03-16 23:11:50,630] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +5: [2023-03-16 23:11:50,630] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +3: [2023-03-16 23:11:50,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:11:50,631] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +3: [2023-03-16 23:11:50,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 23:11:50,633] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +3: [2023-03-16 23:11:50,633] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +0: [2023-03-16 23:11:50,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,635] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +3: [2023-03-16 23:11:50,635] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +0: [2023-03-16 23:11:50,637] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +3: [2023-03-16 23:11:50,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:11:50,643] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-16 23:11:50,645] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +7: [2023-03-16 23:11:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,655] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +4: [2023-03-16 23:11:50,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:11:50,656] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +0: [2023-03-16 23:11:50,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,657] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +7: [2023-03-16 23:11:50,657] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +0: [2023-03-16 23:11:50,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +4: [2023-03-16 23:11:50,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +7: [2023-03-16 23:11:50,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,660] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-16 23:11:50,662] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +1: [2023-03-16 23:11:50,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:11:50,663] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +1: [2023-03-16 23:11:50,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:11:50,665] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +1: [2023-03-16 23:11:50,665] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +1: [2023-03-16 23:11:50,667] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +4: [2023-03-16 23:11:50,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,667] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +5: [2023-03-16 23:11:50,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,667] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +1: [2023-03-16 23:11:50,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:11:50,668] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +4: [2023-03-16 23:11:50,669] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +5: [2023-03-16 23:11:50,669] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +1: [2023-03-16 23:11:50,670] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +6: [2023-03-16 23:11:50,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-16 23:11:50,671] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-16 23:11:50,673] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +5: [2023-03-16 23:11:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,674] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +6: [2023-03-16 23:11:50,674] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +2: [2023-03-16 23:11:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:11:50,675] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +2: [2023-03-16 23:11:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:11:50,675] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-16 23:11:50,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-16 23:11:50,676] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +6: [2023-03-16 23:11:50,676] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +5: [2023-03-16 23:11:50,676] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +6: [2023-03-16 23:11:50,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,676] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +2: [2023-03-16 23:11:50,677] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +2: [2023-03-16 23:11:50,677] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +5: [2023-03-16 23:11:50,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,678] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +2: [2023-03-16 23:11:50,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +6: [2023-03-16 23:11:50,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +5: [2023-03-16 23:11:50,680] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +3: [2023-03-16 23:11:50,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-16 23:11:50,680] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-16 23:11:50,682] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +7: [2023-03-16 23:11:50,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,684] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +7: [2023-03-16 23:11:50,686] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +3: [2023-03-16 23:11:50,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-16 23:11:50,687] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +3: [2023-03-16 23:11:50,689] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +7: [2023-03-16 23:11:50,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-16 23:11:50,691] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-16 23:11:50,693] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +4: [2023-03-16 23:11:50,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:11:50,700] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-16 23:11:50,702] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +4: [2023-03-16 23:11:50,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-16 23:11:50,703] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-16 23:11:50,705] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +1: [2023-03-16 23:11:50,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-16 23:11:50,708] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +1: [2023-03-16 23:11:50,710] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +6: [2023-03-16 23:11:50,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,713] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +6: [2023-03-16 23:11:50,714] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +5: [2023-03-16 23:11:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 23:11:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +5: [2023-03-16 23:11:50,715] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +1: [2023-03-16 23:11:50,715] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +5: [2023-03-16 23:11:50,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +1: [2023-03-16 23:11:50,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +2: [2023-03-16 23:11:50,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-16 23:11:50,717] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +6: [2023-03-16 23:11:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-16 23:11:50,719] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +2: [2023-03-16 23:11:50,719] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +6: [2023-03-16 23:11:50,721] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +4: [2023-03-16 23:11:50,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-16 23:11:50,750] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +4: [2023-03-16 23:11:50,752] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +0: [2023-03-16 23:11:50,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,763] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +0: [2023-03-16 23:11:50,765] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +0: [2023-03-16 23:11:50,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_221m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-16 23:11:50,825] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +0: [2023-03-16 23:11:50,827] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +0: successfully loaded checkpoint from checkpoints_221m60b400m at iteration 0 +7: time (ms) | load-checkpoint: 2603.17 +0: estimated model parameters: 0.220527104 +0: estimated model parameters without embeddings: 0.173619712 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 23:11:51 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.008910 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.048 seconds +0: total number of samples: 195101 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.037479 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.059 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 23:12:04 +0: done with setup ... +0: training ... 
+7: time (ms) | model-and-optimizer-setup: 19946.95 | train/valid/test-data-iterators-setup: 13571.05 +0: [after training is done] datetime: 2023-03-16 23:12:04 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.258704E+00 | lm loss PPL: 2.601581E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3326636: Thu 16 Mar 2023 11:12:28 PM EET diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fee2bc3f8c3043abacfcc993a9119495607affa5 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46a59fd5dc3925588e65f2eea21b7dd4691589718704db0a3964dab32c794eea +size 41353495 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3af69b5f56df64084eb645337f452922e30730e2 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8690e8d7ec16f7e886fd35be104ff1fa1e597c2a410d3820bea1ba74e4ce692d +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a505cbda9550fd7aa705433f791a3fda6be5383f --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9963cffead1fb92aaca16edd9b14619ddeb24553e8f1bb2f7eac4a5d66f9822 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4ad90fd01d99700e53a77d72e8a1c1f43826665d --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16b12475d1d804141555126e021b5151eb3a8fc28e01b4929187ad0fc77c384e +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2c32154459ed9433b86dec9cbc4a0819db61221b --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb980cb7dc39ae210cc570b537bf90a2afff22ea9ed66bcd80896315291068e2 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b668923172db351492f75a3e0b3e7b1c368a5d8c --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe3abf78588945a9d4f8af49d3faa5a1a973f11fb1a0eeeda6309edda9942f15 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..8fe27469110c099ffd7406e5472ff4bd0412518f --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1dc821a2b52c2d7e341a8bd4c7683dfc47f4ad66ed3a6e098c3a6c4b2bd84866 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f9b3e2f1b5103689159bc787a544ecfdab964c11 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a61cd3e949cb9a11b3609713f012f7eedff73e62d85cd177fef9c9f91b4ea6fd +size 41353442 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..72465dc208650240a0f5d138debeef1f311b1ea0 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72b080d9dedd197e16764abfdd91f654d62683436fc1b3dd8ce6f81af8815aca +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1dac023c26f95530115049220a2a461ef2c6ad5f --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cff9d996b7e962792db8458bf481f0bdf091a6beb66eedfda847cc4626228b21 +size 41353570 diff --git 
a/221m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b1ae53a37352d9d994e02c23676f6f8313eb635 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c026656df29b4f3615d7de0fe9074d76b883e4bdf26454304f6175a1697c8e26 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6564390529380a7444422be754ec8b7b3815d75f --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95a16463864affd24fecd4b7abb3f68d223a26725248226ff42379f66a323486 +size 41353559 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bcfe53b6e95a14ab97d3cd1cfbd76f8c489e204 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a178bff87dd8f3e7fa159ea7cd8457dcf43f63735d0d36aac4830c4037a6203f +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5cca37184c77c0e186725ab1ae1a8997c05dfcab --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:5a0229c10bfa2bfd54ae1037347060b854d3e7607fc6783f1d693560dc02be5a +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..970ceaa346bee5455953ec9c234a7743742d6454 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ad3a32691dbd033b333ec1c9409998dcd7174fd11b50b53f328ac558b6225ea +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1abdf0af273b896bacff2e9f87085bdb0d12a596 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc16c92c43adc0542b8aa7044bd3fe35c0a8b237efab45b70aaf3f31ba393b33 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8c12e29c949fe81501badc37711eafd072a4a81a --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26aaab14263aa7f8a1627d196c92819e5053a19325b73ad5e11deef4c7d7ac6e +size 41353698 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..b08577ae0f6b05ccf4b7db15dfd2b8a4dc334299 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a96a0bfcec22f16ecfa36535e6fbb4728c2903984f7205e0cbea5f12feda5a0 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f65102865861e2dac5cb500ee497e768113ada52 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2ecb124d50481b13e91f08a263dba2d7bd3a3568bad1d4c9b3c14d97a674ca7 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..da7e927c368ece742a3499e1e96e22954f2b9b92 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05e0368c6f98c5d9662550b6d55271a009a15ad2c9a887e5cb312eda3d0e1e9f +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1199619c78eb2bb3452041e60f858680e1d5a8c2 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b4b64d80fc12aed7c7ed4a61134277e6d2ebd5ab62d989f7613a622e705a077 +size 41353570 diff --git 
a/221m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0a9504c98398569ad35d1738839f6581469be4c0 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb3b3090072ea227eebf8c5d09900a5ae0b695a7381fff56ff462240020f43b4 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b266253781108a807f1d73e7c71bdccf47dfcff --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14e4b9dc6f3c1bc2f2720d63d4d926951660d171e6ccea46a3bc07281259037d +size 41353495 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fdbc19b76af3fc291a1028ab71f6cbd7cd075dbf --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:673f907568b9d3a3b2f2c26ff9dc31700e78996beaa4fe174b7009e54fb41806 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8217dcb52276acf4d3422b937d05bb5a1fd189d2 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:631e9a4c2777ac8fb4a04c57db8908da1bd9bc968b0406e0d06c89d325355fab +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..315dc1f8c3d5272fc3381df7c4ac3abc1e6cd481 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da5abde2d85adfe89e08ce94a0869eda54af6eb802dbb94265d95f0a36aa4bca +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3c55f97e4a6b89c6810bc8dcb3199b3884d5d69 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca3ab7d22cde60f9b70645e800830fd6f425831f1ecf609e923ce66f8cee105f +size 41353698 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3cbf59f45a854cbd5279fe26be064dd5c3d5c793 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20c7db80c7557d7383bd618633c5aa0250c648bf6ea728e2193c88d95abfe66c +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..586733ba468d4cad20a4adb7115cfdb4e4b3d424 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b73aecd8afcb2711768e9756f565bd190f17ac4d4d41bbbd869078d3488c34b +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f889a1f90846ac3ff50a0c769fcbc2468ae7383 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02df6c5cb5c9c10e3b63769afb96c8ddf89fa27d9299985ef721550d52446fde +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..523066e88bfaa3c761d04725140b08ad24c9fbdc --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d34394470964e5f772a47089925f15b2d6f6547bed3d5d51034f9712eb13287a +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..337aa4fb415c5e19a9df9fb6b01b0094e21f910b --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a036abcf1c6554e744c38adc397936ce704d6b318154661c9a0039acf25092e3 +size 41353762 diff --git 
a/221m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f2bf9f83e31817f460c1c0915f240552cbd8100 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d7b5a683a21e1ea0d0e2406d46415ac20a8f45d3a66bbbc22f7efb80ee5e360 +size 41353442 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea38e424dd2aa039b036c3cd9fe70d7bddde4013 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00ac096c9818f2d35ea8bcae88ab4fe1efd5f3c06a45100e7e8096fe0e0d4386 +size 41353495 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1439fc3b3328f4c9bd7919930449b9d3d5e0ca7a --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8376da1438205841d3b343a34997bd6a5bf4948ceb6272bd6e4067d1b7127124 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..20b9f8541192fbeb895dbc69885fcbc90c627f16 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ebf4681af4100bd4fc53fee1067781e1fb86ec0ba5b1a8669f742bf417870219 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f137cd52aa4d1916ee80133dee66321c2af0f0f --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:951c7ac6b50cf5f64793d1bb4558366b885ffd1f2ec0fb41ecdef6a8b24f4ee7 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..13c6800e3db80d9887743fea66be851cd9be4701 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80f7b5150716b1738e39e2e23179b0565c77f63a3c2f3680ae4b03d085c2fd10 +size 41353698 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d5ec5c59cd0101fba2b14897e10dbfc07f861ba --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8dcda261ce40f42c51f743f32255c60b74d8ac84de6a3f174a753d2fc38f410 +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..0ac969a4ad1449eadf0639ae42f7e098280ac2ca --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5964ea3d971b793966ecf210fb7c06a3f25c0c110e282f4ab2cb28cc003009ec +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f87f7c9ccd42f4e40e5e09653a5c26b851338e5a --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21f5d92f3813158d62bc64a749db7a3e889f0b6f356873382a1d249b7edb9c87 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1f92d5d96bd46e33621c84d57b714f90c6bda3ef --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a26c018f3903317681e00455d4c3141a25d66e9bc98a35c446aa85183ebfdf3 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6a5b0a700222c7656b21354484c7048e868bcbac --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f043088c60b176c9593343a42ea692a237416af94ce0708894ba8d2427231052 +size 41353442 diff --git 
a/221m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..addfb9d3a44a15ed4c1b6c578e3033788140a621 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c7755df680d6f76e0b88f6aa53659b874b3688a83643d4bff796cd519babcd5 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b32997466ef989ce9981338d16050811d188a58 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06513256137f3c57eb63330ca0cc72cac9c2cbfee3950563b04c57d82d5f4d2a +size 41353559 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b4cdf392b0483d17dbaa6359699a7575695e7b0d --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d6491d3d2ba710e6544f0a3a02b0112f2f155a9aec5ae000b47746009deb1ec +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..992da671477279d980d62598d3a3f249e272c611 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:f2bf68e8e458eb14a31e14929ee7ecb219e6d2fef2a27b97657bf433f37fe88e +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bbd538a0a26648a8e5b8e0f20534f66e41e1bc90 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:107fbd221053543dda8f8ee0793f23054035d02c1a7ef8831457cab52ca85940 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad8f5262ba694dd0b80c1e42268584ee5b22af54 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bda98dc50942b96feaaede7f3a5de077bd9ecc511b5da331a0f01681657a451 +size 41353442 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..556c4c07e4d71998e668df6e27e3843777cf7b8d --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06d48bcbb58fdf513905e94f88a1ad54b7bb842f4441db40bf899d3db01ffbd6 +size 41353698 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..d5a6b493c98921fbb7b798046fcfc35491c7a7ed --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2bb4ac4514ff010325e448bb9c64b2e6909f2d280c0c326d9ca148cdcee1da4 +size 41353442 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4cfbbf3b6959fbb03e6b7262731bdffc64db9674 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e3271f7d7fb1229ab6cf210554ac708443386b9458508939e6d80fa32153bfe +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34a384f67b225a0ea4b5491da11fc82907e55bf1 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c88de50c59cbf3ec3a4d46b15cc39aca28c5deffd96e5a280c909397a3978a4 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbd5458210d09c77a51dbdb46d5651485d469c6b --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e857a0f0f6bad5328fd6f409c75382aaeeefa87a4401c26c78fe5706ed6cc5cf +size 41353634 diff --git 
a/221m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb25790a951f8f93720c1ba7cc474c922359d6c9 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10edec7e3d96941f31cfe4c58970fa97c3473482fb371ef5e3e1a7de834b6f00 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee4df6b2ed0bef330808714047bbd686c08d153e --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7707b06e1381029009cfd0967f8784757b44be9b9bfdcdc007d25787a2e6bf43 +size 41353559 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1649e577f09ed6a3a376f6f54863406a3bd2483 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:400a2221fe82fe13a8a13b68cd51f53467e34ff2a971da108bac208376fd2965 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b49e88ab381e2b9656da1f8c4207b458a5813718 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b764e5143db420b3f24ccb353215c81a70f03d99b132484671326fc9327a8811 +size 41353634 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b72b3c3a38910e877b01594f053096b5eb7b0b7 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a64b8f9f8378d0f764a38b2eecb31a39e93b52dfe83141323ee9bcdf09049756 +size 41353570 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8dc816b477103fa4d030c0c5ab1071e2b404181c --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0902c6330bc51bea6f77db271d95fc1e30325e8b8628954afb59effa3c2f09f +size 41353506 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9536d3ee80f112c8720d7c48c3bf79940296639 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:52fca787a26ce013bc617b00f57a6f073edd99bc151ec269868d756e4ff60958 +size 41353495 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2b33a15bec620f3592bdeaea583ea368af132a4 --- 
/dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72eede1403c4dcae55f8aa90ab21c4db927e6c395626aeb69938b97ef9dce772 +size 41353495 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7346e768c0ea727b099521ea541358320a894982 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca28a42e03e973a53918c784d9f30cfd57e3980fd46fae1963ebe6b0529ec7ed +size 41353559 diff --git a/221m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/221m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24e2db0a11a8914dbc55871e8db075d8f8187dc4 --- /dev/null +++ b/221m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cacaf8c0e521209ae811e9518654eabe35a05867d4d28051b466e6d10fd9b65f +size 41353495 diff --git a/221m60b400m/global_step115203/layer_01-model_00-model_states.pt b/221m60b400m/global_step115203/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe11a2deefbc72a814b0e1f3acc4beb8659ee4bc --- /dev/null +++ b/221m60b400m/global_step115203/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9231dd62f514d658aae96bef4f14831bd61439b356b7f37ce10af555f527aefa +size 93816067 diff --git a/221m60b400m/global_step115203/layer_03-model_00-model_states.pt b/221m60b400m/global_step115203/layer_03-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1f18fd8b78ed682e523d736cfa93459d3577fdf4 --- /dev/null +++ b/221m60b400m/global_step115203/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b4d7927cf918f7b3dc95c3e75e15393982ae7205c97f604297c9f4a169fea02 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_04-model_00-model_states.pt b/221m60b400m/global_step115203/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..baf61bb1430c8066bbde4737ef7774f44d0fbb3d --- /dev/null +++ b/221m60b400m/global_step115203/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80769c243720aee5d30ae01b23ec7b7b0d85627e22da1a8c0feea8c12f639d96 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_05-model_00-model_states.pt b/221m60b400m/global_step115203/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..55e971b98c6b1168893f3865cb60525c8e8e24ed --- /dev/null +++ b/221m60b400m/global_step115203/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c48d024347f7bf40edff4e5ce5626c8efbffdee67f9b240f4e4e650dc77c6e0 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_06-model_00-model_states.pt b/221m60b400m/global_step115203/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37992affc31aff948d6d2e809909822da0b1db24 --- /dev/null +++ b/221m60b400m/global_step115203/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b3a9c3b412773d777471f111b3c66c9280a8d2973a1ede5483fa6e5cbe1a302 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_07-model_00-model_states.pt b/221m60b400m/global_step115203/layer_07-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..aef9593fe4c5cdb824f6ebcc216c1afa134cc97d --- /dev/null +++ b/221m60b400m/global_step115203/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4404818cf60cf8cf0c24276b1911d9590aaaaa20596ca6e387ad809658908ca +size 19295235 diff --git a/221m60b400m/global_step115203/layer_08-model_00-model_states.pt b/221m60b400m/global_step115203/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30bebf5ecc3478e8beafba5ba25934dd5dcbad82 --- /dev/null +++ b/221m60b400m/global_step115203/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc3ddb81df544c2ce585a8b85e180e2d84eba0910d2416930f509c6980077a01 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_09-model_00-model_states.pt b/221m60b400m/global_step115203/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e895d4c8801cd48710a10a8d97ce875c44f39717 --- /dev/null +++ b/221m60b400m/global_step115203/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf9ffd2dde6bc4024c1dd752d59ae6a4f19357ac8753e37f9b9f5315cb77c2b7 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_10-model_00-model_states.pt b/221m60b400m/global_step115203/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5404a12a6f977ec9bee86322c07a3100b01bba4d --- /dev/null +++ b/221m60b400m/global_step115203/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:460a868e6577710eac642a6341b290c66466766b510b97899d0ec305cad94709 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_11-model_00-model_states.pt b/221m60b400m/global_step115203/layer_11-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..23aebb692aefb38751a7ea6e8aeb0009823a3fec --- /dev/null +++ b/221m60b400m/global_step115203/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fb00709af39686d9c16bab2a37e28a7aba8e1736a2eb3bb5d8ab052806b22e4 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_12-model_00-model_states.pt b/221m60b400m/global_step115203/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d03c5003292045e041e6991fd4662b262585d60c --- /dev/null +++ b/221m60b400m/global_step115203/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94ac95d69471e25928b89558a80fdd5dd82833092fb5ae776bac9c990fc61748 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_13-model_00-model_states.pt b/221m60b400m/global_step115203/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..78c02b1514ee019eecdbd57d170e13d3fde730d3 --- /dev/null +++ b/221m60b400m/global_step115203/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2606c98b1d3c0603552e67a63403b841e375bbe650898f147accf8d4dbcdef84 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_14-model_00-model_states.pt b/221m60b400m/global_step115203/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4488f48db2aa6f9ebaef4d7093983fd5871b181d --- /dev/null +++ b/221m60b400m/global_step115203/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:944b649e1023b686ce0f457399e03c2918fd28eb178660ba0bbf9a53ef084e78 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_15-model_00-model_states.pt b/221m60b400m/global_step115203/layer_15-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..12ee08632fbcb7cb5294e64dbe89272d15aa3ba7 --- /dev/null +++ b/221m60b400m/global_step115203/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e2f887d34c366e801fcfe2e77d559e1d0b98c8c2f97ce3f8dd11b32bf955170 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_16-model_00-model_states.pt b/221m60b400m/global_step115203/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53ecadd19d360b06e28849604c10e18c76b0f086 --- /dev/null +++ b/221m60b400m/global_step115203/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7205f8e01056d5dde60f6349b7ea83b3cd6a059895e0ad9b8f3b55a649af0a5 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_17-model_00-model_states.pt b/221m60b400m/global_step115203/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b8bead07f4c9d3530c6d37777ae356fd8070467 --- /dev/null +++ b/221m60b400m/global_step115203/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b62910cd677a1e7de8556ab1341ebc2363afd5327748220d4bd23ad01b526801 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_18-model_00-model_states.pt b/221m60b400m/global_step115203/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8b3ef4ac5aced66ad6ec14de220e40a926c4d204 --- /dev/null +++ b/221m60b400m/global_step115203/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:917c624f246c6e308c2e0cbe7d22e388caa17b8e514e13cb1dcc374b10e335d0 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_19-model_00-model_states.pt b/221m60b400m/global_step115203/layer_19-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..31161cd5523f611bae798878bb107ee6b365b400 --- /dev/null +++ b/221m60b400m/global_step115203/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a423350180686fe0f57249847272b8e488c285e086de1a6b098144087a3a6d9f +size 19295235 diff --git a/221m60b400m/global_step115203/layer_20-model_00-model_states.pt b/221m60b400m/global_step115203/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbf95b2a2249156559e58225356e69391a2fef11 --- /dev/null +++ b/221m60b400m/global_step115203/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35dd8a90d1d3c674c2bb2e862a80e49124fdfaad89da708be9dd4fb31c8c10f0 +size 19295235 diff --git a/221m60b400m/global_step115203/layer_22-model_00-model_states.pt b/221m60b400m/global_step115203/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b49ad869c4977748d8c36ef3450e36d740421c4b --- /dev/null +++ b/221m60b400m/global_step115203/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d81079d9d0dcf6d523be586469a5376c84eb82c9e17cc820978cea5d6bc16305 +size 4803 diff --git a/221m60b400m/global_step115203/mp_rank_00_model_states.pt b/221m60b400m/global_step115203/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a243bd1ef473dd663f8563a945b8d34c31f73e8f --- /dev/null +++ b/221m60b400m/global_step115203/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e374f8f5d05c4f65f49d534a4862b130a7cece53ab827e08fac553a5c1fa4116 +size 37747 diff --git a/221m60b400m/sbatch_221m60b400m.sh b/221m60b400m/sbatch_221m60b400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..607d5510af3385d16d3a0c171c72d4a379731a61 --- /dev/null +++ 
b/221m60b400m/sbatch_221m60b400m.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=221m60b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} 
+NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 60400000000 +# -> Samples: 29492188 +TRAIN_SAMPLES=29_492_188 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 294_922 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 10000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + 
$GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/221m60b400m/sbatch_221m60b400mval.sh b/221m60b400m/sbatch_221m60b400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..3a72073f54724c39726a970190fec86fbef5b56a --- /dev/null +++ b/221m60b400m/sbatch_221m60b400mval.sh @@ -0,0 +1,167 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=221m60b400mval +VARIANT_CKPT=221m60b400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# 
"train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_217M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 60400000000 +# -> Samples: 29492188 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --override-lr-scheduler \ + --reset-progress \ + --no-load-optim \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + 
--log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678951221.nid006946.96656.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678951221.nid006946.96656.0 new file mode 100644 index 0000000000000000000000000000000000000000..9d395c0fc6401e0c138b7b7a2fbf770729640b8f --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678951221.nid006946.96656.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5589dac49710c198d86365be093024c98fd750bf34111c200f4ace6d40fc0c7 +size 206371191 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678972911.nid006724.13615.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678972911.nid006724.13615.0 new file mode 100644 index 0000000000000000000000000000000000000000..3149c25b2f8a378269cdc2cba425df6e8c3b0c0f --- /dev/null +++ 
b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678972911.nid006724.13615.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9596b3cebb70f5aa3b1490b1865d698f43516fa961b1b157eb7071a7801cec2d +size 40 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973017.nid005161.125990.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973017.nid005161.125990.0 new file mode 100644 index 0000000000000000000000000000000000000000..04c60cae3b6d1ff988ddb595fe79bac36e10da87 --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973017.nid005161.125990.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4d2bcdfa7fd12e861ec188fc2ceacbe182c2dca10e8782b5f228a2a10580ab2 +size 40 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973124.nid006724.17543.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973124.nid006724.17543.0 new file mode 100644 index 0000000000000000000000000000000000000000..78e8fda2d48bd0f01ab81ff1bb80c4992094dac3 --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678973124.nid006724.17543.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da5992db9b9e27b5f29cd06fc6a08ec1963379fef300ff614230b6b1903766f4 +size 40 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987313.nid005143.52651.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987313.nid005143.52651.0 new file mode 100644 index 0000000000000000000000000000000000000000..0eea1dec61f21a67b4c530738b9842db421ad3e3 --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987313.nid005143.52651.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:395f5f4d060a6b8f19f66dc9f29ac02205fac27f813700ce624de187c41c8d13 +size 40 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987419.nid006236.64278.0 
b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987419.nid006236.64278.0 new file mode 100644 index 0000000000000000000000000000000000000000..61eeddd2365b64bcb4ead86df2ad7097a6977cd9 --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987419.nid006236.64278.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c72fc96408041cffeb53f5635c5e2b47accd9402519b25b3e5bd2c273a771986 +size 40 diff --git a/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987518.nid005143.56433.0 b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987518.nid005143.56433.0 new file mode 100644 index 0000000000000000000000000000000000000000..b61584b87ca9eafec28465c0abcd197e1c5f2933 --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400m/events.out.tfevents.1678987518.nid005143.56433.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9620b44d42c4982ac8126ea1a38827e08499f792110cfef90e42f2bd7b754da7 +size 40 diff --git a/221m60b400m/tensorboard_221m60b400mval/events.out.tfevents.1679001060.nid006946.82092.0 b/221m60b400m/tensorboard_221m60b400mval/events.out.tfevents.1679001060.nid006946.82092.0 new file mode 100644 index 0000000000000000000000000000000000000000..14a9a1255de1560232bf7d9f020386302054d3ac --- /dev/null +++ b/221m60b400m/tensorboard_221m60b400mval/events.out.tfevents.1679001060.nid006946.82092.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb639b5dd9f0ee22a127ef326826617bf0375c739bf99acb730faab409a197c8 +size 980 diff --git a/280m5b9400m/3318400.err b/280m5b9400m/3318400.err new file mode 100644 index 0000000000000000000000000000000000000000..0aa434bd84067493a697b0bedd4b8ef7d89227b4 --- /dev/null +++ b/280m5b9400m/3318400.err @@ -0,0 +1,1122 @@ +1: 2023-03-15 21:58:11.670515: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in 
performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670522: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670524: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670525: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670526: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670532: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-15 21:58:11.670532: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 21:58:11.670525: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719842: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719855: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719856: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-15 21:58:11.719850: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719859: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719862: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719849: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 21:58:11.719849: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-15 21:58:11.720580: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720577: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720587: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720597: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720584: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-15 21:58:11.720589: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720581: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 21:58:11.720577: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762574: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 21:58:11.762578: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762581: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762573: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762573: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 21:58:11.762580: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 21:58:11.762579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782663: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782668: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782659: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-15 21:58:11.782686: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782683: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782677: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 21:58:11.782680: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783186: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-15 21:58:11.783197: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783204: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783212: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783217: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783203: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-15 21:58:11.783217: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 21:58:11.783225: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820155: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820158: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-15 21:58:11.820168: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820168: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820171: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820166: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 21:58:11.820176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-15 21:58:11.850351: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850345: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850348: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850357: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850356: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-15 21:58:11.850350: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850355: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 21:58:11.850355: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 21:58:13.284447: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284453: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:13.284837: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284841: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284851: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-15 21:58:13.284854: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284860: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 21:58:13.284860: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312417: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:13.312840: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312842: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312850: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-15 21:58:13.312853: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312854: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:13.312859: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370283: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370291: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370289: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:13.370697: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370704: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370707: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370714: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370714: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+6: 2023-03-15 21:58:13.370718: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370721: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 21:58:13.370723: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372116: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372116: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372123: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372122: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372123: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372114: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372124: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:13.372324: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372329: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372331: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-15 21:58:13.372332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372335: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 21:58:13.372337: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.380973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380988: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.380981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:13.381332: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381338: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381341: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381342: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381346: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-15 21:58:13.381346: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381345: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 21:58:13.381351: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381569: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381564: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:13.381931: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381933: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381939: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381940: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381940: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 21:58:13.381941: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381943: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 21:58:13.381945: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383209: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:13.383594: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383595: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383599: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383601: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383602: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-15 21:58:13.383604: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383606: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 21:58:13.383610: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.558675: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558669: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558677: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558669: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.558681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:13.559071: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559072: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559074: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559077: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559078: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-15 21:58:13.559079: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559081: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 21:58:13.559085: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 21:58:16.571697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571707: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571710: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571707: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571706: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571706: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.571717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-15 21:58:16.571931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571939: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.571951: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-15 21:58:16.572111: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.572117: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.572120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 2023-03-15 21:58:16.572186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.572121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-15 21:58:16.572122: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-15 21:58:16.572125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572196: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-15 21:58:16.572125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: 2023-03-15 21:58:16.572127: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572201: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.572197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572887: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572892: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: 2023-03-15 21:58:16.573042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572894: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.572895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.573062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573263: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573271: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573276: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.573282: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.573990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-15 21:58:16.574026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574029: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 21:58:16.573990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 21:58:16.573993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574032: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-15 21:58:16.574033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.573995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-15 21:58:16.574038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574049: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-15 21:58:16.573998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-15 21:58:16.574051: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 21:58:16.574052: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 21:58:16.574054: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574143: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: 2023-03-15 21:58:16.574054: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 21:58:16.574057: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-15 21:58:16.574002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.574010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-15 21:58:16.574146: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.574010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 21:58:16.574010: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 21:58:16.574011: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 21:58:16.574081: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 21:58:16.574013: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 21:58:16.574016: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 21:58:16.574147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 21:58:16.574092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574156: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 21:58:16.574156: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.574203: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 21:58:16.574203: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 21:58:16.574091: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 21:58:16.574093: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574153: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 21:58:16.574092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574153: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: 2023-03-15 21:58:16.574104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 21:58:16.574105: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574160: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 21:58:16.574165: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.574209: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 21:58:16.574212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-15 21:58:16.574170: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 21:58:16.574172: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 21:58:16.574216: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 21:58:16.574217: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-15 21:58:16.574174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574242: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574191: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 21:58:16.574189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 21:58:16.574255: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 21:58:16.574261: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 21:58:16.574206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 21:58:16.574268: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 21:58:16.574711: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574715: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574714: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.574880: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-15 21:58:16.574719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574724: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 21:58:16.574722: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-15 21:58:16.574883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-15 21:58:16.574883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 21:58:16.574730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574734: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 21:58:16.574735: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 21:58:16.574739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-15 21:58:16.574884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: 2023-03-15 21:58:16.574741: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: 2023-03-15 21:58:16.574888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 21:58:16.574766: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.574896: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574892: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.574896: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.574904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574907: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574912: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574912: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 21:58:16.574930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 21:58:16.574950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 21:58:16.575187: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 21:58:16.575202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 21:58:16.575199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575205: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 21:58:16.575205: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575212: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 21:58:16.575202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 21:58:16.575213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 21:58:16.575218: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 21:58:16.575221: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 21:58:16.575224: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.842152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842162: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842159: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842161: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842171: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.842173: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844140: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844140: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844148: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 21:58:16.844144: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844155: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.844157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.844159: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.844162: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.844163: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 21:58:16.844181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844194: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 21:58:16.844189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 21:58:16.844206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. 
+1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +3: Building extension module utils... +3: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+5: +5: +5: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Loading extension module utils... +6: Loading extension module utils... +7: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils...Loading extension module utils... +3: +1: Loading extension module utils... +1: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... 
+6: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +7: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +5: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +2: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... 
+3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... 
+6: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +6: +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...Loading extension module utils... +6: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +7: +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils...Loading extension module utils... 
+1: +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... 
+0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils...Loading extension module utils... +5: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +5: +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... 
+2: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/280m5b9400m/3318400.out b/280m5b9400m/3318400.out new file mode 100644 index 0000000000000000000000000000000000000000..28114b8c1ea9d020fbce59c6faf86358ff21e8d0 --- /dev/null +++ b/280m5b9400m/3318400.out @@ -0,0 +1,5680 @@ +Model parameters: d_model 1024 ffw_size 4096 kv_size 64 n_heads 16 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 1024 --num-attention-heads 16 --kv-channels 64 --ffn-hidden-size 4096 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 2_884_878 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-280m5b9400m --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 2_884_878 --lr-warmup-samples 28_849 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_280m5b9400m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_280m5b9400m --load checkpoints_280m5b9400m --train-weighted-split-paths-path train400m.txt 
--valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3318400.json --zero-stage 0 +START 3318400: Wed 15 Mar 2023 09:57:52 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 54.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 45.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 41.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: ================================================================================ +3: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= 
+2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 36.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 40.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 45.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 46.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 47.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 
0.0W 0% 0% +5: 6 43.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 50.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 6 46.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface 
======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 50.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 45.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 39.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +7: Launching on nid006716 (7/8), master nid006709 port 9999, GPUs 8, CUDA: True +4: Launching on nid006713 (4/8), master nid006709 port 9999, GPUs 8, CUDA: True +6: Launching on nid006715 (6/8), master nid006709 port 9999, GPUs 8, CUDA: True +3: Launching on nid006712 (3/8), master nid006709 port 9999, GPUs 8, CUDA: True +0: Launching on nid006709 (0/8), master nid006709 port 9999, GPUs 8, CUDA: True +2: Launching on nid006711 (2/8), master nid006709 port 9999, GPUs 8, CUDA: True +1: Launching on nid006710 (1/8), master nid006709 port 9999, GPUs 8, CUDA: True +5: Launching on nid006714 (5/8), master nid006709 port 9999, GPUs 8, CUDA: True +7: > setting tensorboard ... +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 
0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. 
None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3318400.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1000 +0: eval_iters ...................................... 1 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 4096 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 1024 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 
128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-280m5b9400m +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_280m5b9400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 2884878 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 
0 +0: lr_warmup_samples ............................... 28849 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... None +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 16 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... False +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 
1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. None +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_280m5b9400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_280m5b9400m +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... 
None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 2884878 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 
0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-15 21:58:31,628] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... 
+0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.102 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, 
already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so +0: >>> done with compiling and loading fused kernels. Compilation time: 24.091 seconds +0: time to initialize megatron (seconds): 2.569 +0: [after megatron is initialized] datetime: 2023-03-15 21:58:58 +0: building GPT model ... +0: [2023-03-15 21:58:58,704] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-15 21:58:58,705] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-15 21:58:58,705] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, 
data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-15 21:59:00,693] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: 
ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-15 21:59:00,895] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-15 21:59:00,895] [INFO] [utils.py:828:see_memory_usage] MA 0.53 GB Max_MA 0.53 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 21:59:00,896] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.45 GB, percent = 6.2% +0: setting training iterations to 11269 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-15 21:59:00,898] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-15 21:59:14,194] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-15 21:59:14,195] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-15 21:59:14,195] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-15 21:59:14,201] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-15 21:59:14,201] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-15 21:59:14,318] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-15 21:59:14,319] [INFO] [utils.py:828:see_memory_usage] MA 0.52 GB Max_MA 0.53 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 21:59:14,319] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.13 GB, percent = 6.4% +3: ninja: no work to do. 
+3: Time to load utils op: 0.17144203186035156 seconds +6: Time to load utils op: 0.21031498908996582 seconds +7: Time to load utils op: 0.20921635627746582 seconds +0: Time to load utils op: 0.11027336120605469 seconds +3: Time to load utils op: 0.0010654926300048828 seconds +0: Time to load utils op: 0.10276937484741211 seconds +0: Time to load utils op: 0.10281586647033691 seconds +0: Time to load utils op: 0.10351991653442383 seconds +0: Time to load utils op: 0.10327744483947754 seconds +0: Time to load utils op: 0.10352826118469238 seconds +0: Time to load utils op: 0.10354304313659668 seconds +0: Time to load utils op: 0.10361194610595703 seconds +3: Time to load utils op: 0.10242342948913574 seconds +3: Time to load utils op: 0.10216617584228516 seconds +3: Time to load utils op: 0.1015923023223877 seconds +3: Time to load utils op: 0.10169458389282227 seconds +3: Time to load utils op: 0.10215950012207031 seconds +3: Time to load utils op: 0.10169386863708496 seconds +3: Time to load utils op: 0.10167503356933594 seconds +6: Time to load utils op: 0.10226702690124512 seconds +6: Time to load utils op: 0.10262346267700195 seconds +6: Time to load utils op: 0.10271859169006348 seconds +6: Time to load utils op: 0.10250139236450195 seconds +6: Time to load utils op: 0.10264205932617188 seconds +6: Time to load utils op: 0.10289859771728516 seconds +6: Time to load utils op: 0.10301637649536133 seconds +7: Time to load utils op: 0.10228276252746582 seconds +1: Time to load utils op: 0.11133885383605957 seconds +1: Time to load utils op: 0.11049652099609375 seconds +1: Time to load utils op: 0.1105794906616211 secondsTime to load utils op: 0.11010456085205078 seconds +1: +1: Time to load utils op: 0.11027312278747559 seconds +1: Time to load utils op: 0.11065459251403809 secondsTime to load utils op: 0.11060142517089844 seconds +1: +1: Time to load utils op: 0.1106557846069336 seconds +7: Time to load utils op: 0.10219359397888184 seconds +7: Time to load utils 
op: 0.10232067108154297 seconds +7: Time to load utils op: 0.10240292549133301 secondsTime to load utils op: 0.10241007804870605 seconds +7: +7: Time to load utils op: 0.1020669937133789 seconds +7: Time to load utils op: 0.10207653045654297 seconds +2: Time to load utils op: 0.11097121238708496 secondsTime to load utils op: 0.11098074913024902 seconds +2: +2: Time to load utils op: 0.110992431640625 seconds +2: Time to load utils op: 0.11100411415100098 secondsTime to load utils op: 0.11100554466247559 secondsTime to load utils op: 0.11099910736083984 seconds +2: +2: +2: Time to load utils op: 0.1110084056854248 seconds +2: Time to load utils op: 0.1110086441040039 seconds +5: Time to load utils op: 0.11234760284423828 secondsTime to load utils op: 0.11210250854492188 seconds +5: +5: Time to load utils op: 0.11231279373168945 secondsTime to load utils op: 0.11213088035583496 seconds +5: +5: Time to load utils op: 0.11268305778503418 secondsTime to load utils op: 0.11213564872741699 secondsTime to load utils op: 0.11218738555908203 seconds +5: Time to load utils op: 0.11213803291320801 seconds +5: +5: +4: Time to load utils op: 0.11065554618835449 seconds +4: Time to load utils op: 0.11067652702331543 seconds +4: Time to load utils op: 0.11066770553588867 seconds +4: Time to load utils op: 0.11067700386047363 seconds +4: Time to load utils op: 0.11069846153259277 secondsTime to load utils op: 0.1107032299041748 seconds +4: +4: Time to load utils op: 0.11071372032165527 seconds +4: Time to load utils op: 0.11070370674133301 seconds +3: Time to load utils op: 0.0003390312194824219 seconds +3: Time to load utils op: 0.00033354759216308594 seconds +3: Time to load utils op: 0.00032711029052734375 seconds +3: Time to load utils op: 0.00031113624572753906 seconds +3: Time to load utils op: 0.000377655029296875 seconds +3: Time to load utils op: 0.00036597251892089844 seconds +3: Time to load utils op: 0.0003902912139892578 seconds +6: Time to load utils op: 
0.0003979206085205078 seconds +6: Time to load utils op: 0.0004973411560058594 seconds +6: Time to load utils op: 0.000415802001953125 seconds +6: Time to load utils op: 0.000415802001953125 seconds +6: Time to load utils op: 0.0005066394805908203 seconds +6: Time to load utils op: 0.0004954338073730469 seconds +6: Time to load utils op: 0.0005273818969726562 secondsTime to load utils op: 0.0005331039428710938 seconds +6: +7: Time to load utils op: 0.00045943260192871094 seconds +7: Time to load utils op: 0.00038743019104003906 seconds +7: Time to load utils op: 0.0003757476806640625 seconds +7: Time to load utils op: 0.000385284423828125 seconds +7: Time to load utils op: 0.0003826618194580078 seconds +7: Time to load utils op: 0.00043201446533203125 seconds +7: Time to load utils op: 0.000415802001953125 seconds +7: Time to load utils op: 0.0003705024719238281 seconds +1: Time to load utils op: 0.0005168914794921875 seconds +1: Time to load utils op: 0.0009691715240478516 secondsTime to load utils op: 0.0009138584136962891 seconds +1: +1: Time to load utils op: 0.000896453857421875 seconds +1: Time to load utils op: 0.0011131763458251953 seconds +1: Time to load utils op: 0.0010991096496582031 seconds +1: Time to load utils op: 0.0011289119720458984 seconds +1: Time to load utils op: 0.0009953975677490234 seconds +0: Time to load utils op: 0.0005435943603515625 seconds +0: Time to load utils op: 0.0004413127899169922 seconds +0: Time to load utils op: 0.0004467964172363281 seconds +0: Time to load utils op: 0.00043964385986328125 seconds +0: Time to load utils op: 0.00044989585876464844 seconds +0: Time to load utils op: 0.0004558563232421875 seconds +0: Time to load utils op: 0.00044727325439453125 seconds +5: Time to load utils op: 0.0007803440093994141 seconds +5: Time to load utils op: 0.0010323524475097656 secondsTime to load utils op: 0.0010428428649902344 seconds +5: +5: Time to load utils op: 0.001180410385131836 seconds +5: Time to load utils op: 
0.0011432170867919922 seconds +5: Time to load utils op: 0.0011034011840820312 seconds +5: Time to load utils op: 0.0011391639709472656 seconds +2: Time to load utils op: 0.0008594989776611328 seconds +5: Time to load utils op: 0.001142740249633789 seconds +2: Time to load utils op: 0.0014104843139648438 seconds +2: Time to load utils op: 0.0013108253479003906 secondsTime to load utils op: 0.0013263225555419922 seconds +2: +2: Time to load utils op: 0.0013880729675292969 seconds +2: Time to load utils op: 0.0013074874877929688 seconds +2: Time to load utils op: 0.0013096332550048828 seconds +2: Time to load utils op: 0.0013210773468017578 seconds +4: Time to load utils op: 0.0007288455963134766 seconds +4: Time to load utils op: 0.0009312629699707031 seconds +4: Time to load utils op: 0.0011856555938720703 seconds +4: Time to load utils op: 0.001066446304321289 seconds +4: Time to load utils op: 0.0012581348419189453 seconds +4: Time to load utils op: 0.0010874271392822266 secondsTime to load utils op: 0.0011210441589355469 seconds +4: +4: Time to load utils op: 0.001285552978515625 seconds +0: [2023-03-15 21:59:14,553] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-15 21:59:14,554] [INFO] [utils.py:828:see_memory_usage] MA 0.52 GB Max_MA 0.52 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 21:59:14,554] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 21:59:14,681] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-15 21:59:14,682] [INFO] [utils.py:828:see_memory_usage] MA 1.14 GB Max_MA 1.14 GB CA 1.48 GB Max_CA 1 GB +0: [2023-03-15 21:59:14,682] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:14,785] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-15 21:59:14,786] [INFO] [utils.py:828:see_memory_usage] MA 1.14 GB Max_MA 1.14 GB CA 1.48 GB Max_CA 1 GB +0: 
[2023-03-15 21:59:14,786] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:14,889] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-15 21:59:14,890] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 21:59:14,890] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:14,991] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-15 21:59:14,991] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 21:59:14,992] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:15,096] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-15 21:59:15,096] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 21:59:15,096] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:15,197] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-15 21:59:15,198] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 21:59:15,198] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:15,304] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-15 21:59:15,305] [INFO] [utils.py:828:see_memory_usage] MA 1.62 GB Max_MA 1.62 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 21:59:15,305] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:15,407] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-15 21:59:15,408] [INFO] [utils.py:828:see_memory_usage] MA 1.62 GB Max_MA 1.62 GB CA 2.14 GB Max_CA 2 GB 
+0: [2023-03-15 21:59:15,408] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.27 GB, percent = 6.4% +0: [2023-03-15 21:59:15,408] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-15 21:59:15,408] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-15 21:59:15,409] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-15 21:59:15,409] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-15 21:59:15,409] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-15 21:59:15,409] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-15 21:59:15,409] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-15 21:59:15,409] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-15 21:59:15,409] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-15 21:59:15,410] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-15 21:59:15,411] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-15 21:59:15,411] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00042557716369628906 seconds +0: [2023-03-15 21:59:15,412] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-15 21:59:15,422] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=280342528 (280.343M) TOTAL_PARAMS=280342528 (280.343M) UNIQUE_PARAMS=280342528 (280.343M) +6: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+3: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: WARNING: could not find the metadata file checkpoints_280m5b9400m +1: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: will not load any checkpoints and will start from random +2: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,429] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+7: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+2: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +3: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +6: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +4: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +2: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +5: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-15 21:59:15,430] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_280m5b9400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +7: time (ms) | load-checkpoint: 6.53 +0: estimated model parameters: 0.280342528 +0: estimated model parameters without embeddings: 0.22673408 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-15 21:59:15 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 2884878 +0: validation: 3072 +0: test: 256 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.006673 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > WARNING: could not find index map files, building the indices on rank 0 ... 
+0: > last epoch number of samples (153471) is smaller than 95.0% of number of samples per epoch (195100), setting separate_last_epoch to True +0: > elasped time to build and save doc-idx mapping (seconds): 0.556262 +0: using: +0: number of documents: 835726 +0: number of epochs: 15 +0: sequence length: 2048 +0: total number of samples: 2926508 +0: > elasped time to build and save sample-idx mapping (seconds): 0.071125 +0: > building shuffle index with split [0, 2731407) and [2731407, 2926508) ... +0: > elasped time to build and save shuffle-idx mapping (seconds): 0.074524 +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_2884878ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_2884878ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_2884878ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.012 seconds +0: total number of samples: 2926509 +0: total number of epochs: 15 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.044785 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.075 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-15 21:59:29 +0: done with setup ... +0: training ... +0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: +7: time (ms) | model-and-optimizer-setup: 17015.35 | train/valid/test-data-iterators-setup: 13710.54 +0: [000-000] 0.2803B / 0.2267B +0: [before the start of training step] datetime: 2023-03-15 21:59:29 +0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 3765.8056640625 | max allocated: 34106.49267578125 | reserved: 35064.0 | max reserved: 35064.0 +7: iteration 10/ 11269 | consumed samples: 2560 | consumed tokens: 5242880 | elapsed time per iteration (s): 1.36 | learning rate: 1.775E-05 | global batch size: 256 | lm loss: 1.027888E+01 | grad norm: 2.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 188.292 | TFLOPs: 12.31 | +7: iteration 20/ 11269 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 0.50 | learning rate: 3.550E-05 | global batch size: 256 | lm loss: 9.282013E+00 | grad norm: 1.882 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 507.128 | TFLOPs: 33.17 | +7: iteration 30/ 11269 | consumed samples: 7680 | consumed tokens: 15728640 | elapsed time per iteration (s): 0.50 | learning rate: 5.324E-05 | global batch size: 256 | lm loss: 8.801241E+00 | grad norm: 1.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 515.602 | TFLOPs: 33.72 | +7: iteration 40/ 11269 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 0.49 | learning rate: 7.099E-05 | global batch size: 256 | lm loss: 8.217269E+00 | grad norm: 1.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 522.455 | TFLOPs: 34.17 | +7: iteration 50/ 11269 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (s): 0.49 | learning rate: 8.874E-05 | global batch size: 256 | lm loss: 7.653390E+00 | grad norm: 1.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 522.280 | TFLOPs: 34.16 | +7: iteration 60/ 11269 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 0.50 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 7.292472E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 507.201 | TFLOPs: 33.17 | +7: iteration 70/ 11269 | consumed samples: 17920 | consumed tokens: 36700160 | elapsed time per iteration (s): 0.49 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 7.099794E+00 | grad norm: 0.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.555 | TFLOPs: 34.24 | +7: iteration 80/ 11269 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 0.49 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 6.961439E+00 | 
grad norm: 0.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.797 | TFLOPs: 34.52 | +7: iteration 90/ 11269 | consumed samples: 23040 | consumed tokens: 47185920 | elapsed time per iteration (s): 0.49 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 6.817342E+00 | grad norm: 0.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 519.459 | TFLOPs: 33.97 | +7: iteration 100/ 11269 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.49 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 6.711210E+00 | grad norm: 0.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.806 | TFLOPs: 34.06 | +7: iteration 110/ 11269 | consumed samples: 28160 | consumed tokens: 57671680 | elapsed time per iteration (s): 0.50 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 6.613287E+00 | grad norm: 1.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 511.305 | TFLOPs: 33.44 | +7: iteration 120/ 11269 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.510976E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.614 | TFLOPs: 34.96 | +7: iteration 130/ 11269 | consumed samples: 33280 | consumed tokens: 68157440 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.440462E+00 | grad norm: 0.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.191 | TFLOPs: 35.20 | +7: iteration 140/ 11269 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 
| global batch size: 256 | lm loss: 6.427649E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.410 | TFLOPs: 35.08 | +7: iteration 150/ 11269 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.363493E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.177 | TFLOPs: 35.00 | +7: iteration 160/ 11269 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.318335E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.664 | TFLOPs: 34.97 | +7: iteration 170/ 11269 | consumed samples: 43520 | consumed tokens: 89128960 | elapsed time per iteration (s): 0.49 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.267995E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 524.590 | TFLOPs: 34.31 | +7: iteration 180/ 11269 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 0.49 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.230944E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.443 | TFLOPs: 34.49 | +7: iteration 190/ 11269 | consumed samples: 48640 | consumed tokens: 99614720 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.213390E+00 | grad norm: 0.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.793 | TFLOPs: 34.58 | +7: iteration 200/ 11269 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed 
time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.200788E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.983 | TFLOPs: 35.12 | +7: iteration 210/ 11269 | consumed samples: 53760 | consumed tokens: 110100480 | elapsed time per iteration (s): 0.47 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.147212E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.415 | TFLOPs: 35.47 | +7: iteration 220/ 11269 | consumed samples: 56320 | consumed tokens: 115343360 | elapsed time per iteration (s): 0.48 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.127728E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.287 | TFLOPs: 34.81 | +7: iteration 230/ 11269 | consumed samples: 58880 | consumed tokens: 120586240 | elapsed time per iteration (s): 0.49 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 6.097730E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.181 | TFLOPs: 34.02 | +7: iteration 240/ 11269 | consumed samples: 61440 | consumed tokens: 125829120 | elapsed time per iteration (s): 0.48 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 6.073207E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.134 | TFLOPs: 35.13 | +7: iteration 250/ 11269 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (s): 0.48 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 6.033714E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.045 | TFLOPs: 34.93 | +7: iteration 260/ 11269 | 
consumed samples: 66560 | consumed tokens: 136314880 | elapsed time per iteration (s): 0.48 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 6.030340E+00 | grad norm: 0.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.036 | TFLOPs: 34.86 | +7: iteration 270/ 11269 | consumed samples: 69120 | consumed tokens: 141557760 | elapsed time per iteration (s): 0.48 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 6.006889E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.097 | TFLOPs: 35.06 | +7: iteration 280/ 11269 | consumed samples: 71680 | consumed tokens: 146800640 | elapsed time per iteration (s): 0.49 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.970426E+00 | grad norm: 0.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 524.005 | TFLOPs: 34.27 | +7: iteration 290/ 11269 | consumed samples: 74240 | consumed tokens: 152043520 | elapsed time per iteration (s): 0.49 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.940175E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.560 | TFLOPs: 34.50 | +7: iteration 300/ 11269 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.48 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.934694E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.748 | TFLOPs: 35.17 | +7: iteration 310/ 11269 | consumed samples: 79360 | consumed tokens: 162529280 | elapsed time per iteration (s): 0.49 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.894152E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 526.023 | TFLOPs: 34.40 | +7: iteration 320/ 11269 | consumed samples: 81920 | consumed tokens: 167772160 | elapsed time per iteration (s): 0.49 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.880116E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.796 | TFLOPs: 34.26 | +7: iteration 330/ 11269 | consumed samples: 84480 | consumed tokens: 173015040 | elapsed time per iteration (s): 0.48 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.829498E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.257 | TFLOPs: 34.87 | +7: iteration 340/ 11269 | consumed samples: 87040 | consumed tokens: 178257920 | elapsed time per iteration (s): 0.47 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.812987E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.109 | TFLOPs: 35.45 | +7: iteration 350/ 11269 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (s): 0.50 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.778070E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 516.365 | TFLOPs: 33.77 | +7: iteration 360/ 11269 | consumed samples: 92160 | consumed tokens: 188743680 | elapsed time per iteration (s): 0.50 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.737553E+00 | grad norm: 0.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 512.925 | TFLOPs: 33.55 | +7: iteration 370/ 11269 | consumed samples: 94720 | consumed tokens: 193986560 | elapsed time per iteration (s): 0.49 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.730161E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 523.026 | TFLOPs: 34.21 | +7: iteration 380/ 11269 | consumed samples: 97280 | consumed tokens: 199229440 | elapsed time per iteration (s): 0.49 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 5.695152E+00 | grad norm: 0.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.541 | TFLOPs: 34.24 | +7: iteration 390/ 11269 | consumed samples: 99840 | consumed tokens: 204472320 | elapsed time per iteration (s): 0.48 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 5.674793E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.723 | TFLOPs: 34.58 | +7: iteration 400/ 11269 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.48 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 5.630848E+00 | grad norm: 0.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.640 | TFLOPs: 35.03 | +7: iteration 410/ 11269 | consumed samples: 104960 | consumed tokens: 214958080 | elapsed time per iteration (s): 0.50 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 5.635732E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 515.622 | TFLOPs: 33.72 | +7: iteration 420/ 11269 | consumed samples: 107520 | consumed tokens: 220200960 | elapsed time per iteration (s): 0.48 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 5.570592E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.920 | TFLOPs: 34.66 | +7: iteration 430/ 11269 | consumed samples: 110080 | consumed tokens: 225443840 | elapsed time per iteration (s): 0.48 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 
5.554506E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.228 | TFLOPs: 34.61 | +7: iteration 440/ 11269 | consumed samples: 112640 | consumed tokens: 230686720 | elapsed time per iteration (s): 0.48 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 5.531062E+00 | grad norm: 0.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.936 | TFLOPs: 35.12 | +7: iteration 450/ 11269 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (s): 0.48 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 5.511559E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.146 | TFLOPs: 34.87 | +7: iteration 460/ 11269 | consumed samples: 117760 | consumed tokens: 241172480 | elapsed time per iteration (s): 0.49 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 5.456729E+00 | grad norm: 0.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 525.116 | TFLOPs: 34.34 | +7: iteration 470/ 11269 | consumed samples: 120320 | consumed tokens: 246415360 | elapsed time per iteration (s): 0.49 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 5.479884E+00 | grad norm: 0.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 526.241 | TFLOPs: 34.42 | +7: iteration 480/ 11269 | consumed samples: 122880 | consumed tokens: 251658240 | elapsed time per iteration (s): 0.49 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 5.432080E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 521.080 | TFLOPs: 34.08 | +7: iteration 490/ 11269 | consumed samples: 125440 | consumed tokens: 256901120 | elapsed time per iteration (s): 
0.49 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 5.394036E+00 | grad norm: 0.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.430 | TFLOPs: 34.23 | +7: iteration 500/ 11269 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.49 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 5.395821E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 522.905 | TFLOPs: 34.20 | +7: iteration 510/ 11269 | consumed samples: 130560 | consumed tokens: 267386880 | elapsed time per iteration (s): 0.48 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 5.347544E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.856 | TFLOPs: 35.18 | +7: iteration 520/ 11269 | consumed samples: 133120 | consumed tokens: 272629760 | elapsed time per iteration (s): 0.48 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 5.302108E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.402 | TFLOPs: 34.75 | +7: iteration 530/ 11269 | consumed samples: 135680 | consumed tokens: 277872640 | elapsed time per iteration (s): 0.48 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 5.293512E+00 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.539 | TFLOPs: 34.96 | +7: iteration 540/ 11269 | consumed samples: 138240 | consumed tokens: 283115520 | elapsed time per iteration (s): 0.47 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 5.254497E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.572 | TFLOPs: 35.48 | +7: iteration 550/ 11269 | consumed samples: 
140800 | consumed tokens: 288358400 | elapsed time per iteration (s): 0.48 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 5.264146E+00 | grad norm: 0.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.555 | TFLOPs: 35.03 | +7: iteration 560/ 11269 | consumed samples: 143360 | consumed tokens: 293601280 | elapsed time per iteration (s): 0.47 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 5.225822E+00 | grad norm: 0.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.069 | TFLOPs: 35.25 | +7: iteration 570/ 11269 | consumed samples: 145920 | consumed tokens: 298844160 | elapsed time per iteration (s): 0.47 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 5.209805E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.655 | TFLOPs: 35.42 | +7: iteration 580/ 11269 | consumed samples: 148480 | consumed tokens: 304087040 | elapsed time per iteration (s): 0.48 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 5.156770E+00 | grad norm: 0.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.189 | TFLOPs: 34.61 | +7: iteration 590/ 11269 | consumed samples: 151040 | consumed tokens: 309329920 | elapsed time per iteration (s): 0.48 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 5.151631E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.686 | TFLOPs: 34.84 | +7: iteration 600/ 11269 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.48 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 5.120596E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
530.133 | TFLOPs: 34.67 | +7: iteration 610/ 11269 | consumed samples: 156160 | consumed tokens: 319815680 | elapsed time per iteration (s): 0.48 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 5.125335E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.552 | TFLOPs: 35.16 | +7: iteration 620/ 11269 | consumed samples: 158720 | consumed tokens: 325058560 | elapsed time per iteration (s): 0.48 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 5.092756E+00 | grad norm: 0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.164 | TFLOPs: 34.74 | +7: iteration 630/ 11269 | consumed samples: 161280 | consumed tokens: 330301440 | elapsed time per iteration (s): 0.48 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 5.059850E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.216 | TFLOPs: 34.94 | +7: iteration 640/ 11269 | consumed samples: 163840 | consumed tokens: 335544320 | elapsed time per iteration (s): 0.48 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 5.024911E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.566 | TFLOPs: 34.57 | +7: iteration 650/ 11269 | consumed samples: 166400 | consumed tokens: 340787200 | elapsed time per iteration (s): 0.48 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 5.003077E+00 | grad norm: 0.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.392 | TFLOPs: 34.62 | +7: iteration 660/ 11269 | consumed samples: 168960 | consumed tokens: 346030080 | elapsed time per iteration (s): 0.48 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.994066E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 533.957 | TFLOPs: 34.92 | +7: iteration 670/ 11269 | consumed samples: 171520 | consumed tokens: 351272960 | elapsed time per iteration (s): 0.48 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.976462E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.040 | TFLOPs: 34.86 | +7: iteration 680/ 11269 | consumed samples: 174080 | consumed tokens: 356515840 | elapsed time per iteration (s): 0.48 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.974440E+00 | grad norm: 0.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.984 | TFLOPs: 34.92 | +7: iteration 690/ 11269 | consumed samples: 176640 | consumed tokens: 361758720 | elapsed time per iteration (s): 0.48 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.922567E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.805 | TFLOPs: 35.11 | +7: iteration 700/ 11269 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.48 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.899709E+00 | grad norm: 0.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.813 | TFLOPs: 35.17 | +7: iteration 710/ 11269 | consumed samples: 181760 | consumed tokens: 372244480 | elapsed time per iteration (s): 0.48 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.885059E+00 | grad norm: 0.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.479 | TFLOPs: 34.89 | +7: iteration 720/ 11269 | consumed samples: 184320 | consumed tokens: 377487360 | elapsed time per iteration (s): 0.47 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 
4.864639E+00 | grad norm: 1.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.433 | TFLOPs: 35.28 | +7: iteration 730/ 11269 | consumed samples: 186880 | consumed tokens: 382730240 | elapsed time per iteration (s): 0.48 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.845634E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.294 | TFLOPs: 35.14 | +7: iteration 740/ 11269 | consumed samples: 189440 | consumed tokens: 387973120 | elapsed time per iteration (s): 0.48 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.823149E+00 | grad norm: 0.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.043 | TFLOPs: 35.12 | +7: iteration 750/ 11269 | consumed samples: 192000 | consumed tokens: 393216000 | elapsed time per iteration (s): 0.48 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.803323E+00 | grad norm: 0.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.007 | TFLOPs: 34.92 | +7: iteration 760/ 11269 | consumed samples: 194560 | consumed tokens: 398458880 | elapsed time per iteration (s): 0.47 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.772005E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.781 | TFLOPs: 35.30 | +7: iteration 770/ 11269 | consumed samples: 197120 | consumed tokens: 403701760 | elapsed time per iteration (s): 0.48 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.717144E+00 | grad norm: 0.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.921 | TFLOPs: 34.66 | +7: iteration 780/ 11269 | consumed samples: 199680 | consumed tokens: 408944640 | elapsed time per iteration (s): 
0.48 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.721557E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.553 | TFLOPs: 35.22 | +7: iteration 790/ 11269 | consumed samples: 202240 | consumed tokens: 414187520 | elapsed time per iteration (s): 0.48 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.679041E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.519 | TFLOPs: 34.96 | +7: iteration 800/ 11269 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.48 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.666647E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.097 | TFLOPs: 34.73 | +7: iteration 810/ 11269 | consumed samples: 207360 | consumed tokens: 424673280 | elapsed time per iteration (s): 0.48 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.636638E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.098 | TFLOPs: 34.93 | +7: iteration 820/ 11269 | consumed samples: 209920 | consumed tokens: 429916160 | elapsed time per iteration (s): 0.49 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.609793E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.860 | TFLOPs: 34.26 | +7: iteration 830/ 11269 | consumed samples: 212480 | consumed tokens: 435159040 | elapsed time per iteration (s): 0.48 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.595002E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.227 | TFLOPs: 35.13 | +7: iteration 840/ 11269 | consumed samples: 
215040 | consumed tokens: 440401920 | elapsed time per iteration (s): 0.48 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.565396E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.411 | TFLOPs: 34.69 | +7: iteration 850/ 11269 | consumed samples: 217600 | consumed tokens: 445644800 | elapsed time per iteration (s): 0.48 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.560812E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.962 | TFLOPs: 34.53 | +7: iteration 860/ 11269 | consumed samples: 220160 | consumed tokens: 450887680 | elapsed time per iteration (s): 0.48 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.539445E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.014 | TFLOPs: 34.86 | +7: iteration 870/ 11269 | consumed samples: 222720 | consumed tokens: 456130560 | elapsed time per iteration (s): 0.48 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.551311E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.242 | TFLOPs: 34.87 | +7: iteration 880/ 11269 | consumed samples: 225280 | consumed tokens: 461373440 | elapsed time per iteration (s): 0.48 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.524503E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.371 | TFLOPs: 34.88 | +7: iteration 890/ 11269 | consumed samples: 227840 | consumed tokens: 466616320 | elapsed time per iteration (s): 0.48 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.513066E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
530.638 | TFLOPs: 34.70 | +7: iteration 900/ 11269 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.48 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.501746E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.479 | TFLOPs: 35.09 | +7: iteration 910/ 11269 | consumed samples: 232960 | consumed tokens: 477102080 | elapsed time per iteration (s): 0.48 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.470564E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.426 | TFLOPs: 35.02 | +7: iteration 920/ 11269 | consumed samples: 235520 | consumed tokens: 482344960 | elapsed time per iteration (s): 0.48 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.467780E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.448 | TFLOPs: 34.82 | +7: iteration 930/ 11269 | consumed samples: 238080 | consumed tokens: 487587840 | elapsed time per iteration (s): 0.48 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.458410E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.008 | TFLOPs: 34.79 | +7: iteration 940/ 11269 | consumed samples: 240640 | consumed tokens: 492830720 | elapsed time per iteration (s): 0.48 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.438409E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.956 | TFLOPs: 34.86 | +7: iteration 950/ 11269 | consumed samples: 243200 | consumed tokens: 498073600 | elapsed time per iteration (s): 0.48 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.428521E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 533.069 | TFLOPs: 34.86 | +7: iteration 960/ 11269 | consumed samples: 245760 | consumed tokens: 503316480 | elapsed time per iteration (s): 0.47 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.423767E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.466 | TFLOPs: 35.28 | +7: iteration 970/ 11269 | consumed samples: 248320 | consumed tokens: 508559360 | elapsed time per iteration (s): 0.48 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.396809E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.104 | TFLOPs: 34.73 | +7: iteration 980/ 11269 | consumed samples: 250880 | consumed tokens: 513802240 | elapsed time per iteration (s): 0.49 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.381861E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.344 | TFLOPs: 34.03 | +7: iteration 990/ 11269 | consumed samples: 253440 | consumed tokens: 519045120 | elapsed time per iteration (s): 0.48 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.388478E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.506 | TFLOPs: 34.69 | +7: iteration 1000/ 11269 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.47 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.382750E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.360 | TFLOPs: 35.47 | +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 1000 | lm loss value: 4.359394E+00 | lm loss PPL: 
7.820969E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 1000 to checkpoints_280m5b9400m +0: [2023-03-15 22:07:41,254] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save! +0: [2023-03-15 22:07:41,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:07:41,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:07:41,409] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_03-model_00-model_states.pt... +0: [2023-03-15 22:07:41,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:07:41,436] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:07:41,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:07:41,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:07:41,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:07:41,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:07:41,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_06-model_00-model_states.pt. 
+0: [2023-03-15 22:07:41,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:07:41,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:07:41,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:07:41,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:07:41,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_09-model_00-model_states.pt... +0: [2023-03-15 22:07:41,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:07:41,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:07:41,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:07:41,605] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:07:41,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:07:41,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:07:41,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_12-model_00-model_states.pt. 
+0: [2023-03-15 22:07:41,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:07:41,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:07:41,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:07:41,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:07:41,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_15-model_00-model_states.pt... +0: [2023-03-15 22:07:41,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:07:41,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:07:41,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:07:41,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:07:41,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:07:41,773] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:07:41,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_18-model_00-model_states.pt. 
+0: [2023-03-15 22:07:41,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:07:41,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:07:41,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:07:41,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:07:41,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/layer_22-model_00-model_states.pt... +0: [2023-03-15 22:07:41,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:07:41,847] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step1000/mp_rank_00_model_states.pt +0: [2023-03-15 22:07:41,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:07:41,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:07:41,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:07:41,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:07:41,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:41,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+0: [2023-03-15 22:07:41,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:41,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +0: [2023-03-15 22:07:41,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:41,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:41,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:41,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:41,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:07:41,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:41,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:41,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:41,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:07:41,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:41,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:41,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:41,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:41,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:41,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:07:41,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:41,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:41,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:41,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:41,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:07:41,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:07:41,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:07:41,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:07:41,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +4: [2023-03-15 22:07:41,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +0: [2023-03-15 22:07:41,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:41,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:41,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:41,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:07:41,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:41,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:41,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:41,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:41,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:41,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:41,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:41,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:42,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:07:42,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 22:07:42,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:07:42,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:42,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:42,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:42,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:42,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:07:42,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:07:42,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+3: [2023-03-15 22:07:42,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +7: [2023-03-15 22:07:42,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:07:42,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 22:07:42,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +0: [2023-03-15 22:07:42,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:42,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:42,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +0: [2023-03-15 22:07:42,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:42,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +0: [2023-03-15 22:07:42,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:07:42,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:07:42,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+3: [2023-03-15 22:07:42,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:42,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:42,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:42,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:41,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:41,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:41,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:41,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+2: [2023-03-15 22:07:41,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:41,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:41,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:07:42,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:42,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:42,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +2: [2023-03-15 22:07:42,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+1: [2023-03-15 22:07:42,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:42,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:07:42,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:42,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:07:42,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +1: [2023-03-15 22:07:42,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +3: [2023-03-15 22:07:42,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:07:42,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:07:42,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! +6: [2023-03-15 22:07:42,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:07:42,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:07:42,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
+0: successfully saved checkpoint at iteration 1000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 800.32 +7: iteration 1010/ 11269 | consumed samples: 258560 | consumed tokens: 529530880 | elapsed time per iteration (s): 0.57 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.367416E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 450.918 | TFLOPs: 29.49 | +7: iteration 1020/ 11269 | consumed samples: 261120 | consumed tokens: 534773760 | elapsed time per iteration (s): 0.50 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.358165E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 513.627 | TFLOPs: 33.59 | +7: iteration 1030/ 11269 | consumed samples: 263680 | consumed tokens: 540016640 | elapsed time per iteration (s): 0.47 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.347829E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.912 | TFLOPs: 35.44 | +7: iteration 1040/ 11269 | consumed samples: 266240 | consumed tokens: 545259520 | elapsed time per iteration (s): 0.48 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.331307E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.931 | TFLOPs: 34.85 | +7: iteration 1050/ 11269 | consumed samples: 268800 | consumed tokens: 550502400 | elapsed time per iteration (s): 0.47 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.316079E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.311 | TFLOPs: 35.34 | +7: iteration 1060/ 11269 | consumed samples: 271360 | consumed tokens: 555745280 | elapsed time per iteration (s): 0.47 | learning rate: 1.968E-04 | 
global batch size: 256 | lm loss: 4.317997E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.250 | TFLOPs: 35.46 | +7: iteration 1070/ 11269 | consumed samples: 273920 | consumed tokens: 560988160 | elapsed time per iteration (s): 0.48 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.297785E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.578 | TFLOPs: 35.09 | +7: iteration 1080/ 11269 | consumed samples: 276480 | consumed tokens: 566231040 | elapsed time per iteration (s): 0.48 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.309339E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.335 | TFLOPs: 35.21 | +7: iteration 1090/ 11269 | consumed samples: 279040 | consumed tokens: 571473920 | elapsed time per iteration (s): 0.48 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.307839E+00 | grad norm: 0.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.095 | TFLOPs: 35.06 | +7: iteration 1100/ 11269 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.47 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.285108E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.940 | TFLOPs: 35.31 | +7: iteration 1110/ 11269 | consumed samples: 284160 | consumed tokens: 581959680 | elapsed time per iteration (s): 0.48 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.273550E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.801 | TFLOPs: 34.98 | +7: iteration 1120/ 11269 | consumed samples: 286720 | consumed tokens: 
587202560 | elapsed time per iteration (s): 0.47 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.256940E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.203 | TFLOPs: 35.46 | +7: iteration 1130/ 11269 | consumed samples: 289280 | consumed tokens: 592445440 | elapsed time per iteration (s): 0.47 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.249532E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.055 | TFLOPs: 35.25 | +7: iteration 1140/ 11269 | consumed samples: 291840 | consumed tokens: 597688320 | elapsed time per iteration (s): 0.48 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.260006E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.163 | TFLOPs: 35.20 | +7: iteration 1150/ 11269 | consumed samples: 294400 | consumed tokens: 602931200 | elapsed time per iteration (s): 0.47 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.235439E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.037 | TFLOPs: 35.25 | +7: iteration 1160/ 11269 | consumed samples: 296960 | consumed tokens: 608174080 | elapsed time per iteration (s): 0.48 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.260288E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.634 | TFLOPs: 35.16 | +7: iteration 1170/ 11269 | consumed samples: 299520 | consumed tokens: 613416960 | elapsed time per iteration (s): 0.48 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.223132E+00 | grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.997 | TFLOPs: 34.92 | 
+7: iteration 1180/ 11269 | consumed samples: 302080 | consumed tokens: 618659840 | elapsed time per iteration (s): 0.47 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.223728E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.481 | TFLOPs: 35.28 | +7: iteration 1190/ 11269 | consumed samples: 304640 | consumed tokens: 623902720 | elapsed time per iteration (s): 0.48 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.221468E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.250 | TFLOPs: 35.14 | +7: iteration 1200/ 11269 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.48 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.199981E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.873 | TFLOPs: 35.11 | +7: iteration 1210/ 11269 | consumed samples: 309760 | consumed tokens: 634388480 | elapsed time per iteration (s): 0.47 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.200072E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.294 | TFLOPs: 35.47 | +7: iteration 1220/ 11269 | consumed samples: 312320 | consumed tokens: 639631360 | elapsed time per iteration (s): 0.48 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.215726E+00 | grad norm: 0.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.432 | TFLOPs: 35.21 | +7: iteration 1230/ 11269 | consumed samples: 314880 | consumed tokens: 644874240 | elapsed time per iteration (s): 0.48 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.202612E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 531.818 | TFLOPs: 34.78 | +7: iteration 1240/ 11269 | consumed samples: 317440 | consumed tokens: 650117120 | elapsed time per iteration (s): 0.49 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.179997E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 525.535 | TFLOPs: 34.37 | +7: iteration 1250/ 11269 | consumed samples: 320000 | consumed tokens: 655360000 | elapsed time per iteration (s): 0.48 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.171236E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.758 | TFLOPs: 35.10 | +7: iteration 1260/ 11269 | consumed samples: 322560 | consumed tokens: 660602880 | elapsed time per iteration (s): 0.48 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.170303E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.754 | TFLOPs: 35.10 | +7: iteration 1270/ 11269 | consumed samples: 325120 | consumed tokens: 665845760 | elapsed time per iteration (s): 0.48 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.161396E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.024 | TFLOPs: 35.19 | +7: iteration 1280/ 11269 | consumed samples: 327680 | consumed tokens: 671088640 | elapsed time per iteration (s): 0.47 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.161057E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.117 | TFLOPs: 35.32 | +7: iteration 1290/ 11269 | consumed samples: 330240 | consumed tokens: 676331520 | elapsed time per iteration (s): 0.48 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.150000E+00 | 
grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.055 | TFLOPs: 34.93 | +7: iteration 1300/ 11269 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.48 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.151115E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.995 | TFLOPs: 35.18 | +7: iteration 1310/ 11269 | consumed samples: 335360 | consumed tokens: 686817280 | elapsed time per iteration (s): 0.47 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.140461E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.208 | TFLOPs: 35.33 | +7: iteration 1320/ 11269 | consumed samples: 337920 | consumed tokens: 692060160 | elapsed time per iteration (s): 0.48 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.134912E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.470 | TFLOPs: 34.95 | +7: iteration 1330/ 11269 | consumed samples: 340480 | consumed tokens: 697303040 | elapsed time per iteration (s): 0.48 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.130267E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.866 | TFLOPs: 35.05 | +7: iteration 1340/ 11269 | consumed samples: 343040 | consumed tokens: 702545920 | elapsed time per iteration (s): 0.48 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.129687E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.526 | TFLOPs: 35.22 | +7: iteration 1350/ 11269 | consumed samples: 345600 | consumed tokens: 707788800 | elapsed time per iteration (s): 0.47 | 
learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.121521E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.818 | TFLOPs: 35.30 | +7: iteration 1360/ 11269 | consumed samples: 348160 | consumed tokens: 713031680 | elapsed time per iteration (s): 0.47 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.109702E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.917 | TFLOPs: 35.31 | +7: iteration 1370/ 11269 | consumed samples: 350720 | consumed tokens: 718274560 | elapsed time per iteration (s): 0.48 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.102767E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.286 | TFLOPs: 34.88 | +7: iteration 1380/ 11269 | consumed samples: 353280 | consumed tokens: 723517440 | elapsed time per iteration (s): 0.48 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.118284E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.883 | TFLOPs: 34.78 | +7: iteration 1390/ 11269 | consumed samples: 355840 | consumed tokens: 728760320 | elapsed time per iteration (s): 0.47 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.100820E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.566 | TFLOPs: 35.29 | +7: iteration 1400/ 11269 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.47 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.088197E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.137 | TFLOPs: 35.46 | +7: iteration 1410/ 11269 | consumed samples: 360960 
| consumed tokens: 739246080 | elapsed time per iteration (s): 0.48 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.090203E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.730 | TFLOPs: 35.17 | +7: iteration 1420/ 11269 | consumed samples: 363520 | consumed tokens: 744488960 | elapsed time per iteration (s): 0.48 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.095140E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.283 | TFLOPs: 35.14 | +7: iteration 1430/ 11269 | consumed samples: 366080 | consumed tokens: 749731840 | elapsed time per iteration (s): 0.48 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.086061E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.404 | TFLOPs: 34.82 | +7: iteration 1440/ 11269 | consumed samples: 368640 | consumed tokens: 754974720 | elapsed time per iteration (s): 0.48 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.084681E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.607 | TFLOPs: 35.16 | +7: iteration 1450/ 11269 | consumed samples: 371200 | consumed tokens: 760217600 | elapsed time per iteration (s): 0.48 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.078041E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.064 | TFLOPs: 35.12 | +7: iteration 1460/ 11269 | consumed samples: 373760 | consumed tokens: 765460480 | elapsed time per iteration (s): 0.47 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.074731E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.812 
| TFLOPs: 35.43 | +7: iteration 1470/ 11269 | consumed samples: 376320 | consumed tokens: 770703360 | elapsed time per iteration (s): 0.47 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.075573E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.372 | TFLOPs: 35.27 | +7: iteration 1480/ 11269 | consumed samples: 378880 | consumed tokens: 775946240 | elapsed time per iteration (s): 0.47 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.066193E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.876 | TFLOPs: 35.44 | +7: iteration 1490/ 11269 | consumed samples: 381440 | consumed tokens: 781189120 | elapsed time per iteration (s): 0.49 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.063973E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 522.913 | TFLOPs: 34.20 | +7: iteration 1500/ 11269 | consumed samples: 384000 | consumed tokens: 786432000 | elapsed time per iteration (s): 0.48 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.055220E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.513 | TFLOPs: 35.09 | +7: iteration 1510/ 11269 | consumed samples: 386560 | consumed tokens: 791674880 | elapsed time per iteration (s): 0.48 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.041025E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.883 | TFLOPs: 35.24 | +7: iteration 1520/ 11269 | consumed samples: 389120 | consumed tokens: 796917760 | elapsed time per iteration (s): 0.47 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.062443E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 541.702 | TFLOPs: 35.43 | +7: iteration 1530/ 11269 | consumed samples: 391680 | consumed tokens: 802160640 | elapsed time per iteration (s): 0.48 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.047706E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.693 | TFLOPs: 35.16 | +7: iteration 1540/ 11269 | consumed samples: 394240 | consumed tokens: 807403520 | elapsed time per iteration (s): 0.48 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.044231E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.798 | TFLOPs: 34.98 | +7: iteration 1550/ 11269 | consumed samples: 396800 | consumed tokens: 812646400 | elapsed time per iteration (s): 0.48 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.043360E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.319 | TFLOPs: 35.14 | +7: iteration 1560/ 11269 | consumed samples: 399360 | consumed tokens: 817889280 | elapsed time per iteration (s): 0.48 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.011599E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.178 | TFLOPs: 35.07 | +7: iteration 1570/ 11269 | consumed samples: 401920 | consumed tokens: 823132160 | elapsed time per iteration (s): 0.48 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.019790E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.085 | TFLOPs: 35.19 | +7: iteration 1580/ 11269 | consumed samples: 404480 | consumed tokens: 828375040 | elapsed time per iteration (s): 0.47 | learning rate: 1.924E-04 | global batch size: 256 | lm 
loss: 4.018644E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.580 | TFLOPs: 35.42 | +7: iteration 1590/ 11269 | consumed samples: 407040 | consumed tokens: 833617920 | elapsed time per iteration (s): 0.48 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.010510E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.866 | TFLOPs: 34.91 | +7: iteration 1600/ 11269 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.48 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.996565E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.242 | TFLOPs: 35.20 | +7: iteration 1610/ 11269 | consumed samples: 412160 | consumed tokens: 844103680 | elapsed time per iteration (s): 0.47 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.002259E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.756 | TFLOPs: 35.37 | +7: iteration 1620/ 11269 | consumed samples: 414720 | consumed tokens: 849346560 | elapsed time per iteration (s): 0.48 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.001208E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.898 | TFLOPs: 34.98 | +7: iteration 1630/ 11269 | consumed samples: 417280 | consumed tokens: 854589440 | elapsed time per iteration (s): 0.48 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.987079E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.240 | TFLOPs: 34.94 | +7: iteration 1640/ 11269 | consumed samples: 419840 | consumed tokens: 859832320 | elapsed time per 
iteration (s): 0.48 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.000662E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.247 | TFLOPs: 34.94 | +7: iteration 1650/ 11269 | consumed samples: 422400 | consumed tokens: 865075200 | elapsed time per iteration (s): 0.48 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.002419E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.661 | TFLOPs: 35.16 | +7: iteration 1660/ 11269 | consumed samples: 424960 | consumed tokens: 870318080 | elapsed time per iteration (s): 0.48 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.987616E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.682 | TFLOPs: 35.16 | +7: iteration 1670/ 11269 | consumed samples: 427520 | consumed tokens: 875560960 | elapsed time per iteration (s): 0.48 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.985016E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.055 | TFLOPs: 34.86 | +7: iteration 1680/ 11269 | consumed samples: 430080 | consumed tokens: 880803840 | elapsed time per iteration (s): 0.47 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 3.978141E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.052 | TFLOPs: 35.45 | +7: iteration 1690/ 11269 | consumed samples: 432640 | consumed tokens: 886046720 | elapsed time per iteration (s): 0.47 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.965922E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.765 | TFLOPs: 35.43 | +7: iteration 1700/ 11269 | 
consumed samples: 435200 | consumed tokens: 891289600 | elapsed time per iteration (s): 0.48 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.969548E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.608 | TFLOPs: 34.64 | +7: iteration 1710/ 11269 | consumed samples: 437760 | consumed tokens: 896532480 | elapsed time per iteration (s): 0.47 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.959496E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.551 | TFLOPs: 35.29 | +7: iteration 1720/ 11269 | consumed samples: 440320 | consumed tokens: 901775360 | elapsed time per iteration (s): 0.48 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.950530E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.559 | TFLOPs: 35.22 | +7: iteration 1730/ 11269 | consumed samples: 442880 | consumed tokens: 907018240 | elapsed time per iteration (s): 0.48 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.957402E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.389 | TFLOPs: 35.15 | +7: iteration 1740/ 11269 | consumed samples: 445440 | consumed tokens: 912261120 | elapsed time per iteration (s): 0.48 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.963531E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.673 | TFLOPs: 35.23 | +7: iteration 1750/ 11269 | consumed samples: 448000 | consumed tokens: 917504000 | elapsed time per iteration (s): 0.48 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.958447E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 536.423 | TFLOPs: 35.08 | +7: iteration 1760/ 11269 | consumed samples: 450560 | consumed tokens: 922746880 | elapsed time per iteration (s): 0.49 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.941109E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.712 | TFLOPs: 34.51 | +7: iteration 1770/ 11269 | consumed samples: 453120 | consumed tokens: 927989760 | elapsed time per iteration (s): 0.50 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.949590E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 511.991 | TFLOPs: 33.48 | +7: iteration 1780/ 11269 | consumed samples: 455680 | consumed tokens: 933232640 | elapsed time per iteration (s): 0.48 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.928989E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.474 | TFLOPs: 34.82 | +7: iteration 1790/ 11269 | consumed samples: 458240 | consumed tokens: 938475520 | elapsed time per iteration (s): 0.48 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.931721E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.513 | TFLOPs: 35.22 | +7: iteration 1800/ 11269 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.48 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.926086E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.562 | TFLOPs: 35.22 | +7: iteration 1810/ 11269 | consumed samples: 463360 | consumed tokens: 948961280 | elapsed time per iteration (s): 0.48 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.928977E+00 | grad norm: 0.434 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.281 | TFLOPs: 34.81 | +7: iteration 1820/ 11269 | consumed samples: 465920 | consumed tokens: 954204160 | elapsed time per iteration (s): 0.47 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.939436E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.582 | TFLOPs: 35.42 | +7: iteration 1830/ 11269 | consumed samples: 468480 | consumed tokens: 959447040 | elapsed time per iteration (s): 0.48 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.920950E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.510 | TFLOPs: 35.22 | +7: iteration 1840/ 11269 | consumed samples: 471040 | consumed tokens: 964689920 | elapsed time per iteration (s): 0.48 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.932523E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.223 | TFLOPs: 35.07 | +7: iteration 1850/ 11269 | consumed samples: 473600 | consumed tokens: 969932800 | elapsed time per iteration (s): 0.48 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 3.927012E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.161 | TFLOPs: 34.93 | +7: iteration 1860/ 11269 | consumed samples: 476160 | consumed tokens: 975175680 | elapsed time per iteration (s): 0.48 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.912764E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.139 | TFLOPs: 34.80 | +7: iteration 1870/ 11269 | consumed samples: 478720 | consumed tokens: 980418560 | elapsed time per iteration (s): 0.48 | learning rate: 1.892E-04 | global 
batch size: 256 | lm loss: 3.915350E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.427 | TFLOPs: 35.21 | +7: iteration 1880/ 11269 | consumed samples: 481280 | consumed tokens: 985661440 | elapsed time per iteration (s): 0.48 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.920081E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.506 | TFLOPs: 35.22 | +7: iteration 1890/ 11269 | consumed samples: 483840 | consumed tokens: 990904320 | elapsed time per iteration (s): 0.49 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.905660E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 526.902 | TFLOPs: 34.46 | +7: iteration 1900/ 11269 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.47 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 3.901535E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.985 | TFLOPs: 35.45 | +7: iteration 1910/ 11269 | consumed samples: 488960 | consumed tokens: 1001390080 | elapsed time per iteration (s): 0.48 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.896963E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.549 | TFLOPs: 35.02 | +7: iteration 1920/ 11269 | consumed samples: 491520 | consumed tokens: 1006632960 | elapsed time per iteration (s): 0.48 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.910717E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.075 | TFLOPs: 35.19 | +7: iteration 1930/ 11269 | consumed samples: 494080 | consumed tokens: 1011875840 | 
elapsed time per iteration (s): 0.47 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.892570E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.448 | TFLOPs: 35.41 | +7: iteration 1940/ 11269 | consumed samples: 496640 | consumed tokens: 1017118720 | elapsed time per iteration (s): 0.47 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.906018E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.999 | TFLOPs: 35.25 | +7: iteration 1950/ 11269 | consumed samples: 499200 | consumed tokens: 1022361600 | elapsed time per iteration (s): 0.48 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 3.884351E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.044 | TFLOPs: 34.86 | +7: iteration 1960/ 11269 | consumed samples: 501760 | consumed tokens: 1027604480 | elapsed time per iteration (s): 0.48 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.902623E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.128 | TFLOPs: 35.13 | +7: iteration 1970/ 11269 | consumed samples: 504320 | consumed tokens: 1032847360 | elapsed time per iteration (s): 0.48 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.882310E+00 | grad norm: 0.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.016 | TFLOPs: 34.99 | +7: iteration 1980/ 11269 | consumed samples: 506880 | consumed tokens: 1038090240 | elapsed time per iteration (s): 0.47 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.912272E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.733 | TFLOPs: 35.36 | +7: 
iteration 1990/ 11269 | consumed samples: 509440 | consumed tokens: 1043333120 | elapsed time per iteration (s): 0.48 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 3.903329E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.767 | TFLOPs: 34.78 | +0: [2023-03-15 22:15:39,143] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.00018758615112345138, 0.00018758615112345138, 0.00018758615112345138], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 2000/ 11269 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.47 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.884674E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.859 | TFLOPs: 35.44 | +0: steps: 2000 loss: 3.8585 iter time (s): 0.482 samples/sec: 531.170 +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 2000 | lm loss value: 3.851890E+00 | lm loss PPL: 4.708195E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 2000 to checkpoints_280m5b9400m +0: [2023-03-15 22:15:39,321] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is begin to save! +0: [2023-03-15 22:15:39,324] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:15:39,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:15:39,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_03-model_00-model_states.pt... 
+0: [2023-03-15 22:15:39,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:15:39,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:15:39,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:15:39,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:15:39,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:15:39,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:15:39,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:15:39,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:15:39,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:15:39,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:15:39,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:15:39,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 22:15:39,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:15:39,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:15:39,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:15:39,635] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:15:39,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:15:39,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:15:39,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:15:39,684] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:15:39,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:15:39,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:15:39,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:15:39,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 22:15:39,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:15:39,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:15:39,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:15:39,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:15:39,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:15:39,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:15:39,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:15:39,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:15:39,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:15:39,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:15:39,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:15:39,879] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/layer_22-model_00-model_states.pt... 
+0: [2023-03-15 22:15:39,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:15:39,881] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step2000/mp_rank_00_model_states.pt +0: [2023-03-15 22:15:39,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:15:39,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:15:39,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
+5: [2023-03-15 22:15:39,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
+6: [2023-03-15 22:15:39,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:39,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:39,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:39,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
+4: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
+5: [2023-03-15 22:15:39,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:15:39,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:15:39,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:39,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:39,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:39,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:39,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:15:39,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:15:39,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:15:39,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:15:39,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:15:39,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +4: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +4: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:15:39,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:15:39,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +3: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:15:39,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +3: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +5: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:15:39,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:15:39,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +2: [2023-03-15 22:15:39,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:15:39,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:15:39,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:39,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:15:39,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:39,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:39,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:15:39,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:39,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:39,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:15:39,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:39,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:40,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:40,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:15:40,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:40,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:40,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:40,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:40,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:40,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:15:40,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:40,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:40,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:40,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:40,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:40,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:15:40,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:40,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:40,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:40,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:40,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:40,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:15:40,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:40,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:40,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:15:40,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:40,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +1: [2023-03-15 22:15:40,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:15:40,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:15:40,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +6: [2023-03-15 22:15:40,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:15:40,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:15:40,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! +0: [2023-03-15 22:15:40,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:15:40,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
+0: successfully saved checkpoint at iteration 2000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 744.85 +7: iteration 2010/ 11269 | consumed samples: 514560 | consumed tokens: 1053818880 | elapsed time per iteration (s): 0.57 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.882499E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 450.051 | TFLOPs: 29.43 | +7: iteration 2020/ 11269 | consumed samples: 517120 | consumed tokens: 1059061760 | elapsed time per iteration (s): 0.48 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.884430E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.585 | TFLOPs: 35.22 | +7: iteration 2030/ 11269 | consumed samples: 519680 | consumed tokens: 1064304640 | elapsed time per iteration (s): 0.48 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 3.887638E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.488 | TFLOPs: 34.89 | +7: iteration 2040/ 11269 | consumed samples: 522240 | consumed tokens: 1069547520 | elapsed time per iteration (s): 0.48 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.870945E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.465 | TFLOPs: 35.02 | +7: iteration 2050/ 11269 | consumed samples: 524800 | consumed tokens: 1074790400 | elapsed time per iteration (s): 0.48 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.869390E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.743 | TFLOPs: 35.10 | +7: iteration 2060/ 11269 | consumed samples: 527360 | consumed tokens: 1080033280 | elapsed time per iteration (s): 0.48 | learning rate: 1.868E-04 | 
global batch size: 256 | lm loss: 3.876960E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.831 | TFLOPs: 35.04 | +7: iteration 2070/ 11269 | consumed samples: 529920 | consumed tokens: 1085276160 | elapsed time per iteration (s): 0.48 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 3.859916E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.341 | TFLOPs: 35.14 | +7: iteration 2080/ 11269 | consumed samples: 532480 | consumed tokens: 1090519040 | elapsed time per iteration (s): 0.48 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.851398E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 529.770 | TFLOPs: 34.65 | +7: iteration 2090/ 11269 | consumed samples: 535040 | consumed tokens: 1095761920 | elapsed time per iteration (s): 0.47 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.851305E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.674 | TFLOPs: 35.43 | +7: iteration 2100/ 11269 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.48 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 3.839741E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.308 | TFLOPs: 35.07 | +7: iteration 2110/ 11269 | consumed samples: 540160 | consumed tokens: 1106247680 | elapsed time per iteration (s): 0.48 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.844390E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.271 | TFLOPs: 35.20 | +7: iteration 2120/ 11269 | consumed samples: 542720 | consumed tokens: 
1111490560 | elapsed time per iteration (s): 0.48 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.840216E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.916 | TFLOPs: 35.18 | +7: iteration 2130/ 11269 | consumed samples: 545280 | consumed tokens: 1116733440 | elapsed time per iteration (s): 0.48 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.850616E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.176 | TFLOPs: 35.07 | +7: iteration 2140/ 11269 | consumed samples: 547840 | consumed tokens: 1121976320 | elapsed time per iteration (s): 0.47 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.836988E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.656 | TFLOPs: 35.42 | +7: iteration 2150/ 11269 | consumed samples: 550400 | consumed tokens: 1127219200 | elapsed time per iteration (s): 0.48 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.842624E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.060 | TFLOPs: 34.93 | +7: iteration 2160/ 11269 | consumed samples: 552960 | consumed tokens: 1132462080 | elapsed time per iteration (s): 0.47 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 3.845037E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.450 | TFLOPs: 35.28 | +7: iteration 2170/ 11269 | consumed samples: 555520 | consumed tokens: 1137704960 | elapsed time per iteration (s): 0.48 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.845623E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.978 | TFLOPs: 
35.05 | +7: iteration 2180/ 11269 | consumed samples: 558080 | consumed tokens: 1142947840 | elapsed time per iteration (s): 0.48 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.837968E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.960 | TFLOPs: 35.18 | +7: iteration 2190/ 11269 | consumed samples: 560640 | consumed tokens: 1148190720 | elapsed time per iteration (s): 0.47 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.824232E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.743 | TFLOPs: 35.43 | +7: iteration 2200/ 11269 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.48 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.828149E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.316 | TFLOPs: 34.88 | +7: iteration 2210/ 11269 | consumed samples: 565760 | consumed tokens: 1158676480 | elapsed time per iteration (s): 0.47 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 3.823164E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.684 | TFLOPs: 35.43 | +7: iteration 2220/ 11269 | consumed samples: 568320 | consumed tokens: 1163919360 | elapsed time per iteration (s): 0.48 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.819661E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.423 | TFLOPs: 34.56 | +7: iteration 2230/ 11269 | consumed samples: 570880 | consumed tokens: 1169162240 | elapsed time per iteration (s): 0.48 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.820479E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 538.924 | TFLOPs: 35.25 | +7: iteration 2240/ 11269 | consumed samples: 573440 | consumed tokens: 1174405120 | elapsed time per iteration (s): 0.47 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.823940E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.101 | TFLOPs: 35.26 | +7: iteration 2250/ 11269 | consumed samples: 576000 | consumed tokens: 1179648000 | elapsed time per iteration (s): 0.48 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.828367E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.266 | TFLOPs: 35.14 | +7: iteration 2260/ 11269 | consumed samples: 578560 | consumed tokens: 1184890880 | elapsed time per iteration (s): 0.47 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.815744E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.571 | TFLOPs: 35.29 | +7: iteration 2270/ 11269 | consumed samples: 581120 | consumed tokens: 1190133760 | elapsed time per iteration (s): 0.48 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.817776E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.299 | TFLOPs: 34.75 | +7: iteration 2280/ 11269 | consumed samples: 583680 | consumed tokens: 1195376640 | elapsed time per iteration (s): 0.47 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.802319E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.723 | TFLOPs: 35.43 | +7: iteration 2290/ 11269 | consumed samples: 586240 | consumed tokens: 1200619520 | elapsed time per iteration (s): 0.49 | learning rate: 1.836E-04 | global batch size: 256 | 
lm loss: 3.803706E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.957 | TFLOPs: 34.07 | +7: iteration 2300/ 11269 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.47 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.797497E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.773 | TFLOPs: 35.43 | +7: iteration 2310/ 11269 | consumed samples: 591360 | consumed tokens: 1211105280 | elapsed time per iteration (s): 0.48 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.811961E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.163 | TFLOPs: 35.06 | +7: iteration 2320/ 11269 | consumed samples: 593920 | consumed tokens: 1216348160 | elapsed time per iteration (s): 0.48 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.815844E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.503 | TFLOPs: 35.22 | +7: iteration 2330/ 11269 | consumed samples: 596480 | consumed tokens: 1221591040 | elapsed time per iteration (s): 0.48 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.798663E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.136 | TFLOPs: 35.19 | +7: iteration 2340/ 11269 | consumed samples: 599040 | consumed tokens: 1226833920 | elapsed time per iteration (s): 0.47 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.810221E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.398 | TFLOPs: 35.28 | +7: iteration 2350/ 11269 | consumed samples: 601600 | consumed tokens: 1232076800 | elapsed time 
per iteration (s): 0.48 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 3.803239E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.027 | TFLOPs: 35.19 | +7: iteration 2360/ 11269 | consumed samples: 604160 | consumed tokens: 1237319680 | elapsed time per iteration (s): 0.48 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.800451E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.479 | TFLOPs: 35.22 | +7: iteration 2370/ 11269 | consumed samples: 606720 | consumed tokens: 1242562560 | elapsed time per iteration (s): 0.48 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 3.791774E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.122 | TFLOPs: 35.13 | +7: iteration 2380/ 11269 | consumed samples: 609280 | consumed tokens: 1247805440 | elapsed time per iteration (s): 0.47 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.802037E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.485 | TFLOPs: 35.41 | +7: iteration 2390/ 11269 | consumed samples: 611840 | consumed tokens: 1253048320 | elapsed time per iteration (s): 0.47 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 3.794889E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.245 | TFLOPs: 35.27 | +7: iteration 2400/ 11269 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.47 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.777916E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.599 | TFLOPs: 35.42 | +7: iteration 2410/ 
11269 | consumed samples: 616960 | consumed tokens: 1263534080 | elapsed time per iteration (s): 0.47 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 3.786474E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.462 | TFLOPs: 35.41 | +7: iteration 2420/ 11269 | consumed samples: 619520 | consumed tokens: 1268776960 | elapsed time per iteration (s): 0.48 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.784957E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.828 | TFLOPs: 34.98 | +7: iteration 2430/ 11269 | consumed samples: 622080 | consumed tokens: 1274019840 | elapsed time per iteration (s): 0.47 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 3.775450E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.556 | TFLOPs: 35.42 | +7: iteration 2440/ 11269 | consumed samples: 624640 | consumed tokens: 1279262720 | elapsed time per iteration (s): 0.48 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.775534E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.391 | TFLOPs: 35.21 | +7: iteration 2450/ 11269 | consumed samples: 627200 | consumed tokens: 1284505600 | elapsed time per iteration (s): 0.47 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 3.786253E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.987 | TFLOPs: 35.25 | +7: iteration 2460/ 11269 | consumed samples: 629760 | consumed tokens: 1289748480 | elapsed time per iteration (s): 0.47 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.775782E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 539.429 | TFLOPs: 35.28 | +7: iteration 2470/ 11269 | consumed samples: 632320 | consumed tokens: 1294991360 | elapsed time per iteration (s): 0.47 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 3.776648E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.418 | TFLOPs: 35.28 | +7: iteration 2480/ 11269 | consumed samples: 634880 | consumed tokens: 1300234240 | elapsed time per iteration (s): 0.47 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.790529E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.532 | TFLOPs: 35.42 | +7: iteration 2490/ 11269 | consumed samples: 637440 | consumed tokens: 1305477120 | elapsed time per iteration (s): 0.47 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 3.769222E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.451 | TFLOPs: 35.41 | +7: iteration 2500/ 11269 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.47 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.775280E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.491 | TFLOPs: 35.41 | +7: iteration 2510/ 11269 | consumed samples: 642560 | consumed tokens: 1315962880 | elapsed time per iteration (s): 0.48 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 3.749633E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.124 | TFLOPs: 35.19 | +7: iteration 2520/ 11269 | consumed samples: 645120 | consumed tokens: 1321205760 | elapsed time per iteration (s): 0.47 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 3.765163E+00 | grad 
norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.805 | TFLOPs: 35.43 | +7: iteration 2530/ 11269 | consumed samples: 647680 | consumed tokens: 1326448640 | elapsed time per iteration (s): 0.47 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.776926E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.633 | TFLOPs: 35.29 | +7: iteration 2540/ 11269 | consumed samples: 650240 | consumed tokens: 1331691520 | elapsed time per iteration (s): 0.47 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 3.774195E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.261 | TFLOPs: 35.33 | +7: iteration 2550/ 11269 | consumed samples: 652800 | consumed tokens: 1336934400 | elapsed time per iteration (s): 0.48 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.748422E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.331 | TFLOPs: 35.14 | +7: iteration 2560/ 11269 | consumed samples: 655360 | consumed tokens: 1342177280 | elapsed time per iteration (s): 0.47 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 3.760729E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.308 | TFLOPs: 35.40 | +7: iteration 2570/ 11269 | consumed samples: 657920 | consumed tokens: 1347420160 | elapsed time per iteration (s): 0.48 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.756708E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.969 | TFLOPs: 34.99 | +7: iteration 2580/ 11269 | consumed samples: 660480 | consumed tokens: 1352663040 | elapsed time per iteration (s): 0.48 | 
learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.748076E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.272 | TFLOPs: 35.14 | +7: iteration 2590/ 11269 | consumed samples: 663040 | consumed tokens: 1357905920 | elapsed time per iteration (s): 0.48 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 3.750365E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.705 | TFLOPs: 35.17 | +7: iteration 2600/ 11269 | consumed samples: 665600 | consumed tokens: 1363148800 | elapsed time per iteration (s): 0.47 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.749382E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.449 | TFLOPs: 35.41 | +7: iteration 2610/ 11269 | consumed samples: 668160 | consumed tokens: 1368391680 | elapsed time per iteration (s): 0.48 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.756418E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.458 | TFLOPs: 35.08 | +7: iteration 2620/ 11269 | consumed samples: 670720 | consumed tokens: 1373634560 | elapsed time per iteration (s): 0.48 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.745824E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.754 | TFLOPs: 35.23 | +7: iteration 2630/ 11269 | consumed samples: 673280 | consumed tokens: 1378877440 | elapsed time per iteration (s): 0.48 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.745185E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.268 | TFLOPs: 35.14 | +7: iteration 2640/ 11269 | consumed samples: 
675840 | consumed tokens: 1384120320 | elapsed time per iteration (s): 0.48 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 3.752087E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.026 | TFLOPs: 35.19 | +7: iteration 2650/ 11269 | consumed samples: 678400 | consumed tokens: 1389363200 | elapsed time per iteration (s): 0.48 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.737976E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.916 | TFLOPs: 35.11 | +7: iteration 2660/ 11269 | consumed samples: 680960 | consumed tokens: 1394606080 | elapsed time per iteration (s): 0.48 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.742223E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.095 | TFLOPs: 35.06 | +7: iteration 2670/ 11269 | consumed samples: 683520 | consumed tokens: 1399848960 | elapsed time per iteration (s): 0.48 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 3.751134E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.983 | TFLOPs: 34.86 | +7: iteration 2680/ 11269 | consumed samples: 686080 | consumed tokens: 1405091840 | elapsed time per iteration (s): 0.48 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.736777E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.631 | TFLOPs: 35.03 | +7: iteration 2690/ 11269 | consumed samples: 688640 | consumed tokens: 1410334720 | elapsed time per iteration (s): 0.47 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.746079E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 540.593 | TFLOPs: 35.35 | +7: iteration 2700/ 11269 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 0.48 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.735019E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.296 | TFLOPs: 35.14 | +7: iteration 2710/ 11269 | consumed samples: 693760 | consumed tokens: 1420820480 | elapsed time per iteration (s): 0.47 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.735714E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.660 | TFLOPs: 35.42 | +7: iteration 2720/ 11269 | consumed samples: 696320 | consumed tokens: 1426063360 | elapsed time per iteration (s): 0.48 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.720799E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.175 | TFLOPs: 35.13 | +7: iteration 2730/ 11269 | consumed samples: 698880 | consumed tokens: 1431306240 | elapsed time per iteration (s): 0.48 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 3.731408E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.472 | TFLOPs: 35.22 | +7: iteration 2740/ 11269 | consumed samples: 701440 | consumed tokens: 1436549120 | elapsed time per iteration (s): 0.47 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.739030E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.521 | TFLOPs: 35.42 | +7: iteration 2750/ 11269 | consumed samples: 704000 | consumed tokens: 1441792000 | elapsed time per iteration (s): 0.48 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.731999E+00 | grad norm: 0.334 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.702 | TFLOPs: 35.03 | +7: iteration 2760/ 11269 | consumed samples: 706560 | consumed tokens: 1447034880 | elapsed time per iteration (s): 0.47 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 3.724368E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.640 | TFLOPs: 35.42 | +7: iteration 2770/ 11269 | consumed samples: 709120 | consumed tokens: 1452277760 | elapsed time per iteration (s): 0.48 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.731783E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.201 | TFLOPs: 35.07 | +7: iteration 2780/ 11269 | consumed samples: 711680 | consumed tokens: 1457520640 | elapsed time per iteration (s): 0.48 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.711679E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.784 | TFLOPs: 35.11 | +7: iteration 2790/ 11269 | consumed samples: 714240 | consumed tokens: 1462763520 | elapsed time per iteration (s): 0.47 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.721898E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.990 | TFLOPs: 35.45 | +7: iteration 2800/ 11269 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.48 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 3.718946E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.908 | TFLOPs: 35.11 | +7: iteration 2810/ 11269 | consumed samples: 719360 | consumed tokens: 1473249280 | elapsed time per iteration (s): 0.47 | learning rate: 1.753E-04 | global 
batch size: 256 | lm loss: 3.719162E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.913 | TFLOPs: 35.44 | +7: iteration 2820/ 11269 | consumed samples: 721920 | consumed tokens: 1478492160 | elapsed time per iteration (s): 0.48 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.706026E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.900 | TFLOPs: 35.11 | +7: iteration 2830/ 11269 | consumed samples: 724480 | consumed tokens: 1483735040 | elapsed time per iteration (s): 0.47 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 3.696422E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.468 | TFLOPs: 35.41 | +7: iteration 2840/ 11269 | consumed samples: 727040 | consumed tokens: 1488977920 | elapsed time per iteration (s): 0.47 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 3.692456E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.642 | TFLOPs: 35.36 | +7: iteration 2850/ 11269 | consumed samples: 729600 | consumed tokens: 1494220800 | elapsed time per iteration (s): 0.47 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.685821E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.673 | TFLOPs: 35.43 | +7: iteration 2860/ 11269 | consumed samples: 732160 | consumed tokens: 1499463680 | elapsed time per iteration (s): 0.47 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.705552E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.618 | TFLOPs: 35.29 | +7: iteration 2870/ 11269 | consumed samples: 734720 | consumed tokens: 
1504706560 | elapsed time per iteration (s): 0.48 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 3.698108E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.991 | TFLOPs: 34.86 | +7: iteration 2880/ 11269 | consumed samples: 737280 | consumed tokens: 1509949440 | elapsed time per iteration (s): 0.49 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 3.706473E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 519.494 | TFLOPs: 33.97 | +7: iteration 2890/ 11269 | consumed samples: 739840 | consumed tokens: 1515192320 | elapsed time per iteration (s): 0.47 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.711499E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.080 | TFLOPs: 35.45 | +7: iteration 2900/ 11269 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.48 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 3.710240E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.097 | TFLOPs: 35.19 | +7: iteration 2910/ 11269 | consumed samples: 744960 | consumed tokens: 1525678080 | elapsed time per iteration (s): 0.47 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 3.713371E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.133 | TFLOPs: 35.46 | +7: iteration 2920/ 11269 | consumed samples: 747520 | consumed tokens: 1530920960 | elapsed time per iteration (s): 0.48 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 3.702215E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.932 | TFLOPs: 
35.25 | +7: iteration 2930/ 11269 | consumed samples: 750080 | consumed tokens: 1536163840 | elapsed time per iteration (s): 0.47 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 3.699169E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.240 | TFLOPs: 35.27 | +7: iteration 2940/ 11269 | consumed samples: 752640 | consumed tokens: 1541406720 | elapsed time per iteration (s): 0.47 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.703460E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.050 | TFLOPs: 35.45 | +7: iteration 2950/ 11269 | consumed samples: 755200 | consumed tokens: 1546649600 | elapsed time per iteration (s): 0.48 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 3.700461E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.859 | TFLOPs: 35.04 | +7: iteration 2960/ 11269 | consumed samples: 757760 | consumed tokens: 1551892480 | elapsed time per iteration (s): 0.47 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 3.681445E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.029 | TFLOPs: 35.38 | +7: iteration 2970/ 11269 | consumed samples: 760320 | consumed tokens: 1557135360 | elapsed time per iteration (s): 0.47 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 3.703711E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.918 | TFLOPs: 35.44 | +7: iteration 2980/ 11269 | consumed samples: 762880 | consumed tokens: 1562378240 | elapsed time per iteration (s): 0.48 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.691098E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 535.650 | TFLOPs: 35.03 | +7: iteration 2990/ 11269 | consumed samples: 765440 | consumed tokens: 1567621120 | elapsed time per iteration (s): 0.47 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 3.701260E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.292 | TFLOPs: 35.40 | +7: iteration 3000/ 11269 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.48 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 3.698249E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.464 | TFLOPs: 34.82 | +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 3000 | lm loss value: 3.670590E+00 | lm loss PPL: 3.927506E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 3000 to checkpoints_280m5b9400m +0: [2023-03-15 22:23:36,091] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is begin to save! +0: [2023-03-15 22:23:36,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:23:36,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:23:36,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_03-model_00-model_states.pt... +0: [2023-03-15 22:23:36,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_03-model_00-model_states.pt. 
+0: [2023-03-15 22:23:36,232] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:23:36,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:23:36,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:23:36,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:23:36,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:23:36,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:23:36,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:23:36,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:23:36,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:23:36,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:23:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_09-model_00-model_states.pt... +0: [2023-03-15 22:23:36,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_09-model_00-model_states.pt. 
+0: [2023-03-15 22:23:36,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:23:36,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:23:36,400] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:23:36,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:23:36,424] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:23:36,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:23:36,448] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:23:36,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:23:36,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:23:36,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:23:36,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_15-model_00-model_states.pt... +0: [2023-03-15 22:23:36,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_15-model_00-model_states.pt. 
+0: [2023-03-15 22:23:36,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:23:36,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:23:36,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:23:36,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:23:36,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:23:36,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:23:36,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:23:36,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:23:36,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:23:36,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:23:36,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/layer_22-model_00-model_states.pt... +0: [2023-03-15 22:23:36,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/layer_22-model_00-model_states.pt. 
+0: [2023-03-15 22:23:36,641] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step3000/mp_rank_00_model_states.pt +0: [2023-03-15 22:23:36,641] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:23:36,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:23:36,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
+0: [2023-03-15 22:23:36,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:23:36,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:23:36,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:23:36,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:23:36,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:23:36,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:23:36,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:23:36,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:23:36,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:23:36,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:23:36,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:23:36,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:23:36,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:23:36,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:23:36,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:23:36,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:23:36,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +2: [2023-03-15 22:23:36,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:23:36,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:23:36,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:23:36,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:23:36,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:23:36,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
+3: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:23:36,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:23:36,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:23:36,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:23:36,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:23:36,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +4: [2023-03-15 22:23:36,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:23:36,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:23:36,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:23:36,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:23:36,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +6: [2023-03-15 22:23:36,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:23:36,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:23:36,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: [2023-03-15 22:23:36,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:23:36,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:23:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:23:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:23:36,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:23:36,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +1: [2023-03-15 22:23:36,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:23:36,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:23:36,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:23:36,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +7: [2023-03-15 22:23:36,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:23:36,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:23:36,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +5: [2023-03-15 22:23:36,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:23:36,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:23:36,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! +0: successfully saved checkpoint at iteration 3000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 732.02 +7: iteration 3010/ 11269 | consumed samples: 770560 | consumed tokens: 1578106880 | elapsed time per iteration (s): 0.56 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 3.683694E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 457.997 | TFLOPs: 29.95 | +7: iteration 3020/ 11269 | consumed samples: 773120 | consumed tokens: 1583349760 | elapsed time per iteration (s): 0.48 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 3.698830E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.913 | TFLOPs: 34.98 | +7: iteration 3030/ 11269 | consumed samples: 775680 | consumed tokens: 1588592640 | elapsed time per iteration (s): 0.48 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.693024E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.764 | TFLOPs: 35.23 | +7: iteration 3040/ 11269 | consumed samples: 778240 | consumed tokens: 1593835520 | elapsed time per iteration (s): 0.47 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.683218E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.203 | TFLOPs: 35.46 | +7: iteration 3050/ 11269 | consumed samples: 780800 | consumed tokens: 1599078400 | elapsed time per iteration (s): 0.47 | learning rate: 1.709E-04 | global batch size: 256 | lm 
loss: 3.676220E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.036 | TFLOPs: 35.45 | +7: iteration 3060/ 11269 | consumed samples: 783360 | consumed tokens: 1604321280 | elapsed time per iteration (s): 0.47 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.687749E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.487 | TFLOPs: 35.35 | +7: iteration 3070/ 11269 | consumed samples: 785920 | consumed tokens: 1609564160 | elapsed time per iteration (s): 0.48 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 3.673960E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.376 | TFLOPs: 35.21 | +7: iteration 3080/ 11269 | consumed samples: 788480 | consumed tokens: 1614807040 | elapsed time per iteration (s): 0.48 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 3.676728E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.347 | TFLOPs: 35.14 | +7: iteration 3090/ 11269 | consumed samples: 791040 | consumed tokens: 1620049920 | elapsed time per iteration (s): 0.48 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.670569E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.855 | TFLOPs: 35.11 | +7: iteration 3100/ 11269 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.48 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.669370E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.138 | TFLOPs: 35.13 | +7: iteration 3110/ 11269 | consumed samples: 796160 | consumed tokens: 1630535680 | elapsed time per 
iteration (s): 0.48 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.682632E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.382 | TFLOPs: 35.01 | +7: iteration 3120/ 11269 | consumed samples: 798720 | consumed tokens: 1635778560 | elapsed time per iteration (s): 0.47 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.686160E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.758 | TFLOPs: 35.43 | +7: iteration 3130/ 11269 | consumed samples: 801280 | consumed tokens: 1641021440 | elapsed time per iteration (s): 0.48 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.668792E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.501 | TFLOPs: 35.09 | +7: iteration 3140/ 11269 | consumed samples: 803840 | consumed tokens: 1646264320 | elapsed time per iteration (s): 0.47 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.671898E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.615 | TFLOPs: 35.49 | +7: iteration 3150/ 11269 | consumed samples: 806400 | consumed tokens: 1651507200 | elapsed time per iteration (s): 0.47 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.662459E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.183 | TFLOPs: 35.26 | +7: iteration 3160/ 11269 | consumed samples: 808960 | consumed tokens: 1656750080 | elapsed time per iteration (s): 0.47 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 3.664626E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.501 | TFLOPs: 35.48 | +7: iteration 3170/ 11269 
| consumed samples: 811520 | consumed tokens: 1661992960 | elapsed time per iteration (s): 0.47 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.653599E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.668 | TFLOPs: 35.29 | +7: iteration 3180/ 11269 | consumed samples: 814080 | consumed tokens: 1667235840 | elapsed time per iteration (s): 0.47 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.661003E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.677 | TFLOPs: 35.49 | +7: iteration 3190/ 11269 | consumed samples: 816640 | consumed tokens: 1672478720 | elapsed time per iteration (s): 0.48 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.694192E+00 | grad norm: 4.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.714 | TFLOPs: 35.10 | +7: iteration 3200/ 11269 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.47 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.942126E+00 | grad norm: 3.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.183 | TFLOPs: 35.33 | +7: iteration 3210/ 11269 | consumed samples: 821760 | consumed tokens: 1682964480 | elapsed time per iteration (s): 0.48 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 3.889553E+00 | grad norm: 1.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.353 | TFLOPs: 34.75 | +7: iteration 3220/ 11269 | consumed samples: 824320 | consumed tokens: 1688207360 | elapsed time per iteration (s): 0.47 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.836391E+00 | grad norm: 0.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 541.287 | TFLOPs: 35.40 | +7: iteration 3230/ 11269 | consumed samples: 826880 | consumed tokens: 1693450240 | elapsed time per iteration (s): 0.48 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.749614E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.586 | TFLOPs: 35.22 | +7: iteration 3240/ 11269 | consumed samples: 829440 | consumed tokens: 1698693120 | elapsed time per iteration (s): 0.47 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.722608E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.081 | TFLOPs: 35.45 | +7: iteration 3250/ 11269 | consumed samples: 832000 | consumed tokens: 1703936000 | elapsed time per iteration (s): 0.47 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 3.688021E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.870 | TFLOPs: 35.44 | +7: iteration 3260/ 11269 | consumed samples: 834560 | consumed tokens: 1709178880 | elapsed time per iteration (s): 0.47 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 3.684510E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.186 | TFLOPs: 35.46 | +7: iteration 3270/ 11269 | consumed samples: 837120 | consumed tokens: 1714421760 | elapsed time per iteration (s): 0.48 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 3.676788E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.360 | TFLOPs: 35.01 | +7: iteration 3280/ 11269 | consumed samples: 839680 | consumed tokens: 1719664640 | elapsed time per iteration (s): 0.47 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 3.669516E+00 | grad norm: 0.312 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.864 | TFLOPs: 35.44 | +7: iteration 3290/ 11269 | consumed samples: 842240 | consumed tokens: 1724907520 | elapsed time per iteration (s): 0.47 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 3.668855E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.729 | TFLOPs: 35.43 | +7: iteration 3300/ 11269 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.48 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 3.668782E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.951 | TFLOPs: 34.79 | +7: iteration 3310/ 11269 | consumed samples: 847360 | consumed tokens: 1735393280 | elapsed time per iteration (s): 0.47 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 3.662261E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.766 | TFLOPs: 35.30 | +7: iteration 3320/ 11269 | consumed samples: 849920 | consumed tokens: 1740636160 | elapsed time per iteration (s): 0.47 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 3.658845E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.253 | TFLOPs: 35.46 | +7: iteration 3330/ 11269 | consumed samples: 852480 | consumed tokens: 1745879040 | elapsed time per iteration (s): 0.47 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 3.656601E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.050 | TFLOPs: 35.45 | +7: iteration 3340/ 11269 | consumed samples: 855040 | consumed tokens: 1751121920 | elapsed time per iteration (s): 0.47 | learning rate: 
1.653E-04 | global batch size: 256 | lm loss: 3.655730E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.196 | TFLOPs: 35.46 | +7: iteration 3350/ 11269 | consumed samples: 857600 | consumed tokens: 1756364800 | elapsed time per iteration (s): 0.47 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 3.646388E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.725 | TFLOPs: 35.30 | +7: iteration 3360/ 11269 | consumed samples: 860160 | consumed tokens: 1761607680 | elapsed time per iteration (s): 0.47 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 3.641381E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.328 | TFLOPs: 35.47 | +7: iteration 3370/ 11269 | consumed samples: 862720 | consumed tokens: 1766850560 | elapsed time per iteration (s): 0.47 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 3.639558E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.160 | TFLOPs: 35.46 | +7: iteration 3380/ 11269 | consumed samples: 865280 | consumed tokens: 1772093440 | elapsed time per iteration (s): 0.47 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 3.639384E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.007 | TFLOPs: 35.45 | +7: iteration 3390/ 11269 | consumed samples: 867840 | consumed tokens: 1777336320 | elapsed time per iteration (s): 0.47 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 3.643041E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.071 | TFLOPs: 35.45 | +7: iteration 3400/ 11269 | consumed samples: 870400 | 
consumed tokens: 1782579200 | elapsed time per iteration (s): 0.48 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 3.646726E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.854 | TFLOPs: 34.98 | +7: iteration 3410/ 11269 | consumed samples: 872960 | consumed tokens: 1787822080 | elapsed time per iteration (s): 0.47 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 3.629934E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.598 | TFLOPs: 35.29 | +7: iteration 3420/ 11269 | consumed samples: 875520 | consumed tokens: 1793064960 | elapsed time per iteration (s): 0.47 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 3.628138E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.195 | TFLOPs: 35.46 | +7: iteration 3430/ 11269 | consumed samples: 878080 | consumed tokens: 1798307840 | elapsed time per iteration (s): 0.47 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 3.640280E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.155 | TFLOPs: 35.46 | +7: iteration 3440/ 11269 | consumed samples: 880640 | consumed tokens: 1803550720 | elapsed time per iteration (s): 0.47 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 3.625219E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.171 | TFLOPs: 35.46 | +7: iteration 3450/ 11269 | consumed samples: 883200 | consumed tokens: 1808793600 | elapsed time per iteration (s): 0.47 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 3.636771E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
542.099 | TFLOPs: 35.45 | +7: iteration 3460/ 11269 | consumed samples: 885760 | consumed tokens: 1814036480 | elapsed time per iteration (s): 0.48 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 3.630505E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.837 | TFLOPs: 35.11 | +7: iteration 3470/ 11269 | consumed samples: 888320 | consumed tokens: 1819279360 | elapsed time per iteration (s): 0.47 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 3.635820E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.008 | TFLOPs: 35.45 | +7: iteration 3480/ 11269 | consumed samples: 890880 | consumed tokens: 1824522240 | elapsed time per iteration (s): 0.47 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 3.614988E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.597 | TFLOPs: 35.42 | +7: iteration 3490/ 11269 | consumed samples: 893440 | consumed tokens: 1829765120 | elapsed time per iteration (s): 0.47 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 3.632345E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.743 | TFLOPs: 35.43 | +7: iteration 3500/ 11269 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.47 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 3.626513E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.653 | TFLOPs: 35.42 | +7: iteration 3510/ 11269 | consumed samples: 898560 | consumed tokens: 1840250880 | elapsed time per iteration (s): 0.47 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 3.622261E+00 | grad norm: 0.333 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.650 | TFLOPs: 35.42 | +7: iteration 3520/ 11269 | consumed samples: 901120 | consumed tokens: 1845493760 | elapsed time per iteration (s): 0.48 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 3.619210E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.384 | TFLOPs: 35.08 | +7: iteration 3530/ 11269 | consumed samples: 903680 | consumed tokens: 1850736640 | elapsed time per iteration (s): 0.48 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.615298E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.359 | TFLOPs: 35.14 | +7: iteration 3540/ 11269 | consumed samples: 906240 | consumed tokens: 1855979520 | elapsed time per iteration (s): 0.47 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.622126E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.157 | TFLOPs: 35.46 | +7: iteration 3550/ 11269 | consumed samples: 908800 | consumed tokens: 1861222400 | elapsed time per iteration (s): 0.47 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.620321E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.764 | TFLOPs: 35.43 | +7: iteration 3560/ 11269 | consumed samples: 911360 | consumed tokens: 1866465280 | elapsed time per iteration (s): 0.47 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.624717E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.903 | TFLOPs: 35.44 | +7: iteration 3570/ 11269 | consumed samples: 913920 | consumed tokens: 1871708160 | elapsed time per iteration (s): 0.47 | learning rate: 1.606E-04 | global batch 
size: 256 | lm loss: 3.610326E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.639 | TFLOPs: 35.42 | +7: iteration 3580/ 11269 | consumed samples: 916480 | consumed tokens: 1876951040 | elapsed time per iteration (s): 0.47 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.621282E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.695 | TFLOPs: 35.43 | +7: iteration 3590/ 11269 | consumed samples: 919040 | consumed tokens: 1882193920 | elapsed time per iteration (s): 0.47 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.622921E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.173 | TFLOPs: 35.39 | +7: iteration 3600/ 11269 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.47 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.613615E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.905 | TFLOPs: 35.38 | +7: iteration 3610/ 11269 | consumed samples: 924160 | consumed tokens: 1892679680 | elapsed time per iteration (s): 0.47 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.616655E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.945 | TFLOPs: 35.31 | +7: iteration 3620/ 11269 | consumed samples: 926720 | consumed tokens: 1897922560 | elapsed time per iteration (s): 0.47 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.602882E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.331 | TFLOPs: 35.40 | +7: iteration 3630/ 11269 | consumed samples: 929280 | consumed tokens: 1903165440 | 
elapsed time per iteration (s): 0.47 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 3.615614E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.247 | TFLOPs: 35.40 | +7: iteration 3640/ 11269 | consumed samples: 931840 | consumed tokens: 1908408320 | elapsed time per iteration (s): 0.47 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 3.592681E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.769 | TFLOPs: 35.30 | +7: iteration 3650/ 11269 | consumed samples: 934400 | consumed tokens: 1913651200 | elapsed time per iteration (s): 0.48 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 3.599482E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.956 | TFLOPs: 34.79 | +7: iteration 3660/ 11269 | consumed samples: 936960 | consumed tokens: 1918894080 | elapsed time per iteration (s): 0.48 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 3.617565E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.636 | TFLOPs: 35.23 | +7: iteration 3670/ 11269 | consumed samples: 939520 | consumed tokens: 1924136960 | elapsed time per iteration (s): 0.47 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.602139E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.528 | TFLOPs: 35.42 | +7: iteration 3680/ 11269 | consumed samples: 942080 | consumed tokens: 1929379840 | elapsed time per iteration (s): 0.48 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.603326E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.456 | TFLOPs: 35.21 | +7: 
iteration 3690/ 11269 | consumed samples: 944640 | consumed tokens: 1934622720 | elapsed time per iteration (s): 0.47 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.620446E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.139 | TFLOPs: 35.39 | +7: iteration 3700/ 11269 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.47 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.619298E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.968 | TFLOPs: 35.38 | +7: iteration 3710/ 11269 | consumed samples: 949760 | consumed tokens: 1945108480 | elapsed time per iteration (s): 0.48 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 3.591425E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.207 | TFLOPs: 34.68 | +7: iteration 3720/ 11269 | consumed samples: 952320 | consumed tokens: 1950351360 | elapsed time per iteration (s): 0.47 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 3.612724E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.247 | TFLOPs: 35.40 | +7: iteration 3730/ 11269 | consumed samples: 954880 | consumed tokens: 1955594240 | elapsed time per iteration (s): 0.47 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 3.596725E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.081 | TFLOPs: 35.39 | +7: iteration 3740/ 11269 | consumed samples: 957440 | consumed tokens: 1960837120 | elapsed time per iteration (s): 0.47 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 3.590895E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 539.968 | TFLOPs: 35.31 | +7: iteration 3750/ 11269 | consumed samples: 960000 | consumed tokens: 1966080000 | elapsed time per iteration (s): 0.47 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 3.594995E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.177 | TFLOPs: 35.39 | +7: iteration 3760/ 11269 | consumed samples: 962560 | consumed tokens: 1971322880 | elapsed time per iteration (s): 0.47 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.606850E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.761 | TFLOPs: 35.37 | +7: iteration 3770/ 11269 | consumed samples: 965120 | consumed tokens: 1976565760 | elapsed time per iteration (s): 0.47 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 3.614896E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.208 | TFLOPs: 35.39 | +7: iteration 3780/ 11269 | consumed samples: 967680 | consumed tokens: 1981808640 | elapsed time per iteration (s): 0.47 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 3.595600E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.509 | TFLOPs: 35.35 | +7: iteration 3790/ 11269 | consumed samples: 970240 | consumed tokens: 1987051520 | elapsed time per iteration (s): 0.48 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 3.592293E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.576 | TFLOPs: 34.96 | +7: iteration 3800/ 11269 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.47 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 
3.598350E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.900 | TFLOPs: 35.37 | +7: iteration 3810/ 11269 | consumed samples: 975360 | consumed tokens: 1997537280 | elapsed time per iteration (s): 0.47 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 3.580560E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.036 | TFLOPs: 35.38 | +7: iteration 3820/ 11269 | consumed samples: 977920 | consumed tokens: 2002780160 | elapsed time per iteration (s): 0.48 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 3.586628E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.391 | TFLOPs: 35.21 | +7: iteration 3830/ 11269 | consumed samples: 980480 | consumed tokens: 2008023040 | elapsed time per iteration (s): 0.48 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 3.577149E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.458 | TFLOPs: 35.15 | +7: iteration 3840/ 11269 | consumed samples: 983040 | consumed tokens: 2013265920 | elapsed time per iteration (s): 0.47 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 3.594820E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.645 | TFLOPs: 35.36 | +7: iteration 3850/ 11269 | consumed samples: 985600 | consumed tokens: 2018508800 | elapsed time per iteration (s): 0.48 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 3.582261E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.590 | TFLOPs: 35.22 | +7: iteration 3860/ 11269 | consumed samples: 988160 | consumed tokens: 2023751680 | elapsed time per 
iteration (s): 0.47 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 3.604624E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.594 | TFLOPs: 35.35 | +7: iteration 3870/ 11269 | consumed samples: 990720 | consumed tokens: 2028994560 | elapsed time per iteration (s): 0.47 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 3.580990E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.531 | TFLOPs: 35.35 | +7: iteration 3880/ 11269 | consumed samples: 993280 | consumed tokens: 2034237440 | elapsed time per iteration (s): 0.48 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.589249E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.407 | TFLOPs: 35.08 | +7: iteration 3890/ 11269 | consumed samples: 995840 | consumed tokens: 2039480320 | elapsed time per iteration (s): 0.47 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 3.584024E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.253 | TFLOPs: 35.33 | +7: iteration 3900/ 11269 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.47 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 3.582876E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.266 | TFLOPs: 35.33 | +7: iteration 3910/ 11269 | consumed samples: 1000960 | consumed tokens: 2049966080 | elapsed time per iteration (s): 0.47 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 3.589555E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.529 | TFLOPs: 35.35 | +7: iteration 3920/ 
11269 | consumed samples: 1003520 | consumed tokens: 2055208960 | elapsed time per iteration (s): 0.47 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 3.579715E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.388 | TFLOPs: 35.34 | +7: iteration 3930/ 11269 | consumed samples: 1006080 | consumed tokens: 2060451840 | elapsed time per iteration (s): 0.47 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 3.569087E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.454 | TFLOPs: 35.35 | +7: iteration 3940/ 11269 | consumed samples: 1008640 | consumed tokens: 2065694720 | elapsed time per iteration (s): 0.47 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.585363E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.154 | TFLOPs: 35.33 | +7: iteration 3950/ 11269 | consumed samples: 1011200 | consumed tokens: 2070937600 | elapsed time per iteration (s): 0.47 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 3.584088E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.157 | TFLOPs: 35.33 | +7: iteration 3960/ 11269 | consumed samples: 1013760 | consumed tokens: 2076180480 | elapsed time per iteration (s): 0.47 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 3.590179E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.056 | TFLOPs: 35.32 | +7: iteration 3970/ 11269 | consumed samples: 1016320 | consumed tokens: 2081423360 | elapsed time per iteration (s): 0.47 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.567812E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 540.231 | TFLOPs: 35.33 |
+7: iteration 3980/ 11269 | consumed samples: 1018880 | consumed tokens: 2086666240 | elapsed time per iteration (s): 0.47 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 3.571168E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.915 | TFLOPs: 35.31 |
+7: iteration 3990/ 11269 | consumed samples: 1021440 | consumed tokens: 2091909120 | elapsed time per iteration (s): 0.47 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 3.585706E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.073 | TFLOPs: 35.32 |
+0: [2023-03-15 22:31:31,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00015125198701273666, 0.00015125198701273666, 0.00015125198701273666], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
+7: iteration 4000/ 11269 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.47 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 3.587613E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.811 | TFLOPs: 35.30 |
+0: steps: 4000 loss: 3.5705 iter time (s): 0.473 samples/sec: 541.627
+7: -----------------------------------------------------------------------------------------------
+7: validation loss at iteration 4000 | lm loss value: 3.600785E+00 | lm loss PPL: 3.662696E+01 |
+7: -----------------------------------------------------------------------------------------------
+0: saving checkpoint at iteration 4000 to checkpoints_280m5b9400m
+0: [2023-03-15 22:31:31,185] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step4000 is begin to save!
+0: [2023-03-15 22:31:31,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:31:31,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:31:31,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_03-model_00-model_states.pt... +0: [2023-03-15 22:31:31,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:31:31,325] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:31:31,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:31:31,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:31:31,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:31:31,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:31:31,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:31:31,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:31:31,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_07-model_00-model_states.pt. 
+0: [2023-03-15 22:31:31,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:31:31,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:31:31,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_09-model_00-model_states.pt... +0: [2023-03-15 22:31:31,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:31:31,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:31:31,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:31:31,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:31:31,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:31:31,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:31:31,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:31:31,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:31:31,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_13-model_00-model_states.pt. 
+0: [2023-03-15 22:31:31,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:31:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:31:31,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_15-model_00-model_states.pt... +0: [2023-03-15 22:31:31,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:31:31,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:31:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:31:31,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:31:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:31:31,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:31:31,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:31:31,688] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:31:31,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_19-model_00-model_states.pt. 
+0: [2023-03-15 22:31:31,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:31:31,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:31:31,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/layer_22-model_00-model_states.pt... +0: [2023-03-15 22:31:31,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:31:31,738] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step4000/mp_rank_00_model_states.pt +0: [2023-03-15 22:31:31,738] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:31:31,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:31:31,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:31:31,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+0: [2023-03-15 22:31:31,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:31:31,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:31:31,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:31:31,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +4: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+6: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:31:31,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:31:31,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+4: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+5: [2023-03-15 22:31:31,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +4: [2023-03-15 22:31:31,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +4: [2023-03-15 22:31:31,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:31:31,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:31:31,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:31:31,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +7: [2023-03-15 22:31:31,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:31:31,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:31:31,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +4: [2023-03-15 22:31:31,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +4: [2023-03-15 22:31:31,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:31:31,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:31:31,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+3: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:31:31,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +3: [2023-03-15 22:31:31,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:31:31,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:31:31,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:31:31,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:31:31,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:31:31,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:31:31,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 22:31:31,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +2: [2023-03-15 22:31:31,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +5: [2023-03-15 22:31:31,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 22:31:31,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:31:31,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +6: [2023-03-15 22:31:31,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:31:31,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:31:31,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:31:31,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +0: [2023-03-15 22:31:31,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:31:31,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! +1: [2023-03-15 22:31:31,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:31:31,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:31:31,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
+0: successfully saved checkpoint at iteration 4000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 740.19 +7: iteration 4010/ 11269 | consumed samples: 1026560 | consumed tokens: 2102394880 | elapsed time per iteration (s): 0.56 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 3.569915E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 455.525 | TFLOPs: 29.79 | +7: iteration 4020/ 11269 | consumed samples: 1029120 | consumed tokens: 2107637760 | elapsed time per iteration (s): 0.47 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 3.572099E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.834 | TFLOPs: 35.44 | +7: iteration 4030/ 11269 | consumed samples: 1031680 | consumed tokens: 2112880640 | elapsed time per iteration (s): 0.47 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 3.578250E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.100 | TFLOPs: 35.39 | +7: iteration 4040/ 11269 | consumed samples: 1034240 | consumed tokens: 2118123520 | elapsed time per iteration (s): 0.48 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 3.570599E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.790 | TFLOPs: 35.04 | +7: iteration 4050/ 11269 | consumed samples: 1036800 | consumed tokens: 2123366400 | elapsed time per iteration (s): 0.47 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 3.563543E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.827 | TFLOPs: 35.37 | +7: iteration 4060/ 11269 | consumed samples: 1039360 | consumed tokens: 2128609280 | elapsed time per iteration (s): 0.47 | learning rate: 
1.499E-04 | global batch size: 256 | lm loss: 3.585852E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.273 | TFLOPs: 35.33 | +7: iteration 4070/ 11269 | consumed samples: 1041920 | consumed tokens: 2133852160 | elapsed time per iteration (s): 0.47 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 3.576198E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.390 | TFLOPs: 35.34 | +7: iteration 4080/ 11269 | consumed samples: 1044480 | consumed tokens: 2139095040 | elapsed time per iteration (s): 0.47 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 3.573927E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.248 | TFLOPs: 35.33 | +7: iteration 4090/ 11269 | consumed samples: 1047040 | consumed tokens: 2144337920 | elapsed time per iteration (s): 0.47 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 3.561845E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.957 | TFLOPs: 35.31 | +7: iteration 4100/ 11269 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.47 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 3.567532E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.822 | TFLOPs: 35.30 | +7: iteration 4110/ 11269 | consumed samples: 1052160 | consumed tokens: 2154823680 | elapsed time per iteration (s): 0.47 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 3.561104E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.639 | TFLOPs: 35.29 | +7: iteration 4120/ 11269 | consumed samples: 1054720 | 
consumed tokens: 2160066560 | elapsed time per iteration (s): 0.48 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 3.549779E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.892 | TFLOPs: 35.11 | +7: iteration 4130/ 11269 | consumed samples: 1057280 | consumed tokens: 2165309440 | elapsed time per iteration (s): 0.48 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 3.555961E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.164 | TFLOPs: 35.20 | +7: iteration 4140/ 11269 | consumed samples: 1059840 | consumed tokens: 2170552320 | elapsed time per iteration (s): 0.47 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 3.573557E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.848 | TFLOPs: 35.31 | +7: iteration 4150/ 11269 | consumed samples: 1062400 | consumed tokens: 2175795200 | elapsed time per iteration (s): 0.47 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 3.556122E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.884 | TFLOPs: 35.31 | +7: iteration 4160/ 11269 | consumed samples: 1064960 | consumed tokens: 2181038080 | elapsed time per iteration (s): 0.47 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 3.564204E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.586 | TFLOPs: 35.29 | +7: iteration 4170/ 11269 | consumed samples: 1067520 | consumed tokens: 2186280960 | elapsed time per iteration (s): 0.47 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 3.555713E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
539.770 | TFLOPs: 35.30 | +7: iteration 4180/ 11269 | consumed samples: 1070080 | consumed tokens: 2191523840 | elapsed time per iteration (s): 0.47 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 3.556392E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.909 | TFLOPs: 35.31 | +7: iteration 4190/ 11269 | consumed samples: 1072640 | consumed tokens: 2196766720 | elapsed time per iteration (s): 0.47 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 3.555935E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.645 | TFLOPs: 35.29 | +7: iteration 4200/ 11269 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.48 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 3.554442E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.132 | TFLOPs: 34.93 | +7: iteration 4210/ 11269 | consumed samples: 1077760 | consumed tokens: 2207252480 | elapsed time per iteration (s): 0.47 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 3.555630E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.272 | TFLOPs: 35.33 | +7: iteration 4220/ 11269 | consumed samples: 1080320 | consumed tokens: 2212495360 | elapsed time per iteration (s): 0.47 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 3.558863E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.761 | TFLOPs: 35.30 | +7: iteration 4230/ 11269 | consumed samples: 1082880 | consumed tokens: 2217738240 | elapsed time per iteration (s): 0.48 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 3.549416E+00 | grad norm: 0.316 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.611 | TFLOPs: 35.16 | +7: iteration 4240/ 11269 | consumed samples: 1085440 | consumed tokens: 2222981120 | elapsed time per iteration (s): 0.47 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 3.539490E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.903 | TFLOPs: 35.31 | +7: iteration 4250/ 11269 | consumed samples: 1088000 | consumed tokens: 2228224000 | elapsed time per iteration (s): 0.47 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 3.544685E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.646 | TFLOPs: 35.29 | +7: iteration 4260/ 11269 | consumed samples: 1090560 | consumed tokens: 2233466880 | elapsed time per iteration (s): 0.47 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 3.551604E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.415 | TFLOPs: 35.28 | +7: iteration 4270/ 11269 | consumed samples: 1093120 | consumed tokens: 2238709760 | elapsed time per iteration (s): 0.47 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 3.545426E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.640 | TFLOPs: 35.29 | +7: iteration 4280/ 11269 | consumed samples: 1095680 | consumed tokens: 2243952640 | elapsed time per iteration (s): 0.47 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 3.535106E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.743 | TFLOPs: 35.30 | +7: iteration 4290/ 11269 | consumed samples: 1098240 | consumed tokens: 2249195520 | elapsed time per iteration (s): 0.47 | learning rate: 1.446E-04 | 
global batch size: 256 | lm loss: 3.549684E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.670 | TFLOPs: 35.29 | +7: iteration 4300/ 11269 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.47 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 3.540063E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.047 | TFLOPs: 35.25 | +7: iteration 4310/ 11269 | consumed samples: 1103360 | consumed tokens: 2259681280 | elapsed time per iteration (s): 0.47 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 3.545801E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.841 | TFLOPs: 35.31 | +7: iteration 4320/ 11269 | consumed samples: 1105920 | consumed tokens: 2264924160 | elapsed time per iteration (s): 0.47 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 3.543862E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.439 | TFLOPs: 35.28 | +7: iteration 4330/ 11269 | consumed samples: 1108480 | consumed tokens: 2270167040 | elapsed time per iteration (s): 0.47 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 3.544177E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.610 | TFLOPs: 35.29 | +7: iteration 4340/ 11269 | consumed samples: 1111040 | consumed tokens: 2275409920 | elapsed time per iteration (s): 0.47 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 3.553367E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.605 | TFLOPs: 35.29 | +7: iteration 4350/ 11269 | consumed samples: 1113600 | consumed 
tokens: 2280652800 | elapsed time per iteration (s): 0.47 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 3.532352E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.604 | TFLOPs: 35.29 | +7: iteration 4360/ 11269 | consumed samples: 1116160 | consumed tokens: 2285895680 | elapsed time per iteration (s): 0.47 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 3.546780E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.665 | TFLOPs: 35.29 | +7: iteration 4370/ 11269 | consumed samples: 1118720 | consumed tokens: 2291138560 | elapsed time per iteration (s): 0.47 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 3.552824E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.458 | TFLOPs: 35.28 | +7: iteration 4380/ 11269 | consumed samples: 1121280 | consumed tokens: 2296381440 | elapsed time per iteration (s): 0.48 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 3.532155E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.664 | TFLOPs: 35.10 | +7: iteration 4390/ 11269 | consumed samples: 1123840 | consumed tokens: 2301624320 | elapsed time per iteration (s): 0.47 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 3.544703E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.462 | TFLOPs: 35.28 | +7: iteration 4400/ 11269 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.47 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 3.547324E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.447 
| TFLOPs: 35.28 | +7: iteration 4410/ 11269 | consumed samples: 1128960 | consumed tokens: 2312110080 | elapsed time per iteration (s): 0.47 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 3.548166E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.758 | TFLOPs: 35.30 | +7: iteration 4420/ 11269 | consumed samples: 1131520 | consumed tokens: 2317352960 | elapsed time per iteration (s): 0.47 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 3.565414E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.364 | TFLOPs: 35.27 | +7: iteration 4430/ 11269 | consumed samples: 1134080 | consumed tokens: 2322595840 | elapsed time per iteration (s): 0.48 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 3.541821E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.219 | TFLOPs: 35.07 | +7: iteration 4440/ 11269 | consumed samples: 1136640 | consumed tokens: 2327838720 | elapsed time per iteration (s): 0.48 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 3.532368E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.468 | TFLOPs: 35.02 | +7: iteration 4450/ 11269 | consumed samples: 1139200 | consumed tokens: 2333081600 | elapsed time per iteration (s): 0.47 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 3.527050E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.707 | TFLOPs: 35.30 | +7: iteration 4460/ 11269 | consumed samples: 1141760 | consumed tokens: 2338324480 | elapsed time per iteration (s): 0.47 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 3.536263E+00 | grad norm: 0.324 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.807 | TFLOPs: 35.30 | +7: iteration 4470/ 11269 | consumed samples: 1144320 | consumed tokens: 2343567360 | elapsed time per iteration (s): 0.47 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 3.523137E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.730 | TFLOPs: 35.30 | +7: iteration 4480/ 11269 | consumed samples: 1146880 | consumed tokens: 2348810240 | elapsed time per iteration (s): 0.47 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 3.542071E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.785 | TFLOPs: 35.30 | +7: iteration 4490/ 11269 | consumed samples: 1149440 | consumed tokens: 2354053120 | elapsed time per iteration (s): 0.47 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 3.519309E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.673 | TFLOPs: 35.29 | +7: iteration 4500/ 11269 | consumed samples: 1152000 | consumed tokens: 2359296000 | elapsed time per iteration (s): 0.47 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 3.535428E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.822 | TFLOPs: 35.30 | +7: iteration 4510/ 11269 | consumed samples: 1154560 | consumed tokens: 2364538880 | elapsed time per iteration (s): 0.47 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 3.532586E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.692 | TFLOPs: 35.30 | +7: iteration 4520/ 11269 | consumed samples: 1157120 | consumed tokens: 2369781760 | elapsed time per iteration (s): 0.47 | learning rate: 1.391E-04 | global batch 
size: 256 | lm loss: 3.536242E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.320 | TFLOPs: 35.27 | +7: iteration 4530/ 11269 | consumed samples: 1159680 | consumed tokens: 2375024640 | elapsed time per iteration (s): 0.47 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 3.544091E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.468 | TFLOPs: 35.28 | +7: iteration 4540/ 11269 | consumed samples: 1162240 | consumed tokens: 2380267520 | elapsed time per iteration (s): 0.47 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 3.528858E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.291 | TFLOPs: 35.27 | +7: iteration 4550/ 11269 | consumed samples: 1164800 | consumed tokens: 2385510400 | elapsed time per iteration (s): 0.48 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 3.529008E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.068 | TFLOPs: 34.99 | +7: iteration 4560/ 11269 | consumed samples: 1167360 | consumed tokens: 2390753280 | elapsed time per iteration (s): 0.47 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 3.531031E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.344 | TFLOPs: 35.27 | +7: iteration 4570/ 11269 | consumed samples: 1169920 | consumed tokens: 2395996160 | elapsed time per iteration (s): 0.47 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 3.543891E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.420 | TFLOPs: 35.28 | +7: iteration 4580/ 11269 | consumed samples: 1172480 | consumed tokens: 
2401239040 | elapsed time per iteration (s): 0.47 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 3.542534E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.240 | TFLOPs: 35.27 | +7: iteration 4590/ 11269 | consumed samples: 1175040 | consumed tokens: 2406481920 | elapsed time per iteration (s): 0.47 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 3.544521E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.590 | TFLOPs: 35.29 | +7: iteration 4600/ 11269 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.47 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 3.532931E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.364 | TFLOPs: 35.27 | +7: iteration 4610/ 11269 | consumed samples: 1180160 | consumed tokens: 2416967680 | elapsed time per iteration (s): 0.47 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 3.525865E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.679 | TFLOPs: 35.29 | +7: iteration 4620/ 11269 | consumed samples: 1182720 | consumed tokens: 2422210560 | elapsed time per iteration (s): 0.47 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 3.537323E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.268 | TFLOPs: 35.27 | +7: iteration 4630/ 11269 | consumed samples: 1185280 | consumed tokens: 2427453440 | elapsed time per iteration (s): 0.48 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 3.534953E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.017 | 
TFLOPs: 34.99 | +7: iteration 4640/ 11269 | consumed samples: 1187840 | consumed tokens: 2432696320 | elapsed time per iteration (s): 0.48 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 3.510226E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.650 | TFLOPs: 35.16 | +7: iteration 4650/ 11269 | consumed samples: 1190400 | consumed tokens: 2437939200 | elapsed time per iteration (s): 0.48 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 3.534689E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.242 | TFLOPs: 34.87 | +7: iteration 4660/ 11269 | consumed samples: 1192960 | consumed tokens: 2443182080 | elapsed time per iteration (s): 0.49 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 3.531736E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 521.985 | TFLOPs: 34.14 | +7: iteration 4670/ 11269 | consumed samples: 1195520 | consumed tokens: 2448424960 | elapsed time per iteration (s): 0.47 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 3.505529E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.773 | TFLOPs: 35.30 | +7: iteration 4680/ 11269 | consumed samples: 1198080 | consumed tokens: 2453667840 | elapsed time per iteration (s): 0.47 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 3.498239E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.323 | TFLOPs: 35.27 | +7: iteration 4690/ 11269 | consumed samples: 1200640 | consumed tokens: 2458910720 | elapsed time per iteration (s): 0.48 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 3.515582E+00 | grad norm: 0.332 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.806 | TFLOPs: 34.98 | +7: iteration 4700/ 11269 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.47 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 3.510083E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.585 | TFLOPs: 35.29 | +7: iteration 4710/ 11269 | consumed samples: 1205760 | consumed tokens: 2469396480 | elapsed time per iteration (s): 0.47 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 3.524300E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.592 | TFLOPs: 35.29 | +7: iteration 4720/ 11269 | consumed samples: 1208320 | consumed tokens: 2474639360 | elapsed time per iteration (s): 0.47 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 3.517653E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.095 | TFLOPs: 35.26 | +7: iteration 4730/ 11269 | consumed samples: 1210880 | consumed tokens: 2479882240 | elapsed time per iteration (s): 0.48 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 3.516909E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.892 | TFLOPs: 35.24 | +7: iteration 4740/ 11269 | consumed samples: 1213440 | consumed tokens: 2485125120 | elapsed time per iteration (s): 0.48 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 3.514516E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.894 | TFLOPs: 35.24 | +7: iteration 4750/ 11269 | consumed samples: 1216000 | consumed tokens: 2490368000 | elapsed time per iteration (s): 0.47 | learning rate: 1.336E-04 | global batch 
size: 256 | lm loss: 3.507392E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.210 | TFLOPs: 35.26 | +7: iteration 4760/ 11269 | consumed samples: 1218560 | consumed tokens: 2495610880 | elapsed time per iteration (s): 0.48 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 3.513135E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.702 | TFLOPs: 35.23 | +7: iteration 4770/ 11269 | consumed samples: 1221120 | consumed tokens: 2500853760 | elapsed time per iteration (s): 0.48 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 3.512903E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.730 | TFLOPs: 35.23 | +7: iteration 4780/ 11269 | consumed samples: 1223680 | consumed tokens: 2506096640 | elapsed time per iteration (s): 0.48 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 3.520626E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.797 | TFLOPs: 35.24 | +7: iteration 4790/ 11269 | consumed samples: 1226240 | consumed tokens: 2511339520 | elapsed time per iteration (s): 0.48 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 3.503631E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.875 | TFLOPs: 35.24 | +7: iteration 4800/ 11269 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.48 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 3.505391E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.823 | TFLOPs: 35.24 | +7: iteration 4810/ 11269 | consumed samples: 1231360 | consumed tokens: 
2521825280 | elapsed time per iteration (s): 0.48 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 3.507911E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.733 | TFLOPs: 35.23 | +7: iteration 4820/ 11269 | consumed samples: 1233920 | consumed tokens: 2527068160 | elapsed time per iteration (s): 0.48 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 3.507335E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.703 | TFLOPs: 35.23 | +7: iteration 4830/ 11269 | consumed samples: 1236480 | consumed tokens: 2532311040 | elapsed time per iteration (s): 0.47 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 3.505484E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.055 | TFLOPs: 35.25 | +7: iteration 4840/ 11269 | consumed samples: 1239040 | consumed tokens: 2537553920 | elapsed time per iteration (s): 0.48 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 3.497747E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.934 | TFLOPs: 35.25 | +7: iteration 4850/ 11269 | consumed samples: 1241600 | consumed tokens: 2542796800 | elapsed time per iteration (s): 0.48 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 3.506871E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.885 | TFLOPs: 35.05 | +7: iteration 4860/ 11269 | consumed samples: 1244160 | consumed tokens: 2548039680 | elapsed time per iteration (s): 0.48 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 3.502287E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.918 | 
TFLOPs: 35.25 | +7: iteration 4870/ 11269 | consumed samples: 1246720 | consumed tokens: 2553282560 | elapsed time per iteration (s): 0.48 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 3.509565E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.348 | TFLOPs: 35.21 | +7: iteration 4880/ 11269 | consumed samples: 1249280 | consumed tokens: 2558525440 | elapsed time per iteration (s): 0.48 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 3.516523E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.501 | TFLOPs: 35.22 | +7: iteration 4890/ 11269 | consumed samples: 1251840 | consumed tokens: 2563768320 | elapsed time per iteration (s): 0.48 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 3.501132E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.434 | TFLOPs: 35.21 | +7: iteration 4900/ 11269 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.48 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 3.505419E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.330 | TFLOPs: 35.21 | +7: iteration 4910/ 11269 | consumed samples: 1256960 | consumed tokens: 2574254080 | elapsed time per iteration (s): 0.48 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 3.496089E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.081 | TFLOPs: 35.19 | +7: iteration 4920/ 11269 | consumed samples: 1259520 | consumed tokens: 2579496960 | elapsed time per iteration (s): 0.48 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 3.493664E+00 | grad norm: 0.313 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.981 | TFLOPs: 35.18 | +7: iteration 4930/ 11269 | consumed samples: 1262080 | consumed tokens: 2584739840 | elapsed time per iteration (s): 0.48 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 3.491578E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.861 | TFLOPs: 35.18 | +7: iteration 4940/ 11269 | consumed samples: 1264640 | consumed tokens: 2589982720 | elapsed time per iteration (s): 0.48 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 3.491191E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.959 | TFLOPs: 35.18 | +7: iteration 4950/ 11269 | consumed samples: 1267200 | consumed tokens: 2595225600 | elapsed time per iteration (s): 0.48 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 3.509406E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.947 | TFLOPs: 35.18 | +7: iteration 4960/ 11269 | consumed samples: 1269760 | consumed tokens: 2600468480 | elapsed time per iteration (s): 0.48 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 3.493105E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.878 | TFLOPs: 35.18 | +7: iteration 4970/ 11269 | consumed samples: 1272320 | consumed tokens: 2605711360 | elapsed time per iteration (s): 0.48 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 3.491270E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.023 | TFLOPs: 35.19 | +7: iteration 4980/ 11269 | consumed samples: 1274880 | consumed tokens: 2610954240 | elapsed time per iteration (s): 0.48 | learning rate: 1.279E-04 | global batch 
size: 256 | lm loss: 3.498028E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.034 | TFLOPs: 35.19 | +7: iteration 4990/ 11269 | consumed samples: 1277440 | consumed tokens: 2616197120 | elapsed time per iteration (s): 0.48 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 3.509027E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.110 | TFLOPs: 35.06 | +7: iteration 5000/ 11269 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.48 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 3.503295E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.285 | TFLOPs: 35.20 | +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 5000 | lm loss value: 3.535530E+00 | lm loss PPL: 3.431318E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 5000 to checkpoints_280m5b9400m +0: [2023-03-15 22:39:27,305] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is begin to save! +0: [2023-03-15 22:39:27,309] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:39:27,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:39:27,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_03-model_00-model_states.pt... 
+0: [2023-03-15 22:39:27,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:39:27,448] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:39:27,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:39:27,473] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:39:27,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:39:27,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:39:27,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:39:27,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:39:27,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:39:27,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:39:27,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:39:27,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 22:39:27,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:39:27,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:39:27,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:39:27,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:39:27,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:39:27,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:39:27,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:39:27,666] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:39:27,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:39:27,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:39:27,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:39:27,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 22:39:27,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:39:27,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:39:27,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:39:27,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:39:27,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:39:27,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:39:27,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:39:27,811] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:39:27,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:39:27,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:39:27,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:39:27,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/layer_22-model_00-model_states.pt... 
+0: [2023-03-15 22:39:27,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:39:27,861] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step5000/mp_rank_00_model_states.pt +0: [2023-03-15 22:39:27,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:39:27,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:39:27,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:39:27,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:39:27,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:27,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:27,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:27,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:27,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:27,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:39:27,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:27,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:27,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:39:27,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:39:27,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:39:27,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:39:27,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:39:27,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:39:27,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +5: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:39:27,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:39:27,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:39:27,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:39:27,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:39:27,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +7: [2023-03-15 22:39:27,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:39:27,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:39:27,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
+4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +4: [2023-03-15 22:39:27,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +6: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:39:27,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
+0: [2023-03-15 22:39:27,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:27,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:27,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:27,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:27,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:27,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:27,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:27,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:39:27,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:39:27,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +2: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:39:27,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:39:27,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:27,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:27,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:27,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:27,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:39:27,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:27,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:28,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:28,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:28,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:28,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:28,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:28,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:28,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:28,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:28,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +3: [2023-03-15 22:39:28,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:39:28,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:39:28,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +1: [2023-03-15 22:39:28,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:39:28,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:39:28,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:39:28,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:28,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:28,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: [2023-03-15 22:39:28,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! +0: successfully saved checkpoint at iteration 5000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 733.99 +7: iteration 5010/ 11269 | consumed samples: 1282560 | consumed tokens: 2626682880 | elapsed time per iteration (s): 0.56 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 3.493475E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.314 | TFLOPs: 29.65 | +7: iteration 5020/ 11269 | consumed samples: 1285120 | consumed tokens: 2631925760 | elapsed time per iteration (s): 0.47 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 3.495748E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.727 | TFLOPs: 35.30 | +7: iteration 5030/ 11269 | consumed samples: 1287680 | consumed tokens: 2637168640 | elapsed time per iteration (s): 0.47 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 3.492948E+00 | grad norm: 0.346 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.205 | TFLOPs: 35.26 | +7: iteration 5040/ 11269 | consumed samples: 1290240 | consumed tokens: 2642411520 | elapsed time per iteration (s): 0.48 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 3.490750E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.627 | TFLOPs: 35.23 | +7: iteration 5050/ 11269 | consumed samples: 1292800 | consumed tokens: 2647654400 | elapsed time per iteration (s): 0.48 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 3.484851E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.974 | TFLOPs: 35.18 | +7: iteration 5060/ 11269 | consumed samples: 1295360 | consumed tokens: 2652897280 | elapsed time per iteration (s): 0.48 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 3.489478E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.276 | TFLOPs: 35.20 | +7: iteration 5070/ 11269 | consumed samples: 1297920 | consumed tokens: 2658140160 | elapsed time per iteration (s): 0.48 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 3.500307E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.013 | TFLOPs: 35.19 | +7: iteration 5080/ 11269 | consumed samples: 1300480 | consumed tokens: 2663383040 | elapsed time per iteration (s): 0.48 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 3.504337E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.714 | TFLOPs: 35.17 | +7: iteration 5090/ 11269 | consumed samples: 1303040 | consumed tokens: 2668625920 | elapsed time per iteration (s): 0.48 | learning rate: 1.252E-04 | 
global batch size: 256 | lm loss: 3.505586E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.914 | TFLOPs: 35.18 | +7: iteration 5100/ 11269 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.48 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 3.500150E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.899 | TFLOPs: 35.18 | +7: iteration 5110/ 11269 | consumed samples: 1308160 | consumed tokens: 2679111680 | elapsed time per iteration (s): 0.48 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 3.475576E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.122 | TFLOPs: 35.19 | +7: iteration 5120/ 11269 | consumed samples: 1310720 | consumed tokens: 2684354560 | elapsed time per iteration (s): 0.48 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 3.491161E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.145 | TFLOPs: 35.19 | +7: iteration 5130/ 11269 | consumed samples: 1313280 | consumed tokens: 2689597440 | elapsed time per iteration (s): 0.48 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 3.484547E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.631 | TFLOPs: 35.16 | +7: iteration 5140/ 11269 | consumed samples: 1315840 | consumed tokens: 2694840320 | elapsed time per iteration (s): 0.48 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 3.499185E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.968 | TFLOPs: 35.18 | +7: iteration 5150/ 11269 | consumed samples: 1318400 | consumed 
tokens: 2700083200 | elapsed time per iteration (s): 0.48 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 3.491121E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.950 | TFLOPs: 35.18 | +7: iteration 5160/ 11269 | consumed samples: 1320960 | consumed tokens: 2705326080 | elapsed time per iteration (s): 0.48 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 3.479597E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.001 | TFLOPs: 35.19 | +7: iteration 5170/ 11269 | consumed samples: 1323520 | consumed tokens: 2710568960 | elapsed time per iteration (s): 0.48 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 3.468608E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.012 | TFLOPs: 35.19 | +7: iteration 5180/ 11269 | consumed samples: 1326080 | consumed tokens: 2715811840 | elapsed time per iteration (s): 0.48 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 3.480220E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.625 | TFLOPs: 35.16 | +7: iteration 5190/ 11269 | consumed samples: 1328640 | consumed tokens: 2721054720 | elapsed time per iteration (s): 0.48 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 3.484446E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.368 | TFLOPs: 34.95 | +7: iteration 5200/ 11269 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.48 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 3.499223E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.153 
| TFLOPs: 35.20 | +7: iteration 5210/ 11269 | consumed samples: 1333760 | consumed tokens: 2731540480 | elapsed time per iteration (s): 0.48 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 3.490013E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.960 | TFLOPs: 35.18 | +7: iteration 5220/ 11269 | consumed samples: 1336320 | consumed tokens: 2736783360 | elapsed time per iteration (s): 0.48 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 3.470759E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.037 | TFLOPs: 34.93 | +7: iteration 5230/ 11269 | consumed samples: 1338880 | consumed tokens: 2742026240 | elapsed time per iteration (s): 0.48 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 3.484445E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.226 | TFLOPs: 35.20 | +7: iteration 5240/ 11269 | consumed samples: 1341440 | consumed tokens: 2747269120 | elapsed time per iteration (s): 0.48 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 3.484791E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.982 | TFLOPs: 35.18 | +7: iteration 5250/ 11269 | consumed samples: 1344000 | consumed tokens: 2752512000 | elapsed time per iteration (s): 0.48 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 3.488728E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.982 | TFLOPs: 35.18 | +7: iteration 5260/ 11269 | consumed samples: 1346560 | consumed tokens: 2757754880 | elapsed time per iteration (s): 0.48 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 3.476784E+00 | grad norm: 0.302 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.101 | TFLOPs: 35.19 | +7: iteration 5270/ 11269 | consumed samples: 1349120 | consumed tokens: 2762997760 | elapsed time per iteration (s): 0.48 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 3.474360E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.195 | TFLOPs: 35.20 | +7: iteration 5280/ 11269 | consumed samples: 1351680 | consumed tokens: 2768240640 | elapsed time per iteration (s): 0.48 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 3.473171E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.940 | TFLOPs: 35.18 | +7: iteration 5290/ 11269 | consumed samples: 1354240 | consumed tokens: 2773483520 | elapsed time per iteration (s): 0.48 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 3.466803E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.933 | TFLOPs: 35.18 | +7: iteration 5300/ 11269 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.48 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 3.475943E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.818 | TFLOPs: 35.17 | +7: iteration 5310/ 11269 | consumed samples: 1359360 | consumed tokens: 2783969280 | elapsed time per iteration (s): 0.48 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 3.485269E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.978 | TFLOPs: 35.18 | +7: iteration 5320/ 11269 | consumed samples: 1361920 | consumed tokens: 2789212160 | elapsed time per iteration (s): 0.48 | learning rate: 1.194E-04 | global batch 
size: 256 | lm loss: 3.478102E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.995 | TFLOPs: 35.18 | +7: iteration 5330/ 11269 | consumed samples: 1364480 | consumed tokens: 2794455040 | elapsed time per iteration (s): 0.48 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 3.488144E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.280 | TFLOPs: 35.20 | +7: iteration 5340/ 11269 | consumed samples: 1367040 | consumed tokens: 2799697920 | elapsed time per iteration (s): 0.48 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 3.476329E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.682 | TFLOPs: 35.16 | +7: iteration 5350/ 11269 | consumed samples: 1369600 | consumed tokens: 2804940800 | elapsed time per iteration (s): 0.48 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 3.476046E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.907 | TFLOPs: 35.18 | +7: iteration 5360/ 11269 | consumed samples: 1372160 | consumed tokens: 2810183680 | elapsed time per iteration (s): 0.48 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 3.475746E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.035 | TFLOPs: 35.19 | +7: iteration 5370/ 11269 | consumed samples: 1374720 | consumed tokens: 2815426560 | elapsed time per iteration (s): 0.48 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 3.482919E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.001 | TFLOPs: 35.19 | +7: iteration 5380/ 11269 | consumed samples: 1377280 | consumed tokens: 
2820669440 | elapsed time per iteration (s): 0.48 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 3.473159E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.621 | TFLOPs: 35.16 | +7: iteration 5390/ 11269 | consumed samples: 1379840 | consumed tokens: 2825912320 | elapsed time per iteration (s): 0.48 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 3.474450E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.920 | TFLOPs: 34.92 | +7: iteration 5400/ 11269 | consumed samples: 1382400 | consumed tokens: 2831155200 | elapsed time per iteration (s): 0.48 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 3.469470E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.143 | TFLOPs: 35.19 | +7: iteration 5410/ 11269 | consumed samples: 1384960 | consumed tokens: 2836398080 | elapsed time per iteration (s): 0.48 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 3.467048E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.935 | TFLOPs: 35.18 | +7: iteration 5420/ 11269 | consumed samples: 1387520 | consumed tokens: 2841640960 | elapsed time per iteration (s): 0.48 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 3.474428E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.854 | TFLOPs: 34.91 | +7: iteration 5430/ 11269 | consumed samples: 1390080 | consumed tokens: 2846883840 | elapsed time per iteration (s): 0.48 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 3.475266E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.787 | 
TFLOPs: 35.11 | +7: iteration 5440/ 11269 | consumed samples: 1392640 | consumed tokens: 2852126720 | elapsed time per iteration (s): 0.48 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 3.466885E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.429 | TFLOPs: 35.15 | +7: iteration 5450/ 11269 | consumed samples: 1395200 | consumed tokens: 2857369600 | elapsed time per iteration (s): 0.48 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 3.473208E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.533 | TFLOPs: 35.15 | +7: iteration 5460/ 11269 | consumed samples: 1397760 | consumed tokens: 2862612480 | elapsed time per iteration (s): 0.48 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 3.479366E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.810 | TFLOPs: 35.17 | +7: iteration 5470/ 11269 | consumed samples: 1400320 | consumed tokens: 2867855360 | elapsed time per iteration (s): 0.48 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 3.461372E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.594 | TFLOPs: 35.16 | +7: iteration 5480/ 11269 | consumed samples: 1402880 | consumed tokens: 2873098240 | elapsed time per iteration (s): 0.48 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 3.480242E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.716 | TFLOPs: 35.17 | +7: iteration 5490/ 11269 | consumed samples: 1405440 | consumed tokens: 2878341120 | elapsed time per iteration (s): 0.48 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 3.466716E+00 | grad norm: 0.290 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.598 | TFLOPs: 35.16 | +7: iteration 5500/ 11269 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.48 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 3.476184E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.806 | TFLOPs: 35.17 | +7: iteration 5510/ 11269 | consumed samples: 1410560 | consumed tokens: 2888826880 | elapsed time per iteration (s): 0.48 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 3.459838E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.640 | TFLOPs: 34.97 | +7: iteration 5520/ 11269 | consumed samples: 1413120 | consumed tokens: 2894069760 | elapsed time per iteration (s): 0.48 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 3.462251E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.996 | TFLOPs: 35.18 | +7: iteration 5530/ 11269 | consumed samples: 1415680 | consumed tokens: 2899312640 | elapsed time per iteration (s): 0.48 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 3.479703E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.248 | TFLOPs: 35.20 | +7: iteration 5540/ 11269 | consumed samples: 1418240 | consumed tokens: 2904555520 | elapsed time per iteration (s): 0.48 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 3.475929E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.947 | TFLOPs: 34.99 | +7: iteration 5550/ 11269 | consumed samples: 1420800 | consumed tokens: 2909798400 | elapsed time per iteration (s): 0.48 | learning rate: 1.136E-04 | global batch 
size: 256 | lm loss: 3.464724E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.673 | TFLOPs: 34.58 | +7: iteration 5560/ 11269 | consumed samples: 1423360 | consumed tokens: 2915041280 | elapsed time per iteration (s): 0.48 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 3.460003E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.149 | TFLOPs: 34.74 | +7: iteration 5570/ 11269 | consumed samples: 1425920 | consumed tokens: 2920284160 | elapsed time per iteration (s): 0.48 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 3.469551E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.017 | TFLOPs: 35.19 | +7: iteration 5580/ 11269 | consumed samples: 1428480 | consumed tokens: 2925527040 | elapsed time per iteration (s): 0.48 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 3.455764E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.766 | TFLOPs: 35.24 | +7: iteration 5590/ 11269 | consumed samples: 1431040 | consumed tokens: 2930769920 | elapsed time per iteration (s): 0.48 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 3.465266E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.687 | TFLOPs: 35.23 | +7: iteration 5600/ 11269 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.48 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 3.453191E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.592 | TFLOPs: 35.03 | +7: iteration 5610/ 11269 | consumed samples: 1436160 | consumed tokens: 
2941255680 | elapsed time per iteration (s): 0.48 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 3.453803E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.054 | TFLOPs: 35.12 | +7: iteration 5620/ 11269 | consumed samples: 1438720 | consumed tokens: 2946498560 | elapsed time per iteration (s): 0.48 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 3.465922E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.687 | TFLOPs: 35.23 | +7: iteration 5630/ 11269 | consumed samples: 1441280 | consumed tokens: 2951741440 | elapsed time per iteration (s): 0.48 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 3.453612E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.494 | TFLOPs: 35.22 | +7: iteration 5640/ 11269 | consumed samples: 1443840 | consumed tokens: 2956984320 | elapsed time per iteration (s): 0.48 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 3.454318E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.383 | TFLOPs: 34.95 | +7: iteration 5650/ 11269 | consumed samples: 1446400 | consumed tokens: 2962227200 | elapsed time per iteration (s): 0.48 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 3.451409E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.415 | TFLOPs: 35.15 | +7: iteration 5660/ 11269 | consumed samples: 1448960 | consumed tokens: 2967470080 | elapsed time per iteration (s): 0.48 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 3.467605E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.298 | 
TFLOPs: 35.20 | +7: iteration 5670/ 11269 | consumed samples: 1451520 | consumed tokens: 2972712960 | elapsed time per iteration (s): 0.48 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 3.443607E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.668 | TFLOPs: 35.23 | +7: iteration 5680/ 11269 | consumed samples: 1454080 | consumed tokens: 2977955840 | elapsed time per iteration (s): 0.48 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 3.450143E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.990 | TFLOPs: 35.05 | +7: iteration 5690/ 11269 | consumed samples: 1456640 | consumed tokens: 2983198720 | elapsed time per iteration (s): 0.48 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 3.440339E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.518 | TFLOPs: 35.22 | +7: iteration 5700/ 11269 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.48 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 3.464775E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.040 | TFLOPs: 35.06 | +7: iteration 5710/ 11269 | consumed samples: 1461760 | consumed tokens: 2993684480 | elapsed time per iteration (s): 0.48 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 3.431411E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.385 | TFLOPs: 34.95 | +7: iteration 5720/ 11269 | consumed samples: 1464320 | consumed tokens: 2998927360 | elapsed time per iteration (s): 0.48 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 3.442437E+00 | grad norm: 0.310 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.929 | TFLOPs: 35.25 | +7: iteration 5730/ 11269 | consumed samples: 1466880 | consumed tokens: 3004170240 | elapsed time per iteration (s): 0.47 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 3.453783E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.239 | TFLOPs: 35.27 | +7: iteration 5740/ 11269 | consumed samples: 1469440 | consumed tokens: 3009413120 | elapsed time per iteration (s): 0.47 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 3.445419E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.125 | TFLOPs: 35.26 | +7: iteration 5750/ 11269 | consumed samples: 1472000 | consumed tokens: 3014656000 | elapsed time per iteration (s): 0.47 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 3.441238E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.015 | TFLOPs: 35.25 | +7: iteration 5760/ 11269 | consumed samples: 1474560 | consumed tokens: 3019898880 | elapsed time per iteration (s): 0.48 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 3.454681E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.904 | TFLOPs: 35.24 | +7: iteration 5770/ 11269 | consumed samples: 1477120 | consumed tokens: 3025141760 | elapsed time per iteration (s): 0.48 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 3.453607E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.597 | TFLOPs: 35.22 | +7: iteration 5780/ 11269 | consumed samples: 1479680 | consumed tokens: 3030384640 | elapsed time per iteration (s): 0.48 | learning rate: 1.077E-04 | global batch 
size: 256 | lm loss: 3.446737E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.032 | TFLOPs: 34.93 | +7: iteration 5790/ 11269 | consumed samples: 1482240 | consumed tokens: 3035627520 | elapsed time per iteration (s): 0.47 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 3.444495E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.572 | TFLOPs: 35.29 | +7: iteration 5800/ 11269 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.47 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 3.456130E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.244 | TFLOPs: 35.27 | +7: iteration 5810/ 11269 | consumed samples: 1487360 | consumed tokens: 3046113280 | elapsed time per iteration (s): 0.47 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 3.447393E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.065 | TFLOPs: 35.25 | +7: iteration 5820/ 11269 | consumed samples: 1489920 | consumed tokens: 3051356160 | elapsed time per iteration (s): 0.47 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 3.450786E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.126 | TFLOPs: 35.26 | +7: iteration 5830/ 11269 | consumed samples: 1492480 | consumed tokens: 3056599040 | elapsed time per iteration (s): 0.48 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 3.455126E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.642 | TFLOPs: 34.90 | +7: iteration 5840/ 11269 | consumed samples: 1495040 | consumed tokens: 
3061841920 | elapsed time per iteration (s): 0.47 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 3.452491E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.169 | TFLOPs: 35.26 | +7: iteration 5850/ 11269 | consumed samples: 1497600 | consumed tokens: 3067084800 | elapsed time per iteration (s): 0.47 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 3.449771E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.091 | TFLOPs: 35.26 | +7: iteration 5860/ 11269 | consumed samples: 1500160 | consumed tokens: 3072327680 | elapsed time per iteration (s): 0.47 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 3.448683E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.399 | TFLOPs: 35.28 | +7: iteration 5870/ 11269 | consumed samples: 1502720 | consumed tokens: 3077570560 | elapsed time per iteration (s): 0.48 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 3.435830E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.271 | TFLOPs: 34.94 | +7: iteration 5880/ 11269 | consumed samples: 1505280 | consumed tokens: 3082813440 | elapsed time per iteration (s): 0.48 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 3.443459E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.036 | TFLOPs: 35.06 | +7: iteration 5890/ 11269 | consumed samples: 1507840 | consumed tokens: 3088056320 | elapsed time per iteration (s): 0.47 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 3.454259E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.713 | 
TFLOPs: 35.30 | +7: iteration 5900/ 11269 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.47 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 3.454796E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.398 | TFLOPs: 35.28 | +7: iteration 5910/ 11269 | consumed samples: 1512960 | consumed tokens: 3098542080 | elapsed time per iteration (s): 0.48 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 3.434761E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.087 | TFLOPs: 34.99 | +7: iteration 5920/ 11269 | consumed samples: 1515520 | consumed tokens: 3103784960 | elapsed time per iteration (s): 0.47 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 3.435603E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.501 | TFLOPs: 35.28 | +7: iteration 5930/ 11269 | consumed samples: 1518080 | consumed tokens: 3109027840 | elapsed time per iteration (s): 0.48 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 3.451527E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.560 | TFLOPs: 35.03 | +7: iteration 5940/ 11269 | consumed samples: 1520640 | consumed tokens: 3114270720 | elapsed time per iteration (s): 0.48 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 3.448270E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.319 | TFLOPs: 35.08 | +7: iteration 5950/ 11269 | consumed samples: 1523200 | consumed tokens: 3119513600 | elapsed time per iteration (s): 0.47 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 3.436526E+00 | grad norm: 0.378 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.596 | TFLOPs: 35.29 | +7: iteration 5960/ 11269 | consumed samples: 1525760 | consumed tokens: 3124756480 | elapsed time per iteration (s): 0.47 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 3.436576E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.718 | TFLOPs: 35.30 | +7: iteration 5970/ 11269 | consumed samples: 1528320 | consumed tokens: 3129999360 | elapsed time per iteration (s): 0.47 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 3.444768E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.069 | TFLOPs: 35.32 | +7: iteration 5980/ 11269 | consumed samples: 1530880 | consumed tokens: 3135242240 | elapsed time per iteration (s): 0.47 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 3.457648E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.078 | TFLOPs: 35.26 | +7: iteration 5990/ 11269 | consumed samples: 1533440 | consumed tokens: 3140485120 | elapsed time per iteration (s): 0.48 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 3.452099E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.252 | TFLOPs: 35.07 | +0: [2023-03-15 22:47:24,325] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.00010217547488893524, 0.00010217547488893524, 0.00010217547488893524], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 6000/ 11269 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.48 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 3.433497E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 535.032 | TFLOPs: 34.99 | +0: steps: 6000 loss: 3.4109 iter time (s): 0.473 samples/sec: 540.759 +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 6000 | lm loss value: 3.504254E+00 | lm loss PPL: 3.325663E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 6000 to checkpoints_280m5b9400m +0: [2023-03-15 22:47:24,503] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step6000 is begin to save! +0: [2023-03-15 22:47:24,506] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:47:24,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:47:24,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_03-model_00-model_states.pt... +0: [2023-03-15 22:47:24,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:47:24,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:47:24,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:47:24,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:47:24,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_05-model_00-model_states.pt. 
+0: [2023-03-15 22:47:24,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:47:24,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:47:24,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:47:24,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:47:24,740] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:47:24,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:47:24,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_09-model_00-model_states.pt... +0: [2023-03-15 22:47:24,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:47:24,788] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:47:24,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:47:24,812] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:47:24,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 22:47:24,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:47:24,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:47:24,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:47:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:47:24,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:47:24,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:47:24,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_15-model_00-model_states.pt... +0: [2023-03-15 22:47:24,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:47:24,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:47:24,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:47:24,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:47:24,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_17-model_00-model_states.pt. 
+0: [2023-03-15 22:47:24,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:47:25,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:47:25,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:47:25,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:47:25,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:47:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:47:25,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/layer_22-model_00-model_states.pt... +0: [2023-03-15 22:47:25,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:47:25,057] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step6000/mp_rank_00_model_states.pt +0: [2023-03-15 22:47:25,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:47:25,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:47:25,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:47:25,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +5: [2023-03-15 22:47:25,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:47:25,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:47:25,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+5: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+5: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +5: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+1: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +4: [2023-03-15 22:47:25,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 22:47:25,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:47:25,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+2: [2023-03-15 22:47:25,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+7: [2023-03-15 22:47:25,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +3: [2023-03-15 22:47:25,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:47:25,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:47:25,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:47:25,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:47:25,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:47:25,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:47:25,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:47:25,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +1: [2023-03-15 22:47:25,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +5: [2023-03-15 22:47:25,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:47:25,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +5: [2023-03-15 22:47:25,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:47:25,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:47:25,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:47:25,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +0: [2023-03-15 22:47:25,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+0: [2023-03-15 22:47:25,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:47:25,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! +6: [2023-03-15 22:47:25,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:47:25,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:47:25,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
+0: successfully saved checkpoint at iteration 6000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 746.36 +7: iteration 6010/ 11269 | consumed samples: 1538560 | consumed tokens: 3150970880 | elapsed time per iteration (s): 0.56 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 3.439256E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 454.751 | TFLOPs: 29.74 | +7: iteration 6020/ 11269 | consumed samples: 1541120 | consumed tokens: 3156213760 | elapsed time per iteration (s): 0.47 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 3.437464E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.282 | TFLOPs: 35.40 | +7: iteration 6030/ 11269 | consumed samples: 1543680 | consumed tokens: 3161456640 | elapsed time per iteration (s): 0.47 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 3.431055E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.482 | TFLOPs: 35.35 | +7: iteration 6040/ 11269 | consumed samples: 1546240 | consumed tokens: 3166699520 | elapsed time per iteration (s): 0.47 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 3.445513E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.026 | TFLOPs: 35.32 | +7: iteration 6050/ 11269 | consumed samples: 1548800 | consumed tokens: 3171942400 | elapsed time per iteration (s): 0.48 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 3.439691E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.018 | TFLOPs: 35.06 | +7: iteration 6060/ 11269 | consumed samples: 1551360 | consumed tokens: 3177185280 | elapsed time per iteration (s): 0.47 | learning rate: 
1.007E-04 | global batch size: 256 | lm loss: 3.448228E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.984 | TFLOPs: 35.31 | +7: iteration 6070/ 11269 | consumed samples: 1553920 | consumed tokens: 3182428160 | elapsed time per iteration (s): 0.47 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 3.434833E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.224 | TFLOPs: 35.33 | +7: iteration 6080/ 11269 | consumed samples: 1556480 | consumed tokens: 3187671040 | elapsed time per iteration (s): 0.47 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 3.422869E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.930 | TFLOPs: 35.31 | +7: iteration 6090/ 11269 | consumed samples: 1559040 | consumed tokens: 3192913920 | elapsed time per iteration (s): 0.47 | learning rate: 9.991E-05 | global batch size: 256 | lm loss: 3.421154E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.307 | TFLOPs: 35.27 | +7: iteration 6100/ 11269 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.47 | learning rate: 9.965E-05 | global batch size: 256 | lm loss: 3.432378E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.739 | TFLOPs: 35.30 | +7: iteration 6110/ 11269 | consumed samples: 1564160 | consumed tokens: 3203399680 | elapsed time per iteration (s): 0.47 | learning rate: 9.940E-05 | global batch size: 256 | lm loss: 3.437401E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.780 | TFLOPs: 35.30 | +7: iteration 6120/ 11269 | consumed samples: 1566720 | 
consumed tokens: 3208642560 | elapsed time per iteration (s): 0.47 | learning rate: 9.915E-05 | global batch size: 256 | lm loss: 3.442127E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.950 | TFLOPs: 35.31 | +7: iteration 6130/ 11269 | consumed samples: 1569280 | consumed tokens: 3213885440 | elapsed time per iteration (s): 0.47 | learning rate: 9.890E-05 | global batch size: 256 | lm loss: 3.432441E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.644 | TFLOPs: 35.29 | +7: iteration 6140/ 11269 | consumed samples: 1571840 | consumed tokens: 3219128320 | elapsed time per iteration (s): 0.47 | learning rate: 9.865E-05 | global batch size: 256 | lm loss: 3.430826E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.220 | TFLOPs: 35.26 | +7: iteration 6150/ 11269 | consumed samples: 1574400 | consumed tokens: 3224371200 | elapsed time per iteration (s): 0.47 | learning rate: 9.840E-05 | global batch size: 256 | lm loss: 3.439803E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.616 | TFLOPs: 35.29 | +7: iteration 6160/ 11269 | consumed samples: 1576960 | consumed tokens: 3229614080 | elapsed time per iteration (s): 0.48 | learning rate: 9.815E-05 | global batch size: 256 | lm loss: 3.423114E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.472 | TFLOPs: 34.95 | +7: iteration 6170/ 11269 | consumed samples: 1579520 | consumed tokens: 3234856960 | elapsed time per iteration (s): 0.48 | learning rate: 9.789E-05 | global batch size: 256 | lm loss: 3.425268E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
536.462 | TFLOPs: 35.08 | +7: iteration 6180/ 11269 | consumed samples: 1582080 | consumed tokens: 3240099840 | elapsed time per iteration (s): 0.47 | learning rate: 9.764E-05 | global batch size: 256 | lm loss: 3.436739E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.529 | TFLOPs: 35.29 | +7: iteration 6190/ 11269 | consumed samples: 1584640 | consumed tokens: 3245342720 | elapsed time per iteration (s): 0.47 | learning rate: 9.739E-05 | global batch size: 256 | lm loss: 3.421218E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.565 | TFLOPs: 35.29 | +7: iteration 6200/ 11269 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.47 | learning rate: 9.714E-05 | global batch size: 256 | lm loss: 3.430359E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.583 | TFLOPs: 35.29 | +7: iteration 6210/ 11269 | consumed samples: 1589760 | consumed tokens: 3255828480 | elapsed time per iteration (s): 0.47 | learning rate: 9.689E-05 | global batch size: 256 | lm loss: 3.420074E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.939 | TFLOPs: 35.31 | +7: iteration 6220/ 11269 | consumed samples: 1592320 | consumed tokens: 3261071360 | elapsed time per iteration (s): 0.47 | learning rate: 9.664E-05 | global batch size: 256 | lm loss: 3.421231E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.469 | TFLOPs: 35.28 | +7: iteration 6230/ 11269 | consumed samples: 1594880 | consumed tokens: 3266314240 | elapsed time per iteration (s): 0.47 | learning rate: 9.639E-05 | global batch size: 256 | lm loss: 3.412927E+00 | grad norm: 0.318 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.515 | TFLOPs: 35.28 | +7: iteration 6240/ 11269 | consumed samples: 1597440 | consumed tokens: 3271557120 | elapsed time per iteration (s): 0.47 | learning rate: 9.614E-05 | global batch size: 256 | lm loss: 3.419145E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.724 | TFLOPs: 35.30 | +7: iteration 6250/ 11269 | consumed samples: 1600000 | consumed tokens: 3276800000 | elapsed time per iteration (s): 0.47 | learning rate: 9.589E-05 | global batch size: 256 | lm loss: 3.421911E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.186 | TFLOPs: 35.26 | +7: iteration 6260/ 11269 | consumed samples: 1602560 | consumed tokens: 3282042880 | elapsed time per iteration (s): 0.47 | learning rate: 9.564E-05 | global batch size: 256 | lm loss: 3.419669E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.373 | TFLOPs: 35.27 | +7: iteration 6270/ 11269 | consumed samples: 1605120 | consumed tokens: 3287285760 | elapsed time per iteration (s): 0.47 | learning rate: 9.539E-05 | global batch size: 256 | lm loss: 3.421324E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.177 | TFLOPs: 35.26 | +7: iteration 6280/ 11269 | consumed samples: 1607680 | consumed tokens: 3292528640 | elapsed time per iteration (s): 0.47 | learning rate: 9.514E-05 | global batch size: 256 | lm loss: 3.427113E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.297 | TFLOPs: 35.27 | +7: iteration 6290/ 11269 | consumed samples: 1610240 | consumed tokens: 3297771520 | elapsed time per iteration (s): 0.47 | learning rate: 9.489E-05 | 
global batch size: 256 | lm loss: 3.412718E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.066 | TFLOPs: 35.25 | +7: iteration 6300/ 11269 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed time per iteration (s): 0.47 | learning rate: 9.464E-05 | global batch size: 256 | lm loss: 3.410367E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.156 | TFLOPs: 35.26 | +7: iteration 6310/ 11269 | consumed samples: 1615360 | consumed tokens: 3308257280 | elapsed time per iteration (s): 0.47 | learning rate: 9.439E-05 | global batch size: 256 | lm loss: 3.413623E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.434 | TFLOPs: 35.28 | +7: iteration 6320/ 11269 | consumed samples: 1617920 | consumed tokens: 3313500160 | elapsed time per iteration (s): 0.47 | learning rate: 9.414E-05 | global batch size: 256 | lm loss: 3.408359E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.210 | TFLOPs: 35.26 | +7: iteration 6330/ 11269 | consumed samples: 1620480 | consumed tokens: 3318743040 | elapsed time per iteration (s): 0.47 | learning rate: 9.389E-05 | global batch size: 256 | lm loss: 3.418362E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.275 | TFLOPs: 35.27 | +7: iteration 6340/ 11269 | consumed samples: 1623040 | consumed tokens: 3323985920 | elapsed time per iteration (s): 0.47 | learning rate: 9.364E-05 | global batch size: 256 | lm loss: 3.407975E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.542 | TFLOPs: 35.29 | +7: iteration 6350/ 11269 | consumed samples: 1625600 | consumed 
tokens: 3329228800 | elapsed time per iteration (s): 0.47 | learning rate: 9.339E-05 | global batch size: 256 | lm loss: 3.423827E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.067 | TFLOPs: 35.25 | +7: iteration 6360/ 11269 | consumed samples: 1628160 | consumed tokens: 3334471680 | elapsed time per iteration (s): 0.47 | learning rate: 9.314E-05 | global batch size: 256 | lm loss: 3.415406E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.049 | TFLOPs: 35.25 | +7: iteration 6370/ 11269 | consumed samples: 1630720 | consumed tokens: 3339714560 | elapsed time per iteration (s): 0.48 | learning rate: 9.289E-05 | global batch size: 256 | lm loss: 3.410185E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.432 | TFLOPs: 35.08 | +7: iteration 6380/ 11269 | consumed samples: 1633280 | consumed tokens: 3344957440 | elapsed time per iteration (s): 0.47 | learning rate: 9.264E-05 | global batch size: 256 | lm loss: 3.414592E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.281 | TFLOPs: 35.27 | +7: iteration 6390/ 11269 | consumed samples: 1635840 | consumed tokens: 3350200320 | elapsed time per iteration (s): 0.47 | learning rate: 9.240E-05 | global batch size: 256 | lm loss: 3.412292E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.192 | TFLOPs: 35.26 | +7: iteration 6400/ 11269 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.47 | learning rate: 9.215E-05 | global batch size: 256 | lm loss: 3.408649E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.198 
| TFLOPs: 35.26 | +7: iteration 6410/ 11269 | consumed samples: 1640960 | consumed tokens: 3360686080 | elapsed time per iteration (s): 0.48 | learning rate: 9.190E-05 | global batch size: 256 | lm loss: 3.411223E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.934 | TFLOPs: 35.25 | +7: iteration 6420/ 11269 | consumed samples: 1643520 | consumed tokens: 3365928960 | elapsed time per iteration (s): 0.47 | learning rate: 9.165E-05 | global batch size: 256 | lm loss: 3.400182E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.471 | TFLOPs: 35.28 | +7: iteration 6430/ 11269 | consumed samples: 1646080 | consumed tokens: 3371171840 | elapsed time per iteration (s): 0.47 | learning rate: 9.140E-05 | global batch size: 256 | lm loss: 3.415942E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.074 | TFLOPs: 35.26 | +7: iteration 6440/ 11269 | consumed samples: 1648640 | consumed tokens: 3376414720 | elapsed time per iteration (s): 0.48 | learning rate: 9.115E-05 | global batch size: 256 | lm loss: 3.406147E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.709 | TFLOPs: 35.23 | +7: iteration 6450/ 11269 | consumed samples: 1651200 | consumed tokens: 3381657600 | elapsed time per iteration (s): 0.47 | learning rate: 9.091E-05 | global batch size: 256 | lm loss: 3.424207E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.110 | TFLOPs: 35.26 | +7: iteration 6460/ 11269 | consumed samples: 1653760 | consumed tokens: 3386900480 | elapsed time per iteration (s): 0.48 | learning rate: 9.066E-05 | global batch size: 256 | lm loss: 3.403555E+00 | grad norm: 0.331 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.787 | TFLOPs: 35.24 | +7: iteration 6470/ 11269 | consumed samples: 1656320 | consumed tokens: 3392143360 | elapsed time per iteration (s): 0.47 | learning rate: 9.041E-05 | global batch size: 256 | lm loss: 3.416362E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.077 | TFLOPs: 35.26 | +7: iteration 6480/ 11269 | consumed samples: 1658880 | consumed tokens: 3397386240 | elapsed time per iteration (s): 0.48 | learning rate: 9.016E-05 | global batch size: 256 | lm loss: 3.429414E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.804 | TFLOPs: 35.24 | +7: iteration 6490/ 11269 | consumed samples: 1661440 | consumed tokens: 3402629120 | elapsed time per iteration (s): 0.47 | learning rate: 8.992E-05 | global batch size: 256 | lm loss: 3.417876E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.108 | TFLOPs: 35.26 | +7: iteration 6500/ 11269 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.47 | learning rate: 8.967E-05 | global batch size: 256 | lm loss: 3.417070E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.049 | TFLOPs: 35.25 | +7: iteration 6510/ 11269 | consumed samples: 1666560 | consumed tokens: 3413114880 | elapsed time per iteration (s): 0.47 | learning rate: 8.942E-05 | global batch size: 256 | lm loss: 3.417630E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.971 | TFLOPs: 35.25 | +7: iteration 6520/ 11269 | consumed samples: 1669120 | consumed tokens: 3418357760 | elapsed time per iteration (s): 0.47 | learning rate: 8.918E-05 | global batch 
size: 256 | lm loss: 3.418918E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.326 | TFLOPs: 35.27 | +7: iteration 6530/ 11269 | consumed samples: 1671680 | consumed tokens: 3423600640 | elapsed time per iteration (s): 0.48 | learning rate: 8.893E-05 | global batch size: 256 | lm loss: 3.408022E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.516 | TFLOPs: 35.15 | +7: iteration 6540/ 11269 | consumed samples: 1674240 | consumed tokens: 3428843520 | elapsed time per iteration (s): 0.47 | learning rate: 8.868E-05 | global batch size: 256 | lm loss: 3.416422E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.157 | TFLOPs: 35.26 | +7: iteration 6550/ 11269 | consumed samples: 1676800 | consumed tokens: 3434086400 | elapsed time per iteration (s): 0.47 | learning rate: 8.844E-05 | global batch size: 256 | lm loss: 3.417532E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.297 | TFLOPs: 35.27 | +7: iteration 6560/ 11269 | consumed samples: 1679360 | consumed tokens: 3439329280 | elapsed time per iteration (s): 0.47 | learning rate: 8.819E-05 | global batch size: 256 | lm loss: 3.404197E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.196 | TFLOPs: 35.26 | +7: iteration 6570/ 11269 | consumed samples: 1681920 | consumed tokens: 3444572160 | elapsed time per iteration (s): 0.47 | learning rate: 8.795E-05 | global batch size: 256 | lm loss: 3.408557E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.158 | TFLOPs: 35.26 | +7: iteration 6580/ 11269 | consumed samples: 1684480 | consumed tokens: 
3449815040 | elapsed time per iteration (s): 0.47 | learning rate: 8.770E-05 | global batch size: 256 | lm loss: 3.408694E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.316 | TFLOPs: 35.27 | +7: iteration 6590/ 11269 | consumed samples: 1687040 | consumed tokens: 3455057920 | elapsed time per iteration (s): 0.47 | learning rate: 8.746E-05 | global batch size: 256 | lm loss: 3.406874E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.227 | TFLOPs: 35.27 | +7: iteration 6600/ 11269 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.47 | learning rate: 8.721E-05 | global batch size: 256 | lm loss: 3.417134E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.497 | TFLOPs: 35.28 | +7: iteration 6610/ 11269 | consumed samples: 1692160 | consumed tokens: 3465543680 | elapsed time per iteration (s): 0.47 | learning rate: 8.697E-05 | global batch size: 256 | lm loss: 3.403965E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.462 | TFLOPs: 35.28 | +7: iteration 6620/ 11269 | consumed samples: 1694720 | consumed tokens: 3470786560 | elapsed time per iteration (s): 0.47 | learning rate: 8.672E-05 | global batch size: 256 | lm loss: 3.397976E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.556 | TFLOPs: 35.29 | +7: iteration 6630/ 11269 | consumed samples: 1697280 | consumed tokens: 3476029440 | elapsed time per iteration (s): 0.47 | learning rate: 8.648E-05 | global batch size: 256 | lm loss: 3.395756E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.538 | 
TFLOPs: 35.29 | +7: iteration 6640/ 11269 | consumed samples: 1699840 | consumed tokens: 3481272320 | elapsed time per iteration (s): 0.47 | learning rate: 8.623E-05 | global batch size: 256 | lm loss: 3.411610E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.573 | TFLOPs: 35.29 | +7: iteration 6650/ 11269 | consumed samples: 1702400 | consumed tokens: 3486515200 | elapsed time per iteration (s): 0.48 | learning rate: 8.599E-05 | global batch size: 256 | lm loss: 3.403269E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.932 | TFLOPs: 35.25 | +7: iteration 6660/ 11269 | consumed samples: 1704960 | consumed tokens: 3491758080 | elapsed time per iteration (s): 0.47 | learning rate: 8.574E-05 | global batch size: 256 | lm loss: 3.404846E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.199 | TFLOPs: 35.26 | +7: iteration 6670/ 11269 | consumed samples: 1707520 | consumed tokens: 3497000960 | elapsed time per iteration (s): 0.48 | learning rate: 8.550E-05 | global batch size: 256 | lm loss: 3.383835E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.923 | TFLOPs: 35.25 | +7: iteration 6680/ 11269 | consumed samples: 1710080 | consumed tokens: 3502243840 | elapsed time per iteration (s): 0.47 | learning rate: 8.525E-05 | global batch size: 256 | lm loss: 3.395335E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.225 | TFLOPs: 35.27 | +7: iteration 6690/ 11269 | consumed samples: 1712640 | consumed tokens: 3507486720 | elapsed time per iteration (s): 0.48 | learning rate: 8.501E-05 | global batch size: 256 | lm loss: 3.408002E+00 | grad norm: 0.319 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.859 | TFLOPs: 35.24 | +7: iteration 6700/ 11269 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.47 | learning rate: 8.477E-05 | global batch size: 256 | lm loss: 3.402923E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.193 | TFLOPs: 35.26 | +7: iteration 6710/ 11269 | consumed samples: 1717760 | consumed tokens: 3517972480 | elapsed time per iteration (s): 0.48 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 3.408067E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.335 | TFLOPs: 35.08 | +7: iteration 6720/ 11269 | consumed samples: 1720320 | consumed tokens: 3523215360 | elapsed time per iteration (s): 0.47 | learning rate: 8.428E-05 | global batch size: 256 | lm loss: 3.405670E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.682 | TFLOPs: 35.30 | +7: iteration 6730/ 11269 | consumed samples: 1722880 | consumed tokens: 3528458240 | elapsed time per iteration (s): 0.47 | learning rate: 8.404E-05 | global batch size: 256 | lm loss: 3.415652E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.926 | TFLOPs: 35.31 | +7: iteration 6740/ 11269 | consumed samples: 1725440 | consumed tokens: 3533701120 | elapsed time per iteration (s): 0.47 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 3.397996E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.795 | TFLOPs: 35.30 | +7: iteration 6750/ 11269 | consumed samples: 1728000 | consumed tokens: 3538944000 | elapsed time per iteration (s): 0.47 | learning rate: 8.355E-05 | global batch 
size: 256 | lm loss: 3.403588E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.599 | TFLOPs: 35.29 | +7: iteration 6760/ 11269 | consumed samples: 1730560 | consumed tokens: 3544186880 | elapsed time per iteration (s): 0.47 | learning rate: 8.331E-05 | global batch size: 256 | lm loss: 3.405156E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.084 | TFLOPs: 35.26 | +7: iteration 6770/ 11269 | consumed samples: 1733120 | consumed tokens: 3549429760 | elapsed time per iteration (s): 0.48 | learning rate: 8.307E-05 | global batch size: 256 | lm loss: 3.398749E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.548 | TFLOPs: 35.16 | +7: iteration 6780/ 11269 | consumed samples: 1735680 | consumed tokens: 3554672640 | elapsed time per iteration (s): 0.47 | learning rate: 8.283E-05 | global batch size: 256 | lm loss: 3.403761E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.574 | TFLOPs: 35.29 | +7: iteration 6790/ 11269 | consumed samples: 1738240 | consumed tokens: 3559915520 | elapsed time per iteration (s): 0.47 | learning rate: 8.259E-05 | global batch size: 256 | lm loss: 3.381525E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.084 | TFLOPs: 35.26 | +7: iteration 6800/ 11269 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.47 | learning rate: 8.235E-05 | global batch size: 256 | lm loss: 3.406296E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.181 | TFLOPs: 35.26 | +7: iteration 6810/ 11269 | consumed samples: 1743360 | consumed tokens: 
3570401280 | elapsed time per iteration (s): 0.47 | learning rate: 8.210E-05 | global batch size: 256 | lm loss: 3.407673E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.315 | TFLOPs: 35.27 | +7: iteration 6820/ 11269 | consumed samples: 1745920 | consumed tokens: 3575644160 | elapsed time per iteration (s): 0.47 | learning rate: 8.186E-05 | global batch size: 256 | lm loss: 3.393691E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.468 | TFLOPs: 35.28 | +7: iteration 6830/ 11269 | consumed samples: 1748480 | consumed tokens: 3580887040 | elapsed time per iteration (s): 0.47 | learning rate: 8.162E-05 | global batch size: 256 | lm loss: 3.383891E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.050 | TFLOPs: 35.25 | +7: iteration 6840/ 11269 | consumed samples: 1751040 | consumed tokens: 3586129920 | elapsed time per iteration (s): 0.47 | learning rate: 8.138E-05 | global batch size: 256 | lm loss: 3.396439E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.372 | TFLOPs: 35.27 | +7: iteration 6850/ 11269 | consumed samples: 1753600 | consumed tokens: 3591372800 | elapsed time per iteration (s): 0.47 | learning rate: 8.114E-05 | global batch size: 256 | lm loss: 3.389233E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.027 | TFLOPs: 35.25 | +7: iteration 6860/ 11269 | consumed samples: 1756160 | consumed tokens: 3596615680 | elapsed time per iteration (s): 0.47 | learning rate: 8.090E-05 | global batch size: 256 | lm loss: 3.399974E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.366 | 
TFLOPs: 35.27 | +7: iteration 6870/ 11269 | consumed samples: 1758720 | consumed tokens: 3601858560 | elapsed time per iteration (s): 0.47 | learning rate: 8.066E-05 | global batch size: 256 | lm loss: 3.389627E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.204 | TFLOPs: 35.26 | +7: iteration 6880/ 11269 | consumed samples: 1761280 | consumed tokens: 3607101440 | elapsed time per iteration (s): 0.48 | learning rate: 8.042E-05 | global batch size: 256 | lm loss: 3.386557E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.780 | TFLOPs: 35.24 | +7: iteration 6890/ 11269 | consumed samples: 1763840 | consumed tokens: 3612344320 | elapsed time per iteration (s): 0.47 | learning rate: 8.018E-05 | global batch size: 256 | lm loss: 3.392017E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.282 | TFLOPs: 35.27 | +7: iteration 6900/ 11269 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.47 | learning rate: 7.994E-05 | global batch size: 256 | lm loss: 3.392745E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.272 | TFLOPs: 35.27 | +7: iteration 6910/ 11269 | consumed samples: 1768960 | consumed tokens: 3622830080 | elapsed time per iteration (s): 0.47 | learning rate: 7.971E-05 | global batch size: 256 | lm loss: 3.403628E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.099 | TFLOPs: 35.26 | +7: iteration 6920/ 11269 | consumed samples: 1771520 | consumed tokens: 3628072960 | elapsed time per iteration (s): 0.48 | learning rate: 7.947E-05 | global batch size: 256 | lm loss: 3.389312E+00 | grad norm: 0.326 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.298 | TFLOPs: 34.88 | +7: iteration 6930/ 11269 | consumed samples: 1774080 | consumed tokens: 3633315840 | elapsed time per iteration (s): 0.47 | learning rate: 7.923E-05 | global batch size: 256 | lm loss: 3.385606E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.119 | TFLOPs: 35.26 | +7: iteration 6940/ 11269 | consumed samples: 1776640 | consumed tokens: 3638558720 | elapsed time per iteration (s): 0.47 | learning rate: 7.899E-05 | global batch size: 256 | lm loss: 3.384464E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.133 | TFLOPs: 35.26 | +7: iteration 6950/ 11269 | consumed samples: 1779200 | consumed tokens: 3643801600 | elapsed time per iteration (s): 0.47 | learning rate: 7.875E-05 | global batch size: 256 | lm loss: 3.377762E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.326 | TFLOPs: 35.27 | +7: iteration 6960/ 11269 | consumed samples: 1781760 | consumed tokens: 3649044480 | elapsed time per iteration (s): 0.47 | learning rate: 7.852E-05 | global batch size: 256 | lm loss: 3.401046E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.147 | TFLOPs: 35.26 | +7: iteration 6970/ 11269 | consumed samples: 1784320 | consumed tokens: 3654287360 | elapsed time per iteration (s): 0.47 | learning rate: 7.828E-05 | global batch size: 256 | lm loss: 3.387273E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.282 | TFLOPs: 35.27 | +7: iteration 6980/ 11269 | consumed samples: 1786880 | consumed tokens: 3659530240 | elapsed time per iteration (s): 0.47 | learning rate: 7.804E-05 | global batch 
size: 256 | lm loss: 3.377106E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.109 | TFLOPs: 35.26 | +7: iteration 6990/ 11269 | consumed samples: 1789440 | consumed tokens: 3664773120 | elapsed time per iteration (s): 0.47 | learning rate: 7.780E-05 | global batch size: 256 | lm loss: 3.386224E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.505 | TFLOPs: 35.28 | +7: iteration 7000/ 11269 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.47 | learning rate: 7.757E-05 | global batch size: 256 | lm loss: 3.387676E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.646 | TFLOPs: 35.29 | +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 7000 | lm loss value: 3.488997E+00 | lm loss PPL: 3.275309E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 7000 to checkpoints_280m5b9400m +0: [2023-03-15 22:55:20,240] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step7000 is begin to save! +0: [2023-03-15 22:55:20,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_01-model_00-model_states.pt... +0: [2023-03-15 22:55:20,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_01-model_00-model_states.pt. +0: [2023-03-15 22:55:20,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_03-model_00-model_states.pt... 
+0: [2023-03-15 22:55:20,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_03-model_00-model_states.pt. +0: [2023-03-15 22:55:20,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_04-model_00-model_states.pt... +0: [2023-03-15 22:55:20,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_04-model_00-model_states.pt. +0: [2023-03-15 22:55:20,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_05-model_00-model_states.pt... +0: [2023-03-15 22:55:20,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_05-model_00-model_states.pt. +0: [2023-03-15 22:55:20,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_06-model_00-model_states.pt... +0: [2023-03-15 22:55:20,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_06-model_00-model_states.pt. +0: [2023-03-15 22:55:20,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_07-model_00-model_states.pt... +0: [2023-03-15 22:55:20,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_07-model_00-model_states.pt. +0: [2023-03-15 22:55:20,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_08-model_00-model_states.pt... +0: [2023-03-15 22:55:20,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_08-model_00-model_states.pt. +0: [2023-03-15 22:55:20,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 22:55:20,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_09-model_00-model_states.pt. +0: [2023-03-15 22:55:20,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_10-model_00-model_states.pt... +0: [2023-03-15 22:55:20,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_10-model_00-model_states.pt. +0: [2023-03-15 22:55:20,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_11-model_00-model_states.pt... +0: [2023-03-15 22:55:20,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_11-model_00-model_states.pt. +0: [2023-03-15 22:55:20,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_12-model_00-model_states.pt... +0: [2023-03-15 22:55:20,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_12-model_00-model_states.pt. +0: [2023-03-15 22:55:20,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_13-model_00-model_states.pt... +0: [2023-03-15 22:55:20,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_13-model_00-model_states.pt. +0: [2023-03-15 22:55:20,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_14-model_00-model_states.pt... +0: [2023-03-15 22:55:20,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_14-model_00-model_states.pt. +0: [2023-03-15 22:55:20,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 22:55:20,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_15-model_00-model_states.pt. +0: [2023-03-15 22:55:20,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_16-model_00-model_states.pt... +0: [2023-03-15 22:55:20,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_16-model_00-model_states.pt. +0: [2023-03-15 22:55:20,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_17-model_00-model_states.pt... +0: [2023-03-15 22:55:20,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_17-model_00-model_states.pt. +0: [2023-03-15 22:55:20,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_18-model_00-model_states.pt... +0: [2023-03-15 22:55:20,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_18-model_00-model_states.pt. +0: [2023-03-15 22:55:20,748] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_19-model_00-model_states.pt... +0: [2023-03-15 22:55:20,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_19-model_00-model_states.pt. +0: [2023-03-15 22:55:20,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_20-model_00-model_states.pt... +0: [2023-03-15 22:55:20,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_20-model_00-model_states.pt. +0: [2023-03-15 22:55:20,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/layer_22-model_00-model_states.pt... 
+0: [2023-03-15 22:55:20,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/layer_22-model_00-model_states.pt. +0: [2023-03-15 22:55:20,798] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step7000/mp_rank_00_model_states.pt +0: [2023-03-15 22:55:20,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/mp_rank_00_model_states.pt... +0: [2023-03-15 22:55:20,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/mp_rank_00_model_states.pt. +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 22:55:20,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+4: [2023-03-15 22:55:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 22:55:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 22:55:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:55:20,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+2: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 22:55:20,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +1: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:55:20,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +2: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 22:55:20,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 22:55:20,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +6: [2023-03-15 22:55:20,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 22:55:20,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +4: [2023-03-15 22:55:20,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 22:55:20,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+0: [2023-03-15 22:55:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 22:55:20,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 22:55:20,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+5: [2023-03-15 22:55:20,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +7: [2023-03-15 22:55:20,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 22:55:20,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 22:55:20,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+3: [2023-03-15 22:55:20,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +4: [2023-03-15 22:55:20,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +4: [2023-03-15 22:55:20,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:55:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 22:55:20,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +3: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 22:55:20,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 22:55:20,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +0: [2023-03-15 22:55:20,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 22:55:20,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! +5: [2023-03-15 22:55:20,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 22:55:20,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 22:55:20,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
+0: successfully saved checkpoint at iteration 7000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 748.10 +7: iteration 7010/ 11269 | consumed samples: 1794560 | consumed tokens: 3675258880 | elapsed time per iteration (s): 0.56 | learning rate: 7.733E-05 | global batch size: 256 | lm loss: 3.395437E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 456.048 | TFLOPs: 29.83 | +7: iteration 7020/ 11269 | consumed samples: 1797120 | consumed tokens: 3680501760 | elapsed time per iteration (s): 0.47 | learning rate: 7.710E-05 | global batch size: 256 | lm loss: 3.397600E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.887 | TFLOPs: 35.37 | +7: iteration 7030/ 11269 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (s): 0.47 | learning rate: 7.686E-05 | global batch size: 256 | lm loss: 3.378922E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.932 | TFLOPs: 35.38 | +7: iteration 7040/ 11269 | consumed samples: 1802240 | consumed tokens: 3690987520 | elapsed time per iteration (s): 0.47 | learning rate: 7.662E-05 | global batch size: 256 | lm loss: 3.382222E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.487 | TFLOPs: 35.35 | +7: iteration 7050/ 11269 | consumed samples: 1804800 | consumed tokens: 3696230400 | elapsed time per iteration (s): 0.47 | learning rate: 7.639E-05 | global batch size: 256 | lm loss: 3.375137E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.575 | TFLOPs: 35.35 | +7: iteration 7060/ 11269 | consumed samples: 1807360 | consumed tokens: 3701473280 | elapsed time per iteration (s): 0.47 | learning rate: 
7.615E-05 | global batch size: 256 | lm loss: 3.382994E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.589 | TFLOPs: 35.35 | +7: iteration 7070/ 11269 | consumed samples: 1809920 | consumed tokens: 3706716160 | elapsed time per iteration (s): 0.47 | learning rate: 7.592E-05 | global batch size: 256 | lm loss: 3.371759E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.376 | TFLOPs: 35.34 | +7: iteration 7080/ 11269 | consumed samples: 1812480 | consumed tokens: 3711959040 | elapsed time per iteration (s): 0.47 | learning rate: 7.569E-05 | global batch size: 256 | lm loss: 3.391137E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.332 | TFLOPs: 35.34 | +7: iteration 7090/ 11269 | consumed samples: 1815040 | consumed tokens: 3717201920 | elapsed time per iteration (s): 0.48 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 3.383102E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.329 | TFLOPs: 35.21 | +7: iteration 7100/ 11269 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.48 | learning rate: 7.522E-05 | global batch size: 256 | lm loss: 3.383538E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.830 | TFLOPs: 35.24 | +7: iteration 7110/ 11269 | consumed samples: 1820160 | consumed tokens: 3727687680 | elapsed time per iteration (s): 0.47 | learning rate: 7.498E-05 | global batch size: 256 | lm loss: 3.390726E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.853 | TFLOPs: 35.31 | +7: iteration 7120/ 11269 | consumed samples: 1822720 | 
consumed tokens: 3732930560 | elapsed time per iteration (s): 0.47 | learning rate: 7.475E-05 | global batch size: 256 | lm loss: 3.398857E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.867 | TFLOPs: 35.31 | +7: iteration 7130/ 11269 | consumed samples: 1825280 | consumed tokens: 3738173440 | elapsed time per iteration (s): 0.47 | learning rate: 7.452E-05 | global batch size: 256 | lm loss: 3.389166E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.822 | TFLOPs: 35.30 | +7: iteration 7140/ 11269 | consumed samples: 1827840 | consumed tokens: 3743416320 | elapsed time per iteration (s): 0.47 | learning rate: 7.428E-05 | global batch size: 256 | lm loss: 3.384270E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.866 | TFLOPs: 35.31 | +7: iteration 7150/ 11269 | consumed samples: 1830400 | consumed tokens: 3748659200 | elapsed time per iteration (s): 0.48 | learning rate: 7.405E-05 | global batch size: 256 | lm loss: 3.379140E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.531 | TFLOPs: 35.15 | +7: iteration 7160/ 11269 | consumed samples: 1832960 | consumed tokens: 3753902080 | elapsed time per iteration (s): 0.47 | learning rate: 7.382E-05 | global batch size: 256 | lm loss: 3.382446E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.670 | TFLOPs: 35.29 | +7: iteration 7170/ 11269 | consumed samples: 1835520 | consumed tokens: 3759144960 | elapsed time per iteration (s): 0.47 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 3.378856E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
539.927 | TFLOPs: 35.31 | +7: iteration 7180/ 11269 | consumed samples: 1838080 | consumed tokens: 3764387840 | elapsed time per iteration (s): 0.47 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 3.383644E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.311 | TFLOPs: 35.27 | +7: iteration 7190/ 11269 | consumed samples: 1840640 | consumed tokens: 3769630720 | elapsed time per iteration (s): 0.47 | learning rate: 7.313E-05 | global batch size: 256 | lm loss: 3.382055E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.589 | TFLOPs: 35.29 | +7: iteration 7200/ 11269 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.47 | learning rate: 7.289E-05 | global batch size: 256 | lm loss: 3.396053E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.540 | TFLOPs: 35.29 | +7: iteration 7210/ 11269 | consumed samples: 1845760 | consumed tokens: 3780116480 | elapsed time per iteration (s): 0.47 | learning rate: 7.266E-05 | global batch size: 256 | lm loss: 3.376611E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.856 | TFLOPs: 35.31 | +7: iteration 7220/ 11269 | consumed samples: 1848320 | consumed tokens: 3785359360 | elapsed time per iteration (s): 0.47 | learning rate: 7.243E-05 | global batch size: 256 | lm loss: 3.388765E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.709 | TFLOPs: 35.30 | +7: iteration 7230/ 11269 | consumed samples: 1850880 | consumed tokens: 3790602240 | elapsed time per iteration (s): 0.47 | learning rate: 7.220E-05 | global batch size: 256 | lm loss: 3.378211E+00 | grad norm: 0.304 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.569 | TFLOPs: 35.29 | +7: iteration 7240/ 11269 | consumed samples: 1853440 | consumed tokens: 3795845120 | elapsed time per iteration (s): 0.47 | learning rate: 7.197E-05 | global batch size: 256 | lm loss: 3.382515E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.998 | TFLOPs: 35.32 | +7: iteration 7250/ 11269 | consumed samples: 1856000 | consumed tokens: 3801088000 | elapsed time per iteration (s): 0.47 | learning rate: 7.174E-05 | global batch size: 256 | lm loss: 3.382980E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.509 | TFLOPs: 35.28 | +7: iteration 7260/ 11269 | consumed samples: 1858560 | consumed tokens: 3806330880 | elapsed time per iteration (s): 0.47 | learning rate: 7.151E-05 | global batch size: 256 | lm loss: 3.363499E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.576 | TFLOPs: 35.29 | +7: iteration 7270/ 11269 | consumed samples: 1861120 | consumed tokens: 3811573760 | elapsed time per iteration (s): 0.47 | learning rate: 7.129E-05 | global batch size: 256 | lm loss: 3.385050E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.542 | TFLOPs: 35.29 | +7: iteration 7280/ 11269 | consumed samples: 1863680 | consumed tokens: 3816816640 | elapsed time per iteration (s): 0.47 | learning rate: 7.106E-05 | global batch size: 256 | lm loss: 3.364845E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.694 | TFLOPs: 35.30 | +7: iteration 7290/ 11269 | consumed samples: 1866240 | consumed tokens: 3822059520 | elapsed time per iteration (s): 0.47 | learning rate: 7.083E-05 | 
global batch size: 256 | lm loss: 3.370649E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.545 | TFLOPs: 35.29 | +7: iteration 7300/ 11269 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.47 | learning rate: 7.060E-05 | global batch size: 256 | lm loss: 3.375064E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.609 | TFLOPs: 35.29 | +7: iteration 7310/ 11269 | consumed samples: 1871360 | consumed tokens: 3832545280 | elapsed time per iteration (s): 0.47 | learning rate: 7.037E-05 | global batch size: 256 | lm loss: 3.377546E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.747 | TFLOPs: 35.30 | +7: iteration 7320/ 11269 | consumed samples: 1873920 | consumed tokens: 3837788160 | elapsed time per iteration (s): 0.47 | learning rate: 7.014E-05 | global batch size: 256 | lm loss: 3.375941E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.750 | TFLOPs: 35.30 | +7: iteration 7330/ 11269 | consumed samples: 1876480 | consumed tokens: 3843031040 | elapsed time per iteration (s): 0.47 | learning rate: 6.992E-05 | global batch size: 256 | lm loss: 3.383419E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.553 | TFLOPs: 35.29 | +7: iteration 7340/ 11269 | consumed samples: 1879040 | consumed tokens: 3848273920 | elapsed time per iteration (s): 0.47 | learning rate: 6.969E-05 | global batch size: 256 | lm loss: 3.382812E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.871 | TFLOPs: 35.31 | +7: iteration 7350/ 11269 | consumed samples: 1881600 | consumed 
tokens: 3853516800 | elapsed time per iteration (s): 0.47 | learning rate: 6.946E-05 | global batch size: 256 | lm loss: 3.371232E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.608 | TFLOPs: 35.29 | +7: iteration 7360/ 11269 | consumed samples: 1884160 | consumed tokens: 3858759680 | elapsed time per iteration (s): 0.47 | learning rate: 6.924E-05 | global batch size: 256 | lm loss: 3.372501E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.036 | TFLOPs: 35.32 | +7: iteration 7370/ 11269 | consumed samples: 1886720 | consumed tokens: 3864002560 | elapsed time per iteration (s): 0.47 | learning rate: 6.901E-05 | global batch size: 256 | lm loss: 3.369599E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.509 | TFLOPs: 35.28 | +7: iteration 7380/ 11269 | consumed samples: 1889280 | consumed tokens: 3869245440 | elapsed time per iteration (s): 0.47 | learning rate: 6.879E-05 | global batch size: 256 | lm loss: 3.385218E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.247 | TFLOPs: 35.27 | +7: iteration 7390/ 11269 | consumed samples: 1891840 | consumed tokens: 3874488320 | elapsed time per iteration (s): 0.47 | learning rate: 6.856E-05 | global batch size: 256 | lm loss: 3.370727E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.894 | TFLOPs: 35.31 | +7: iteration 7400/ 11269 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.47 | learning rate: 6.834E-05 | global batch size: 256 | lm loss: 3.375203E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.571 
| TFLOPs: 35.29 | +7: iteration 7410/ 11269 | consumed samples: 1896960 | consumed tokens: 3884974080 | elapsed time per iteration (s): 0.47 | learning rate: 6.811E-05 | global batch size: 256 | lm loss: 3.356388E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.512 | TFLOPs: 35.28 | +7: iteration 7420/ 11269 | consumed samples: 1899520 | consumed tokens: 3890216960 | elapsed time per iteration (s): 0.47 | learning rate: 6.789E-05 | global batch size: 256 | lm loss: 3.366221E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.725 | TFLOPs: 35.30 | +7: iteration 7430/ 11269 | consumed samples: 1902080 | consumed tokens: 3895459840 | elapsed time per iteration (s): 0.48 | learning rate: 6.766E-05 | global batch size: 256 | lm loss: 3.366190E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.560 | TFLOPs: 35.22 | +7: iteration 7440/ 11269 | consumed samples: 1904640 | consumed tokens: 3900702720 | elapsed time per iteration (s): 0.47 | learning rate: 6.744E-05 | global batch size: 256 | lm loss: 3.371812E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.815 | TFLOPs: 35.30 | +7: iteration 7450/ 11269 | consumed samples: 1907200 | consumed tokens: 3905945600 | elapsed time per iteration (s): 0.47 | learning rate: 6.722E-05 | global batch size: 256 | lm loss: 3.361149E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.599 | TFLOPs: 35.29 | +7: iteration 7460/ 11269 | consumed samples: 1909760 | consumed tokens: 3911188480 | elapsed time per iteration (s): 0.47 | learning rate: 6.700E-05 | global batch size: 256 | lm loss: 3.362653E+00 | grad norm: 0.285 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.432 | TFLOPs: 35.28 | +7: iteration 7470/ 11269 | consumed samples: 1912320 | consumed tokens: 3916431360 | elapsed time per iteration (s): 0.48 | learning rate: 6.677E-05 | global batch size: 256 | lm loss: 3.366566E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.427 | TFLOPs: 34.95 | +7: iteration 7480/ 11269 | consumed samples: 1914880 | consumed tokens: 3921674240 | elapsed time per iteration (s): 0.48 | learning rate: 6.655E-05 | global batch size: 256 | lm loss: 3.370388E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.242 | TFLOPs: 34.74 | +7: iteration 7490/ 11269 | consumed samples: 1917440 | consumed tokens: 3926917120 | elapsed time per iteration (s): 0.48 | learning rate: 6.633E-05 | global batch size: 256 | lm loss: 3.372507E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.201 | TFLOPs: 35.20 | +7: iteration 7500/ 11269 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.48 | learning rate: 6.611E-05 | global batch size: 256 | lm loss: 3.371844E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.788 | TFLOPs: 35.24 | +7: iteration 7510/ 11269 | consumed samples: 1922560 | consumed tokens: 3937402880 | elapsed time per iteration (s): 0.47 | learning rate: 6.589E-05 | global batch size: 256 | lm loss: 3.371706E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.136 | TFLOPs: 35.32 | +7: iteration 7520/ 11269 | consumed samples: 1925120 | consumed tokens: 3942645760 | elapsed time per iteration (s): 0.47 | learning rate: 6.567E-05 | global batch 
size: 256 | lm loss: 3.365607E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.703 | TFLOPs: 35.30 | +7: iteration 7530/ 11269 | consumed samples: 1927680 | consumed tokens: 3947888640 | elapsed time per iteration (s): 0.47 | learning rate: 6.545E-05 | global batch size: 256 | lm loss: 3.370324E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.211 | TFLOPs: 35.33 | +7: iteration 7540/ 11269 | consumed samples: 1930240 | consumed tokens: 3953131520 | elapsed time per iteration (s): 0.47 | learning rate: 6.523E-05 | global batch size: 256 | lm loss: 3.383733E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.663 | TFLOPs: 35.29 | +7: iteration 7550/ 11269 | consumed samples: 1932800 | consumed tokens: 3958374400 | elapsed time per iteration (s): 0.47 | learning rate: 6.501E-05 | global batch size: 256 | lm loss: 3.382576E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.715 | TFLOPs: 35.30 | +7: iteration 7560/ 11269 | consumed samples: 1935360 | consumed tokens: 3963617280 | elapsed time per iteration (s): 0.47 | learning rate: 6.479E-05 | global batch size: 256 | lm loss: 3.381438E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.046 | TFLOPs: 35.32 | +7: iteration 7570/ 11269 | consumed samples: 1937920 | consumed tokens: 3968860160 | elapsed time per iteration (s): 0.47 | learning rate: 6.457E-05 | global batch size: 256 | lm loss: 3.369043E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.602 | TFLOPs: 35.29 | +7: iteration 7580/ 11269 | consumed samples: 1940480 | consumed tokens: 
3974103040 | elapsed time per iteration (s): 0.47 | learning rate: 6.435E-05 | global batch size: 256 | lm loss: 3.374106E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.564 | TFLOPs: 35.29 | +7: iteration 7590/ 11269 | consumed samples: 1943040 | consumed tokens: 3979345920 | elapsed time per iteration (s): 0.47 | learning rate: 6.413E-05 | global batch size: 256 | lm loss: 3.369818E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.163 | TFLOPs: 35.26 | +7: iteration 7600/ 11269 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.47 | learning rate: 6.391E-05 | global batch size: 256 | lm loss: 3.359015E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.533 | TFLOPs: 35.29 | +7: iteration 7610/ 11269 | consumed samples: 1948160 | consumed tokens: 3989831680 | elapsed time per iteration (s): 0.47 | learning rate: 6.370E-05 | global batch size: 256 | lm loss: 3.376344E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.005 | TFLOPs: 35.25 | +7: iteration 7620/ 11269 | consumed samples: 1950720 | consumed tokens: 3995074560 | elapsed time per iteration (s): 0.47 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 3.368158E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.877 | TFLOPs: 35.31 | +7: iteration 7630/ 11269 | consumed samples: 1953280 | consumed tokens: 4000317440 | elapsed time per iteration (s): 0.47 | learning rate: 6.326E-05 | global batch size: 256 | lm loss: 3.371413E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.768 | 
TFLOPs: 35.30 | +7: iteration 7640/ 11269 | consumed samples: 1955840 | consumed tokens: 4005560320 | elapsed time per iteration (s): 0.47 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 3.366294E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.776 | TFLOPs: 35.30 | +7: iteration 7650/ 11269 | consumed samples: 1958400 | consumed tokens: 4010803200 | elapsed time per iteration (s): 0.47 | learning rate: 6.283E-05 | global batch size: 256 | lm loss: 3.361715E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.383 | TFLOPs: 35.28 | +7: iteration 7660/ 11269 | consumed samples: 1960960 | consumed tokens: 4016046080 | elapsed time per iteration (s): 0.47 | learning rate: 6.261E-05 | global batch size: 256 | lm loss: 3.358666E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.634 | TFLOPs: 35.29 | +7: iteration 7670/ 11269 | consumed samples: 1963520 | consumed tokens: 4021288960 | elapsed time per iteration (s): 0.47 | learning rate: 6.240E-05 | global batch size: 256 | lm loss: 3.365503E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.611 | TFLOPs: 35.29 | +7: iteration 7680/ 11269 | consumed samples: 1966080 | consumed tokens: 4026531840 | elapsed time per iteration (s): 0.47 | learning rate: 6.218E-05 | global batch size: 256 | lm loss: 3.370166E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.396 | TFLOPs: 35.28 | +7: iteration 7690/ 11269 | consumed samples: 1968640 | consumed tokens: 4031774720 | elapsed time per iteration (s): 0.47 | learning rate: 6.197E-05 | global batch size: 256 | lm loss: 3.363000E+00 | grad norm: 0.310 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.656 | TFLOPs: 35.29 | +7: iteration 7700/ 11269 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.47 | learning rate: 6.175E-05 | global batch size: 256 | lm loss: 3.351837E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.519 | TFLOPs: 35.28 | +7: iteration 7710/ 11269 | consumed samples: 1973760 | consumed tokens: 4042260480 | elapsed time per iteration (s): 0.47 | learning rate: 6.154E-05 | global batch size: 256 | lm loss: 3.362171E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.504 | TFLOPs: 35.28 | +7: iteration 7720/ 11269 | consumed samples: 1976320 | consumed tokens: 4047503360 | elapsed time per iteration (s): 0.47 | learning rate: 6.133E-05 | global batch size: 256 | lm loss: 3.368191E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.797 | TFLOPs: 35.30 | +7: iteration 7730/ 11269 | consumed samples: 1978880 | consumed tokens: 4052746240 | elapsed time per iteration (s): 0.47 | learning rate: 6.111E-05 | global batch size: 256 | lm loss: 3.369874E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.484 | TFLOPs: 35.28 | +7: iteration 7740/ 11269 | consumed samples: 1981440 | consumed tokens: 4057989120 | elapsed time per iteration (s): 0.47 | learning rate: 6.090E-05 | global batch size: 256 | lm loss: 3.353785E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.710 | TFLOPs: 35.30 | +7: iteration 7750/ 11269 | consumed samples: 1984000 | consumed tokens: 4063232000 | elapsed time per iteration (s): 0.47 | learning rate: 6.069E-05 | global batch 
size: 256 | lm loss: 3.379349E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.633 | TFLOPs: 35.29 | +7: iteration 7760/ 11269 | consumed samples: 1986560 | consumed tokens: 4068474880 | elapsed time per iteration (s): 0.47 | learning rate: 6.048E-05 | global batch size: 256 | lm loss: 3.376830E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.489 | TFLOPs: 35.28 | +7: iteration 7770/ 11269 | consumed samples: 1989120 | consumed tokens: 4073717760 | elapsed time per iteration (s): 0.47 | learning rate: 6.027E-05 | global batch size: 256 | lm loss: 3.373725E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.863 | TFLOPs: 35.31 | +7: iteration 7780/ 11269 | consumed samples: 1991680 | consumed tokens: 4078960640 | elapsed time per iteration (s): 0.47 | learning rate: 6.006E-05 | global batch size: 256 | lm loss: 3.362823E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.562 | TFLOPs: 35.29 | +7: iteration 7790/ 11269 | consumed samples: 1994240 | consumed tokens: 4084203520 | elapsed time per iteration (s): 0.47 | learning rate: 5.984E-05 | global batch size: 256 | lm loss: 3.360585E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.861 | TFLOPs: 35.31 | +7: iteration 7800/ 11269 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.47 | learning rate: 5.963E-05 | global batch size: 256 | lm loss: 3.362442E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.146 | TFLOPs: 35.33 | +7: iteration 7810/ 11269 | consumed samples: 1999360 | consumed tokens: 
4094689280 | elapsed time per iteration (s): 0.47 | learning rate: 5.942E-05 | global batch size: 256 | lm loss: 3.366068E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.005 | TFLOPs: 35.32 | +7: iteration 7820/ 11269 | consumed samples: 2001920 | consumed tokens: 4099932160 | elapsed time per iteration (s): 0.47 | learning rate: 5.922E-05 | global batch size: 256 | lm loss: 3.360478E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.545 | TFLOPs: 35.29 | +7: iteration 7830/ 11269 | consumed samples: 2004480 | consumed tokens: 4105175040 | elapsed time per iteration (s): 0.47 | learning rate: 5.901E-05 | global batch size: 256 | lm loss: 3.359552E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.071 | TFLOPs: 35.32 | +7: iteration 7840/ 11269 | consumed samples: 2007040 | consumed tokens: 4110417920 | elapsed time per iteration (s): 0.47 | learning rate: 5.880E-05 | global batch size: 256 | lm loss: 3.357920E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.870 | TFLOPs: 35.31 | +7: iteration 7850/ 11269 | consumed samples: 2009600 | consumed tokens: 4115660800 | elapsed time per iteration (s): 0.47 | learning rate: 5.859E-05 | global batch size: 256 | lm loss: 3.355674E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.998 | TFLOPs: 35.32 | +7: iteration 7860/ 11269 | consumed samples: 2012160 | consumed tokens: 4120903680 | elapsed time per iteration (s): 0.47 | learning rate: 5.838E-05 | global batch size: 256 | lm loss: 3.355836E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.612 | 
TFLOPs: 35.29 | +7: iteration 7870/ 11269 | consumed samples: 2014720 | consumed tokens: 4126146560 | elapsed time per iteration (s): 0.47 | learning rate: 5.817E-05 | global batch size: 256 | lm loss: 3.367831E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.695 | TFLOPs: 35.30 | +7: iteration 7880/ 11269 | consumed samples: 2017280 | consumed tokens: 4131389440 | elapsed time per iteration (s): 0.47 | learning rate: 5.797E-05 | global batch size: 256 | lm loss: 3.365918E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.765 | TFLOPs: 35.30 | +7: iteration 7890/ 11269 | consumed samples: 2019840 | consumed tokens: 4136632320 | elapsed time per iteration (s): 0.47 | learning rate: 5.776E-05 | global batch size: 256 | lm loss: 3.358317E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.041 | TFLOPs: 35.32 | +7: iteration 7900/ 11269 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.47 | learning rate: 5.755E-05 | global batch size: 256 | lm loss: 3.363395E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.590 | TFLOPs: 35.29 | +7: iteration 7910/ 11269 | consumed samples: 2024960 | consumed tokens: 4147118080 | elapsed time per iteration (s): 0.47 | learning rate: 5.735E-05 | global batch size: 256 | lm loss: 3.362957E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.897 | TFLOPs: 35.31 | +7: iteration 7920/ 11269 | consumed samples: 2027520 | consumed tokens: 4152360960 | elapsed time per iteration (s): 0.47 | learning rate: 5.714E-05 | global batch size: 256 | lm loss: 3.350002E+00 | grad norm: 0.305 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.743 | TFLOPs: 35.30 | +7: iteration 7930/ 11269 | consumed samples: 2030080 | consumed tokens: 4157603840 | elapsed time per iteration (s): 0.47 | learning rate: 5.694E-05 | global batch size: 256 | lm loss: 3.352530E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.667 | TFLOPs: 35.29 | +7: iteration 7940/ 11269 | consumed samples: 2032640 | consumed tokens: 4162846720 | elapsed time per iteration (s): 0.47 | learning rate: 5.673E-05 | global batch size: 256 | lm loss: 3.355146E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.901 | TFLOPs: 35.31 | +7: iteration 7950/ 11269 | consumed samples: 2035200 | consumed tokens: 4168089600 | elapsed time per iteration (s): 0.47 | learning rate: 5.653E-05 | global batch size: 256 | lm loss: 3.352403E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.826 | TFLOPs: 35.30 | +7: iteration 7960/ 11269 | consumed samples: 2037760 | consumed tokens: 4173332480 | elapsed time per iteration (s): 0.47 | learning rate: 5.633E-05 | global batch size: 256 | lm loss: 3.349899E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.618 | TFLOPs: 35.29 | +7: iteration 7970/ 11269 | consumed samples: 2040320 | consumed tokens: 4178575360 | elapsed time per iteration (s): 0.49 | learning rate: 5.612E-05 | global batch size: 256 | lm loss: 3.344794E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 524.333 | TFLOPs: 34.29 | +7: iteration 7980/ 11269 | consumed samples: 2042880 | consumed tokens: 4183818240 | elapsed time per iteration (s): 0.49 | learning rate: 5.592E-05 | global batch 
size: 256 | lm loss: 3.356477E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.993 | TFLOPs: 34.07 | +7: iteration 7990/ 11269 | consumed samples: 2045440 | consumed tokens: 4189061120 | elapsed time per iteration (s): 0.47 | learning rate: 5.572E-05 | global batch size: 256 | lm loss: 3.348618E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.207 | TFLOPs: 35.26 | +0: [2023-03-15 23:03:15,793] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[5.551588445114958e-05, 5.551588445114958e-05, 5.551588445114958e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 8000/ 11269 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.48 | learning rate: 5.552E-05 | global batch size: 256 | lm loss: 3.341730E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.113 | TFLOPs: 34.93 | +0: steps: 8000 loss: 3.3186 iter time (s): 0.473 samples/sec: 541.741 +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 8000 | lm loss value: 3.467995E+00 | lm loss PPL: 3.207236E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 8000 to checkpoints_280m5b9400m +0: [2023-03-15 23:03:15,973] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is begin to save! +0: [2023-03-15 23:03:15,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:03:16,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_01-model_00-model_states.pt. 
+0: [2023-03-15 23:03:16,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:03:16,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:03:16,135] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:03:16,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:03:16,161] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:03:16,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:03:16,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:03:16,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:03:16,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:03:16,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:03:16,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:03:16,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_08-model_00-model_states.pt. 
+0: [2023-03-15 23:03:16,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:03:16,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:03:16,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:03:16,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:03:16,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:03:16,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:03:16,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:03:16,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:03:16,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:03:16,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:03:16,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:03:16,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_14-model_00-model_states.pt. 
+0: [2023-03-15 23:03:16,424] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:03:16,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:03:16,450] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:03:16,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:03:16,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:03:16,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:03:16,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:03:16,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:03:16,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:03:16,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:03:16,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:03:16,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_20-model_00-model_states.pt. 
+0: [2023-03-15 23:03:16,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:03:16,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:03:16,587] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step8000/mp_rank_00_model_states.pt +0: [2023-03-15 23:03:16,587] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/mp_rank_00_model_states.pt... +0: [2023-03-15 23:03:16,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/mp_rank_00_model_states.pt. +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:03:16,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:03:16,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:03:16,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:03:16,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:03:16,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
+6: [2023-03-15 23:03:16,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
+6: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
+7: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +7: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:03:16,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:03:16,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:03:16,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 23:03:16,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:03:16,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +4: [2023-03-15 23:03:16,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:03:16,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 23:03:16,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:03:16,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:03:16,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:03:16,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:03:16,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +2: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +6: [2023-03-15 23:03:16,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:03:16,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 23:03:16,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
+1: [2023-03-15 23:03:16,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +1: [2023-03-15 23:03:16,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:03:16,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +1: [2023-03-15 23:03:16,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +1: [2023-03-15 23:03:16,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:03:16,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:03:16,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:03:16,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:03:16,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +3: [2023-03-15 23:03:16,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:03:16,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 23:03:16,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:03:16,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:03:16,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:03:16,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:03:16,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:03:16,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:03:16,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +5: [2023-03-15 23:03:16,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:03:16,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 23:03:16,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: [2023-03-15 23:03:16,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:03:16,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
+0: [2023-03-15 23:03:16,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 23:03:16,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! +0: successfully saved checkpoint at iteration 8000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 817.74 +7: iteration 8010/ 11269 | consumed samples: 2050560 | consumed tokens: 4199546880 | elapsed time per iteration (s): 0.59 | learning rate: 5.531E-05 | global batch size: 256 | lm loss: 3.346957E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 432.532 | TFLOPs: 28.29 | +7: iteration 8020/ 11269 | consumed samples: 2053120 | consumed tokens: 4204789760 | elapsed time per iteration (s): 0.48 | learning rate: 5.511E-05 | global batch size: 256 | lm loss: 3.352927E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.893 | TFLOPs: 34.85 | +7: iteration 8030/ 11269 | consumed samples: 2055680 | consumed tokens: 4210032640 | elapsed time per iteration (s): 0.49 | learning rate: 5.491E-05 | global batch size: 256 | lm loss: 3.351870E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 524.506 | TFLOPs: 34.30 | +7: iteration 8040/ 11269 | consumed samples: 2058240 | consumed tokens: 4215275520 | elapsed time per iteration (s): 0.47 | learning rate: 5.471E-05 | global batch size: 256 | lm loss: 3.357246E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.291 | TFLOPs: 35.40 | +7: iteration 8050/ 11269 | consumed samples: 2060800 | consumed tokens: 4220518400 | elapsed time per iteration (s): 0.47 | learning rate: 5.451E-05 | global batch size: 256 | 
lm loss: 3.346156E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.136 | TFLOPs: 35.39 | +7: iteration 8060/ 11269 | consumed samples: 2063360 | consumed tokens: 4225761280 | elapsed time per iteration (s): 0.47 | learning rate: 5.431E-05 | global batch size: 256 | lm loss: 3.348359E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.963 | TFLOPs: 35.38 | +7: iteration 8070/ 11269 | consumed samples: 2065920 | consumed tokens: 4231004160 | elapsed time per iteration (s): 0.47 | learning rate: 5.411E-05 | global batch size: 256 | lm loss: 3.352924E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.081 | TFLOPs: 35.39 | +7: iteration 8080/ 11269 | consumed samples: 2068480 | consumed tokens: 4236247040 | elapsed time per iteration (s): 0.47 | learning rate: 5.392E-05 | global batch size: 256 | lm loss: 3.359683E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.536 | TFLOPs: 35.35 | +7: iteration 8090/ 11269 | consumed samples: 2071040 | consumed tokens: 4241489920 | elapsed time per iteration (s): 0.51 | learning rate: 5.372E-05 | global batch size: 256 | lm loss: 3.353542E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 497.669 | TFLOPs: 32.55 | +7: iteration 8100/ 11269 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 0.55 | learning rate: 5.352E-05 | global batch size: 256 | lm loss: 3.357018E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 462.617 | TFLOPs: 30.26 | +7: iteration 8110/ 11269 | consumed samples: 2076160 | consumed tokens: 4251975680 | elapsed 
time per iteration (s): 0.57 | learning rate: 5.332E-05 | global batch size: 256 | lm loss: 3.341851E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 449.346 | TFLOPs: 29.39 | +7: iteration 8120/ 11269 | consumed samples: 2078720 | consumed tokens: 4257218560 | elapsed time per iteration (s): 0.51 | learning rate: 5.313E-05 | global batch size: 256 | lm loss: 3.350813E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 501.924 | TFLOPs: 32.83 | +7: iteration 8130/ 11269 | consumed samples: 2081280 | consumed tokens: 4262461440 | elapsed time per iteration (s): 0.48 | learning rate: 5.293E-05 | global batch size: 256 | lm loss: 3.342475E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.689 | TFLOPs: 34.90 | +7: iteration 8140/ 11269 | consumed samples: 2083840 | consumed tokens: 4267704320 | elapsed time per iteration (s): 0.48 | learning rate: 5.273E-05 | global batch size: 256 | lm loss: 3.350589E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.813 | TFLOPs: 34.71 | +7: iteration 8150/ 11269 | consumed samples: 2086400 | consumed tokens: 4272947200 | elapsed time per iteration (s): 0.48 | learning rate: 5.254E-05 | global batch size: 256 | lm loss: 3.352605E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.590 | TFLOPs: 34.77 | +7: iteration 8160/ 11269 | consumed samples: 2088960 | consumed tokens: 4278190080 | elapsed time per iteration (s): 0.49 | learning rate: 5.234E-05 | global batch size: 256 | lm loss: 3.348596E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 522.621 | TFLOPs: 34.18 | +7: 
iteration 8170/ 11269 | consumed samples: 2091520 | consumed tokens: 4283432960 | elapsed time per iteration (s): 0.51 | learning rate: 5.215E-05 | global batch size: 256 | lm loss: 3.348682E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.077 | TFLOPs: 32.64 | +7: iteration 8180/ 11269 | consumed samples: 2094080 | consumed tokens: 4288675840 | elapsed time per iteration (s): 0.48 | learning rate: 5.196E-05 | global batch size: 256 | lm loss: 3.347631E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.698 | TFLOPs: 35.03 | +7: iteration 8190/ 11269 | consumed samples: 2096640 | consumed tokens: 4293918720 | elapsed time per iteration (s): 0.48 | learning rate: 5.176E-05 | global batch size: 256 | lm loss: 3.339325E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.827 | TFLOPs: 35.17 | +7: iteration 8200/ 11269 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.48 | learning rate: 5.157E-05 | global batch size: 256 | lm loss: 3.347945E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.128 | TFLOPs: 35.00 | +7: iteration 8210/ 11269 | consumed samples: 2101760 | consumed tokens: 4304404480 | elapsed time per iteration (s): 0.48 | learning rate: 5.138E-05 | global batch size: 256 | lm loss: 3.348700E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.891 | TFLOPs: 35.11 | +7: iteration 8220/ 11269 | consumed samples: 2104320 | consumed tokens: 4309647360 | elapsed time per iteration (s): 0.47 | learning rate: 5.119E-05 | global batch size: 256 | lm loss: 3.350323E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 539.608 | TFLOPs: 35.29 | +7: iteration 8230/ 11269 | consumed samples: 2106880 | consumed tokens: 4314890240 | elapsed time per iteration (s): 0.47 | learning rate: 5.099E-05 | global batch size: 256 | lm loss: 3.351265E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.976 | TFLOPs: 35.31 | +7: iteration 8240/ 11269 | consumed samples: 2109440 | consumed tokens: 4320133120 | elapsed time per iteration (s): 0.47 | learning rate: 5.080E-05 | global batch size: 256 | lm loss: 3.337860E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.009 | TFLOPs: 35.38 | +7: iteration 8250/ 11269 | consumed samples: 2112000 | consumed tokens: 4325376000 | elapsed time per iteration (s): 0.48 | learning rate: 5.061E-05 | global batch size: 256 | lm loss: 3.333883E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.154 | TFLOPs: 35.06 | +7: iteration 8260/ 11269 | consumed samples: 2114560 | consumed tokens: 4330618880 | elapsed time per iteration (s): 0.48 | learning rate: 5.042E-05 | global batch size: 256 | lm loss: 3.340737E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.900 | TFLOPs: 34.85 | +7: iteration 8270/ 11269 | consumed samples: 2117120 | consumed tokens: 4335861760 | elapsed time per iteration (s): 0.48 | learning rate: 5.023E-05 | global batch size: 256 | lm loss: 3.340205E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.973 | TFLOPs: 34.99 | +7: iteration 8280/ 11269 | consumed samples: 2119680 | consumed tokens: 4341104640 | elapsed time per iteration (s): 0.48 | learning rate: 5.004E-05 | global batch size: 256 | lm loss: 
3.346501E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.929 | TFLOPs: 35.05 | +7: iteration 8290/ 11269 | consumed samples: 2122240 | consumed tokens: 4346347520 | elapsed time per iteration (s): 0.49 | learning rate: 4.985E-05 | global batch size: 256 | lm loss: 3.326854E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 519.795 | TFLOPs: 33.99 | +7: iteration 8300/ 11269 | consumed samples: 2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.49 | learning rate: 4.967E-05 | global batch size: 256 | lm loss: 3.335303E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 526.075 | TFLOPs: 34.41 | +7: iteration 8310/ 11269 | consumed samples: 2127360 | consumed tokens: 4356833280 | elapsed time per iteration (s): 0.48 | learning rate: 4.948E-05 | global batch size: 256 | lm loss: 3.333305E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.472 | TFLOPs: 35.09 | +7: iteration 8320/ 11269 | consumed samples: 2129920 | consumed tokens: 4362076160 | elapsed time per iteration (s): 0.48 | learning rate: 4.929E-05 | global batch size: 256 | lm loss: 3.343834E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.003 | TFLOPs: 34.66 | +7: iteration 8330/ 11269 | consumed samples: 2132480 | consumed tokens: 4367319040 | elapsed time per iteration (s): 0.49 | learning rate: 4.910E-05 | global batch size: 256 | lm loss: 3.353397E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 526.724 | TFLOPs: 34.45 | +7: iteration 8340/ 11269 | consumed samples: 2135040 | consumed tokens: 4372561920 | elapsed time per 
iteration (s): 0.48 | learning rate: 4.892E-05 | global batch size: 256 | lm loss: 3.334957E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.632 | TFLOPs: 34.77 | +7: iteration 8350/ 11269 | consumed samples: 2137600 | consumed tokens: 4377804800 | elapsed time per iteration (s): 0.48 | learning rate: 4.873E-05 | global batch size: 256 | lm loss: 3.348095E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.331 | TFLOPs: 34.55 | +7: iteration 8360/ 11269 | consumed samples: 2140160 | consumed tokens: 4383047680 | elapsed time per iteration (s): 0.48 | learning rate: 4.855E-05 | global batch size: 256 | lm loss: 3.347230E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.318 | TFLOPs: 34.55 | +7: iteration 8370/ 11269 | consumed samples: 2142720 | consumed tokens: 4388290560 | elapsed time per iteration (s): 0.47 | learning rate: 4.836E-05 | global batch size: 256 | lm loss: 3.333686E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.319 | TFLOPs: 35.40 | +7: iteration 8380/ 11269 | consumed samples: 2145280 | consumed tokens: 4393533440 | elapsed time per iteration (s): 0.48 | learning rate: 4.818E-05 | global batch size: 256 | lm loss: 3.337275E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.212 | TFLOPs: 34.74 | +7: iteration 8390/ 11269 | consumed samples: 2147840 | consumed tokens: 4398776320 | elapsed time per iteration (s): 0.49 | learning rate: 4.799E-05 | global batch size: 256 | lm loss: 3.340240E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 520.735 | TFLOPs: 34.06 | +7: iteration 8400/ 
11269 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.50 | learning rate: 4.781E-05 | global batch size: 256 | lm loss: 3.331691E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 509.269 | TFLOPs: 33.31 | +7: iteration 8410/ 11269 | consumed samples: 2152960 | consumed tokens: 4409262080 | elapsed time per iteration (s): 0.47 | learning rate: 4.763E-05 | global batch size: 256 | lm loss: 3.345345E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.091 | TFLOPs: 35.32 | +7: iteration 8420/ 11269 | consumed samples: 2155520 | consumed tokens: 4414504960 | elapsed time per iteration (s): 0.47 | learning rate: 4.744E-05 | global batch size: 256 | lm loss: 3.342693E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.090 | TFLOPs: 35.39 | +7: iteration 8430/ 11269 | consumed samples: 2158080 | consumed tokens: 4419747840 | elapsed time per iteration (s): 0.47 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 3.347787E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.683 | TFLOPs: 35.36 | +7: iteration 8440/ 11269 | consumed samples: 2160640 | consumed tokens: 4424990720 | elapsed time per iteration (s): 0.47 | learning rate: 4.708E-05 | global batch size: 256 | lm loss: 3.344220E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.360 | TFLOPs: 35.34 | +7: iteration 8450/ 11269 | consumed samples: 2163200 | consumed tokens: 4430233600 | elapsed time per iteration (s): 0.48 | learning rate: 4.690E-05 | global batch size: 256 | lm loss: 3.343270E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 538.192 | TFLOPs: 35.20 | +7: iteration 8460/ 11269 | consumed samples: 2165760 | consumed tokens: 4435476480 | elapsed time per iteration (s): 0.48 | learning rate: 4.672E-05 | global batch size: 256 | lm loss: 3.333337E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.467 | TFLOPs: 34.95 | +7: iteration 8470/ 11269 | consumed samples: 2168320 | consumed tokens: 4440719360 | elapsed time per iteration (s): 0.48 | learning rate: 4.654E-05 | global batch size: 256 | lm loss: 3.324895E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.339 | TFLOPs: 35.21 | +7: iteration 8480/ 11269 | consumed samples: 2170880 | consumed tokens: 4445962240 | elapsed time per iteration (s): 0.47 | learning rate: 4.636E-05 | global batch size: 256 | lm loss: 3.331133E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.056 | TFLOPs: 35.32 | +7: iteration 8490/ 11269 | consumed samples: 2173440 | consumed tokens: 4451205120 | elapsed time per iteration (s): 0.47 | learning rate: 4.618E-05 | global batch size: 256 | lm loss: 3.347952E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.302 | TFLOPs: 35.34 | +7: iteration 8500/ 11269 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.48 | learning rate: 4.600E-05 | global batch size: 256 | lm loss: 3.347871E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.147 | TFLOPs: 35.06 | +7: iteration 8510/ 11269 | consumed samples: 2178560 | consumed tokens: 4461690880 | elapsed time per iteration (s): 0.48 | learning rate: 4.582E-05 | global batch size: 256 | lm loss: 3.344165E+00 | 
grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.975 | TFLOPs: 35.05 | +7: iteration 8520/ 11269 | consumed samples: 2181120 | consumed tokens: 4466933760 | elapsed time per iteration (s): 0.48 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 3.340998E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.721 | TFLOPs: 35.17 | +7: iteration 8530/ 11269 | consumed samples: 2183680 | consumed tokens: 4472176640 | elapsed time per iteration (s): 0.47 | learning rate: 4.547E-05 | global batch size: 256 | lm loss: 3.333259E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.059 | TFLOPs: 35.32 | +7: iteration 8540/ 11269 | consumed samples: 2186240 | consumed tokens: 4477419520 | elapsed time per iteration (s): 0.48 | learning rate: 4.529E-05 | global batch size: 256 | lm loss: 3.334092E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.024 | TFLOPs: 35.06 | +7: iteration 8550/ 11269 | consumed samples: 2188800 | consumed tokens: 4482662400 | elapsed time per iteration (s): 0.48 | learning rate: 4.512E-05 | global batch size: 256 | lm loss: 3.343022E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.228 | TFLOPs: 35.00 | +7: iteration 8560/ 11269 | consumed samples: 2191360 | consumed tokens: 4487905280 | elapsed time per iteration (s): 0.47 | learning rate: 4.494E-05 | global batch size: 256 | lm loss: 3.336051E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.028 | TFLOPs: 35.32 | +7: iteration 8570/ 11269 | consumed samples: 2193920 | consumed tokens: 4493148160 | elapsed time per iteration (s): 
0.48 | learning rate: 4.477E-05 | global batch size: 256 | lm loss: 3.330336E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.049 | TFLOPs: 35.12 | +7: iteration 8580/ 11269 | consumed samples: 2196480 | consumed tokens: 4498391040 | elapsed time per iteration (s): 0.48 | learning rate: 4.459E-05 | global batch size: 256 | lm loss: 3.327454E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 531.281 | TFLOPs: 34.75 | +7: iteration 8590/ 11269 | consumed samples: 2199040 | consumed tokens: 4503633920 | elapsed time per iteration (s): 0.48 | learning rate: 4.442E-05 | global batch size: 256 | lm loss: 3.339904E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.165 | TFLOPs: 35.07 | +7: iteration 8600/ 11269 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.48 | learning rate: 4.425E-05 | global batch size: 256 | lm loss: 3.332501E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.627 | TFLOPs: 35.10 | +7: iteration 8610/ 11269 | consumed samples: 2204160 | consumed tokens: 4514119680 | elapsed time per iteration (s): 0.48 | learning rate: 4.407E-05 | global batch size: 256 | lm loss: 3.350783E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.134 | TFLOPs: 35.06 | +7: iteration 8620/ 11269 | consumed samples: 2206720 | consumed tokens: 4519362560 | elapsed time per iteration (s): 0.48 | learning rate: 4.390E-05 | global batch size: 256 | lm loss: 3.328754E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.111 | TFLOPs: 35.00 | +7: iteration 8630/ 11269 | 
consumed samples: 2209280 | consumed tokens: 4524605440 | elapsed time per iteration (s): 0.48 | learning rate: 4.373E-05 | global batch size: 256 | lm loss: 3.339239E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.707 | TFLOPs: 34.90 | +7: iteration 8640/ 11269 | consumed samples: 2211840 | consumed tokens: 4529848320 | elapsed time per iteration (s): 0.47 | learning rate: 4.356E-05 | global batch size: 256 | lm loss: 3.346037E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.203 | TFLOPs: 35.33 | +7: iteration 8650/ 11269 | consumed samples: 2214400 | consumed tokens: 4535091200 | elapsed time per iteration (s): 0.48 | learning rate: 4.339E-05 | global batch size: 256 | lm loss: 3.328410E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.374 | TFLOPs: 34.95 | +7: iteration 8660/ 11269 | consumed samples: 2216960 | consumed tokens: 4540334080 | elapsed time per iteration (s): 0.48 | learning rate: 4.322E-05 | global batch size: 256 | lm loss: 3.333214E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.774 | TFLOPs: 35.04 | +7: iteration 8670/ 11269 | consumed samples: 2219520 | consumed tokens: 4545576960 | elapsed time per iteration (s): 0.47 | learning rate: 4.305E-05 | global batch size: 256 | lm loss: 3.327511E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.293 | TFLOPs: 35.34 | +7: iteration 8680/ 11269 | consumed samples: 2222080 | consumed tokens: 4550819840 | elapsed time per iteration (s): 0.47 | learning rate: 4.288E-05 | global batch size: 256 | lm loss: 3.325333E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 539.935 | TFLOPs: 35.31 | +7: iteration 8690/ 11269 | consumed samples: 2224640 | consumed tokens: 4556062720 | elapsed time per iteration (s): 0.47 | learning rate: 4.271E-05 | global batch size: 256 | lm loss: 3.328397E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.859 | TFLOPs: 35.31 | +7: iteration 8700/ 11269 | consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed time per iteration (s): 0.47 | learning rate: 4.254E-05 | global batch size: 256 | lm loss: 3.335538E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.595 | TFLOPs: 35.29 | +7: iteration 8710/ 11269 | consumed samples: 2229760 | consumed tokens: 4566548480 | elapsed time per iteration (s): 0.48 | learning rate: 4.237E-05 | global batch size: 256 | lm loss: 3.335431E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.943 | TFLOPs: 35.18 | +7: iteration 8720/ 11269 | consumed samples: 2232320 | consumed tokens: 4571791360 | elapsed time per iteration (s): 0.47 | learning rate: 4.221E-05 | global batch size: 256 | lm loss: 3.327683E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.992 | TFLOPs: 35.32 | +7: iteration 8730/ 11269 | consumed samples: 2234880 | consumed tokens: 4577034240 | elapsed time per iteration (s): 0.47 | learning rate: 4.204E-05 | global batch size: 256 | lm loss: 3.332487E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.147 | TFLOPs: 35.26 | +7: iteration 8740/ 11269 | consumed samples: 2237440 | consumed tokens: 4582277120 | elapsed time per iteration (s): 0.47 | learning rate: 4.188E-05 | global batch size: 256 | lm loss: 3.333670E+00 | 
grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.724 | TFLOPs: 35.30 | +7: iteration 8750/ 11269 | consumed samples: 2240000 | consumed tokens: 4587520000 | elapsed time per iteration (s): 0.48 | learning rate: 4.171E-05 | global batch size: 256 | lm loss: 3.339067E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.573 | TFLOPs: 35.09 | +7: iteration 8760/ 11269 | consumed samples: 2242560 | consumed tokens: 4592762880 | elapsed time per iteration (s): 0.47 | learning rate: 4.154E-05 | global batch size: 256 | lm loss: 3.330478E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.943 | TFLOPs: 35.31 | +7: iteration 8770/ 11269 | consumed samples: 2245120 | consumed tokens: 4598005760 | elapsed time per iteration (s): 0.47 | learning rate: 4.138E-05 | global batch size: 256 | lm loss: 3.340259E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.858 | TFLOPs: 35.31 | +7: iteration 8780/ 11269 | consumed samples: 2247680 | consumed tokens: 4603248640 | elapsed time per iteration (s): 0.47 | learning rate: 4.122E-05 | global batch size: 256 | lm loss: 3.326702E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.819 | TFLOPs: 35.30 | +7: iteration 8790/ 11269 | consumed samples: 2250240 | consumed tokens: 4608491520 | elapsed time per iteration (s): 0.47 | learning rate: 4.105E-05 | global batch size: 256 | lm loss: 3.328143E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.728 | TFLOPs: 35.30 | +7: iteration 8800/ 11269 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 
0.47 | learning rate: 4.089E-05 | global batch size: 256 | lm loss: 3.337477E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.882 | TFLOPs: 35.31 | +7: iteration 8810/ 11269 | consumed samples: 2255360 | consumed tokens: 4618977280 | elapsed time per iteration (s): 0.47 | learning rate: 4.073E-05 | global batch size: 256 | lm loss: 3.337276E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.109 | TFLOPs: 35.32 | +7: iteration 8820/ 11269 | consumed samples: 2257920 | consumed tokens: 4624220160 | elapsed time per iteration (s): 0.47 | learning rate: 4.057E-05 | global batch size: 256 | lm loss: 3.337083E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.852 | TFLOPs: 35.31 | +7: iteration 8830/ 11269 | consumed samples: 2260480 | consumed tokens: 4629463040 | elapsed time per iteration (s): 0.48 | learning rate: 4.041E-05 | global batch size: 256 | lm loss: 3.318769E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.161 | TFLOPs: 35.20 | +7: iteration 8840/ 11269 | consumed samples: 2263040 | consumed tokens: 4634705920 | elapsed time per iteration (s): 0.48 | learning rate: 4.025E-05 | global batch size: 256 | lm loss: 3.316468E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.804 | TFLOPs: 35.11 | +7: iteration 8850/ 11269 | consumed samples: 2265600 | consumed tokens: 4639948800 | elapsed time per iteration (s): 0.47 | learning rate: 4.009E-05 | global batch size: 256 | lm loss: 3.331725E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.082 | TFLOPs: 35.32 | +7: iteration 8860/ 11269 | 
consumed samples: 2268160 | consumed tokens: 4645191680 | elapsed time per iteration (s): 0.48 | learning rate: 3.993E-05 | global batch size: 256 | lm loss: 3.343082E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.697 | TFLOPs: 35.03 | +7: iteration 8870/ 11269 | consumed samples: 2270720 | consumed tokens: 4650434560 | elapsed time per iteration (s): 0.47 | learning rate: 3.977E-05 | global batch size: 256 | lm loss: 3.332667E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.962 | TFLOPs: 35.31 | +7: iteration 8880/ 11269 | consumed samples: 2273280 | consumed tokens: 4655677440 | elapsed time per iteration (s): 0.47 | learning rate: 3.961E-05 | global batch size: 256 | lm loss: 3.332137E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.761 | TFLOPs: 35.30 | +7: iteration 8890/ 11269 | consumed samples: 2275840 | consumed tokens: 4660920320 | elapsed time per iteration (s): 0.47 | learning rate: 3.945E-05 | global batch size: 256 | lm loss: 3.339101E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.953 | TFLOPs: 35.31 | +7: iteration 8900/ 11269 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.47 | learning rate: 3.930E-05 | global batch size: 256 | lm loss: 3.326251E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.815 | TFLOPs: 35.30 | +7: iteration 8910/ 11269 | consumed samples: 2280960 | consumed tokens: 4671406080 | elapsed time per iteration (s): 0.47 | learning rate: 3.914E-05 | global batch size: 256 | lm loss: 3.337200E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 539.160 | TFLOPs: 35.26 | +7: iteration 8920/ 11269 | consumed samples: 2283520 | consumed tokens: 4676648960 | elapsed time per iteration (s): 0.47 | learning rate: 3.898E-05 | global batch size: 256 | lm loss: 3.332147E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.476 | TFLOPs: 35.28 | +7: iteration 8930/ 11269 | consumed samples: 2286080 | consumed tokens: 4681891840 | elapsed time per iteration (s): 0.47 | learning rate: 3.883E-05 | global batch size: 256 | lm loss: 3.331721E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.498 | TFLOPs: 35.28 | +7: iteration 8940/ 11269 | consumed samples: 2288640 | consumed tokens: 4687134720 | elapsed time per iteration (s): 0.47 | learning rate: 3.867E-05 | global batch size: 256 | lm loss: 3.328350E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.681 | TFLOPs: 35.29 | +7: iteration 8950/ 11269 | consumed samples: 2291200 | consumed tokens: 4692377600 | elapsed time per iteration (s): 0.47 | learning rate: 3.852E-05 | global batch size: 256 | lm loss: 3.316887E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.582 | TFLOPs: 35.29 | +7: iteration 8960/ 11269 | consumed samples: 2293760 | consumed tokens: 4697620480 | elapsed time per iteration (s): 0.48 | learning rate: 3.836E-05 | global batch size: 256 | lm loss: 3.329145E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.130 | TFLOPs: 34.93 | +7: iteration 8970/ 11269 | consumed samples: 2296320 | consumed tokens: 4702863360 | elapsed time per iteration (s): 0.47 | learning rate: 3.821E-05 | global batch size: 256 | lm loss: 3.328048E+00 | 
grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.220 | TFLOPs: 35.26 | +7: iteration 8980/ 11269 | consumed samples: 2298880 | consumed tokens: 4708106240 | elapsed time per iteration (s): 0.47 | learning rate: 3.806E-05 | global batch size: 256 | lm loss: 3.324206E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.362 | TFLOPs: 35.27 | +7: iteration 8990/ 11269 | consumed samples: 2301440 | consumed tokens: 4713349120 | elapsed time per iteration (s): 0.47 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 3.333245E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.430 | TFLOPs: 35.28 | +7: iteration 9000/ 11269 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.47 | learning rate: 3.776E-05 | global batch size: 256 | lm loss: 3.321058E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.735 | TFLOPs: 35.30 | +7: ----------------------------------------------------------------------------------------------- +7: validation loss at iteration 9000 | lm loss value: 3.392526E+00 | lm loss PPL: 2.974097E+01 | +7: ----------------------------------------------------------------------------------------------- +0: saving checkpoint at iteration 9000 to checkpoints_280m5b9400m +0: [2023-03-15 23:11:17,206] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is begin to save! +0: [2023-03-15 23:11:17,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:11:17,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:11:17,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:11:17,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:11:17,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:11:17,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:11:17,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:11:17,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:11:17,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:11:17,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:11:17,454] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:11:17,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:11:17,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_08-model_00-model_states.pt... 
+0: [2023-03-15 23:11:17,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:11:17,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:11:17,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:11:17,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:11:17,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:11:17,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:11:17,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:11:17,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:11:17,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:11:17,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:11:17,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:11:17,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_14-model_00-model_states.pt... 
+0: [2023-03-15 23:11:17,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:11:17,654] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:11:17,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:11:17,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:11:17,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:11:17,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:11:17,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:11:17,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:11:17,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:11:17,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:11:17,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:11:17,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_20-model_00-model_states.pt... 
+0: [2023-03-15 23:11:17,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:11:17,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:11:17,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:11:17,806] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step9000/mp_rank_00_model_states.pt +0: [2023-03-15 23:11:17,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/mp_rank_00_model_states.pt... +0: [2023-03-15 23:11:17,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/mp_rank_00_model_states.pt. +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:11:17,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:11:17,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:11:17,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +7: [2023-03-15 23:11:17,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:11:17,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 23:11:17,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:11:17,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:11:17,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:11:17,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +6: [2023-03-15 23:11:17,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 23:11:17,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
+5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +5: [2023-03-15 23:11:17,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is 
ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +2: [2023-03-15 23:11:17,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: [2023-03-15 23:11:17,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 23:11:17,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
+4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +4: [2023-03-15 23:11:17,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:11:17,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 23:11:17,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
+3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +1: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
+3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +3: [2023-03-15 23:11:18,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! +0: successfully saved checkpoint at iteration 9000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 848.71 +7: iteration 9010/ 11269 | consumed samples: 2306560 | consumed tokens: 4723834880 | elapsed time per iteration (s): 0.57 | learning rate: 3.760E-05 | global batch size: 256 | lm loss: 3.325890E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 447.328 | TFLOPs: 29.26 | +7: iteration 9020/ 11269 | consumed samples: 2309120 | consumed tokens: 4729077760 | elapsed time per iteration (s): 0.47 | learning rate: 3.745E-05 | global batch size: 256 | lm loss: 3.316447E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.624 | TFLOPs: 35.42 | +7: iteration 9030/ 11269 | consumed samples: 2311680 | consumed tokens: 4734320640 | elapsed time per iteration (s): 0.47 | learning rate: 3.730E-05 | global batch size: 256 | lm loss: 3.320748E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.124 | TFLOPs: 35.39 | +7: iteration 9040/ 11269 | consumed samples: 2314240 | consumed tokens: 4739563520 | 
elapsed time per iteration (s): 0.47 | learning rate: 3.716E-05 | global batch size: 256 | lm loss: 3.325248E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.468 | TFLOPs: 35.35 | +7: iteration 9050/ 11269 | consumed samples: 2316800 | consumed tokens: 4744806400 | elapsed time per iteration (s): 0.47 | learning rate: 3.701E-05 | global batch size: 256 | lm loss: 3.322734E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.300 | TFLOPs: 35.34 | +7: iteration 9060/ 11269 | consumed samples: 2319360 | consumed tokens: 4750049280 | elapsed time per iteration (s): 0.47 | learning rate: 3.686E-05 | global batch size: 256 | lm loss: 3.338685E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.288 | TFLOPs: 35.33 | +7: iteration 9070/ 11269 | consumed samples: 2321920 | consumed tokens: 4755292160 | elapsed time per iteration (s): 0.47 | learning rate: 3.671E-05 | global batch size: 256 | lm loss: 3.330685E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.723 | TFLOPs: 35.30 | +7: iteration 9080/ 11269 | consumed samples: 2324480 | consumed tokens: 4760535040 | elapsed time per iteration (s): 0.47 | learning rate: 3.656E-05 | global batch size: 256 | lm loss: 3.333285E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.508 | TFLOPs: 35.28 | +7: iteration 9090/ 11269 | consumed samples: 2327040 | consumed tokens: 4765777920 | elapsed time per iteration (s): 0.47 | learning rate: 3.642E-05 | global batch size: 256 | lm loss: 3.328292E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.979 | TFLOPs: 35.31 | +7: 
iteration 9100/ 11269 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.47 | learning rate: 3.627E-05 | global batch size: 256 | lm loss: 3.320432E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.745 | TFLOPs: 35.30 | +7: iteration 9110/ 11269 | consumed samples: 2332160 | consumed tokens: 4776263680 | elapsed time per iteration (s): 0.47 | learning rate: 3.613E-05 | global batch size: 256 | lm loss: 3.325027E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.713 | TFLOPs: 35.30 | +7: iteration 9120/ 11269 | consumed samples: 2334720 | consumed tokens: 4781506560 | elapsed time per iteration (s): 0.47 | learning rate: 3.598E-05 | global batch size: 256 | lm loss: 3.321106E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.780 | TFLOPs: 35.30 | +7: iteration 9130/ 11269 | consumed samples: 2337280 | consumed tokens: 4786749440 | elapsed time per iteration (s): 0.47 | learning rate: 3.584E-05 | global batch size: 256 | lm loss: 3.317316E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.928 | TFLOPs: 35.31 | +7: iteration 9140/ 11269 | consumed samples: 2339840 | consumed tokens: 4791992320 | elapsed time per iteration (s): 0.47 | learning rate: 3.570E-05 | global batch size: 256 | lm loss: 3.326775E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.954 | TFLOPs: 35.31 | +7: iteration 9150/ 11269 | consumed samples: 2342400 | consumed tokens: 4797235200 | elapsed time per iteration (s): 0.47 | learning rate: 3.555E-05 | global batch size: 256 | lm loss: 3.335121E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 540.198 | TFLOPs: 35.33 | +7: iteration 9160/ 11269 | consumed samples: 2344960 | consumed tokens: 4802478080 | elapsed time per iteration (s): 0.47 | learning rate: 3.541E-05 | global batch size: 256 | lm loss: 3.328239E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.836 | TFLOPs: 35.31 | +7: iteration 9170/ 11269 | consumed samples: 2347520 | consumed tokens: 4807720960 | elapsed time per iteration (s): 0.47 | learning rate: 3.527E-05 | global batch size: 256 | lm loss: 3.339150E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.919 | TFLOPs: 35.31 | +7: iteration 9180/ 11269 | consumed samples: 2350080 | consumed tokens: 4812963840 | elapsed time per iteration (s): 0.47 | learning rate: 3.513E-05 | global batch size: 256 | lm loss: 3.332687E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.006 | TFLOPs: 35.32 | +7: iteration 9190/ 11269 | consumed samples: 2352640 | consumed tokens: 4818206720 | elapsed time per iteration (s): 0.47 | learning rate: 3.499E-05 | global batch size: 256 | lm loss: 3.323779E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.600 | TFLOPs: 35.29 | +7: iteration 9200/ 11269 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.47 | learning rate: 3.485E-05 | global batch size: 256 | lm loss: 3.335056E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.814 | TFLOPs: 35.30 | +7: iteration 9210/ 11269 | consumed samples: 2357760 | consumed tokens: 4828692480 | elapsed time per iteration (s): 0.47 | learning rate: 3.471E-05 | global batch size: 256 | lm loss: 
3.330942E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.592 | TFLOPs: 35.29 | +7: iteration 9220/ 11269 | consumed samples: 2360320 | consumed tokens: 4833935360 | elapsed time per iteration (s): 0.48 | learning rate: 3.457E-05 | global batch size: 256 | lm loss: 3.322591E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.738 | TFLOPs: 35.10 | +7: iteration 9230/ 11269 | consumed samples: 2362880 | consumed tokens: 4839178240 | elapsed time per iteration (s): 0.47 | learning rate: 3.443E-05 | global batch size: 256 | lm loss: 3.307954E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.182 | TFLOPs: 35.33 | +7: iteration 9240/ 11269 | consumed samples: 2365440 | consumed tokens: 4844421120 | elapsed time per iteration (s): 0.47 | learning rate: 3.430E-05 | global batch size: 256 | lm loss: 3.319587E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.232 | TFLOPs: 35.33 | +7: iteration 9250/ 11269 | consumed samples: 2368000 | consumed tokens: 4849664000 | elapsed time per iteration (s): 0.47 | learning rate: 3.416E-05 | global batch size: 256 | lm loss: 3.321380E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.117 | TFLOPs: 35.32 | +7: iteration 9260/ 11269 | consumed samples: 2370560 | consumed tokens: 4854906880 | elapsed time per iteration (s): 0.48 | learning rate: 3.402E-05 | global batch size: 256 | lm loss: 3.331778E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.225 | TFLOPs: 35.07 | +7: iteration 9270/ 11269 | consumed samples: 2373120 | consumed tokens: 4860149760 | elapsed time per 
iteration (s): 0.47 | learning rate: 3.389E-05 | global batch size: 256 | lm loss: 3.316853E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.275 | TFLOPs: 35.33 | +7: iteration 9280/ 11269 | consumed samples: 2375680 | consumed tokens: 4865392640 | elapsed time per iteration (s): 0.47 | learning rate: 3.375E-05 | global batch size: 256 | lm loss: 3.312777E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.732 | TFLOPs: 35.30 | +7: iteration 9290/ 11269 | consumed samples: 2378240 | consumed tokens: 4870635520 | elapsed time per iteration (s): 0.48 | learning rate: 3.362E-05 | global batch size: 256 | lm loss: 3.320986E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.549 | TFLOPs: 35.22 | +7: iteration 9300/ 11269 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.47 | learning rate: 3.348E-05 | global batch size: 256 | lm loss: 3.325394E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.919 | TFLOPs: 35.31 | +7: iteration 9310/ 11269 | consumed samples: 2383360 | consumed tokens: 4881121280 | elapsed time per iteration (s): 0.48 | learning rate: 3.335E-05 | global batch size: 256 | lm loss: 3.328994E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.318 | TFLOPs: 35.01 | +7: iteration 9320/ 11269 | consumed samples: 2385920 | consumed tokens: 4886364160 | elapsed time per iteration (s): 0.47 | learning rate: 3.322E-05 | global batch size: 256 | lm loss: 3.314394E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.277 | TFLOPs: 35.33 | +7: iteration 9330/ 
11269 | consumed samples: 2388480 | consumed tokens: 4891607040 | elapsed time per iteration (s): 0.47 | learning rate: 3.309E-05 | global batch size: 256 | lm loss: 3.320523E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.491 | TFLOPs: 35.35 | +7: iteration 9340/ 11269 | consumed samples: 2391040 | consumed tokens: 4896849920 | elapsed time per iteration (s): 0.47 | learning rate: 3.296E-05 | global batch size: 256 | lm loss: 3.312473E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.311 | TFLOPs: 35.34 | +7: iteration 9350/ 11269 | consumed samples: 2393600 | consumed tokens: 4902092800 | elapsed time per iteration (s): 0.47 | learning rate: 3.282E-05 | global batch size: 256 | lm loss: 3.311703E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.031 | TFLOPs: 35.32 | +7: iteration 9360/ 11269 | consumed samples: 2396160 | consumed tokens: 4907335680 | elapsed time per iteration (s): 0.47 | learning rate: 3.269E-05 | global batch size: 256 | lm loss: 3.318692E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.238 | TFLOPs: 35.33 | +7: iteration 9370/ 11269 | consumed samples: 2398720 | consumed tokens: 4912578560 | elapsed time per iteration (s): 0.47 | learning rate: 3.257E-05 | global batch size: 256 | lm loss: 3.315847E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.601 | TFLOPs: 35.29 | +7: iteration 9380/ 11269 | consumed samples: 2401280 | consumed tokens: 4917821440 | elapsed time per iteration (s): 0.48 | learning rate: 3.244E-05 | global batch size: 256 | lm loss: 3.315657E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 535.824 | TFLOPs: 35.04 | +7: iteration 9390/ 11269 | consumed samples: 2403840 | consumed tokens: 4923064320 | elapsed time per iteration (s): 0.47 | learning rate: 3.231E-05 | global batch size: 256 | lm loss: 3.325801E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.583 | TFLOPs: 35.29 | +7: iteration 9400/ 11269 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.47 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 3.320361E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.315 | TFLOPs: 35.34 | +7: iteration 9410/ 11269 | consumed samples: 2408960 | consumed tokens: 4933550080 | elapsed time per iteration (s): 0.47 | learning rate: 3.205E-05 | global batch size: 256 | lm loss: 3.319061E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.525 | TFLOPs: 35.35 | +7: iteration 9420/ 11269 | consumed samples: 2411520 | consumed tokens: 4938792960 | elapsed time per iteration (s): 0.47 | learning rate: 3.193E-05 | global batch size: 256 | lm loss: 3.311233E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.997 | TFLOPs: 35.32 | +7: iteration 9430/ 11269 | consumed samples: 2414080 | consumed tokens: 4944035840 | elapsed time per iteration (s): 0.48 | learning rate: 3.180E-05 | global batch size: 256 | lm loss: 3.318502E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.456 | TFLOPs: 35.15 | +7: iteration 9440/ 11269 | consumed samples: 2416640 | consumed tokens: 4949278720 | elapsed time per iteration (s): 0.48 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 3.324685E+00 | 
grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 528.973 | TFLOPs: 34.59 | +7: iteration 9450/ 11269 | consumed samples: 2419200 | consumed tokens: 4954521600 | elapsed time per iteration (s): 0.48 | learning rate: 3.155E-05 | global batch size: 256 | lm loss: 3.323498E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.922 | TFLOPs: 34.53 | +7: iteration 9460/ 11269 | consumed samples: 2421760 | consumed tokens: 4959764480 | elapsed time per iteration (s): 0.48 | learning rate: 3.143E-05 | global batch size: 256 | lm loss: 3.330342E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 532.370 | TFLOPs: 34.82 | +7: iteration 9470/ 11269 | consumed samples: 2424320 | consumed tokens: 4965007360 | elapsed time per iteration (s): 0.48 | learning rate: 3.130E-05 | global batch size: 256 | lm loss: 3.315230E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 534.502 | TFLOPs: 34.96 | +7: iteration 9480/ 11269 | consumed samples: 2426880 | consumed tokens: 4970250240 | elapsed time per iteration (s): 0.48 | learning rate: 3.118E-05 | global batch size: 256 | lm loss: 3.321272E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.332 | TFLOPs: 35.08 | +7: iteration 9490/ 11269 | consumed samples: 2429440 | consumed tokens: 4975493120 | elapsed time per iteration (s): 0.48 | learning rate: 3.106E-05 | global batch size: 256 | lm loss: 3.335855E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.651 | TFLOPs: 35.23 | +7: iteration 9500/ 11269 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 
0.47 | learning rate: 3.094E-05 | global batch size: 256 | lm loss: 3.311408E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.716 | TFLOPs: 35.30 | +7: iteration 9510/ 11269 | consumed samples: 2434560 | consumed tokens: 4985978880 | elapsed time per iteration (s): 0.47 | learning rate: 3.082E-05 | global batch size: 256 | lm loss: 3.313414E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.828 | TFLOPs: 35.30 | +7: iteration 9520/ 11269 | consumed samples: 2437120 | consumed tokens: 4991221760 | elapsed time per iteration (s): 0.47 | learning rate: 3.070E-05 | global batch size: 256 | lm loss: 3.314258E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.128 | TFLOPs: 35.32 | +7: iteration 9530/ 11269 | consumed samples: 2439680 | consumed tokens: 4996464640 | elapsed time per iteration (s): 0.47 | learning rate: 3.058E-05 | global batch size: 256 | lm loss: 3.315903E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.130 | TFLOPs: 35.32 | +7: iteration 9540/ 11269 | consumed samples: 2442240 | consumed tokens: 5001707520 | elapsed time per iteration (s): 0.47 | learning rate: 3.046E-05 | global batch size: 256 | lm loss: 3.314369E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.052 | TFLOPs: 35.32 | +7: iteration 9550/ 11269 | consumed samples: 2444800 | consumed tokens: 5006950400 | elapsed time per iteration (s): 0.47 | learning rate: 3.034E-05 | global batch size: 256 | lm loss: 3.319090E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.473 | TFLOPs: 35.35 | +7: iteration 9560/ 11269 | 
consumed samples: 2447360 | consumed tokens: 5012193280 | elapsed time per iteration (s): 0.47 | learning rate: 3.022E-05 | global batch size: 256 | lm loss: 3.328969E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.244 | TFLOPs: 35.33 | +7: iteration 9570/ 11269 | consumed samples: 2449920 | consumed tokens: 5017436160 | elapsed time per iteration (s): 0.47 | learning rate: 3.011E-05 | global batch size: 256 | lm loss: 3.335327E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.051 | TFLOPs: 35.32 | +7: iteration 9580/ 11269 | consumed samples: 2452480 | consumed tokens: 5022679040 | elapsed time per iteration (s): 0.47 | learning rate: 2.999E-05 | global batch size: 256 | lm loss: 3.310271E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.970 | TFLOPs: 35.31 | +7: iteration 9590/ 11269 | consumed samples: 2455040 | consumed tokens: 5027921920 | elapsed time per iteration (s): 0.47 | learning rate: 2.987E-05 | global batch size: 256 | lm loss: 3.323361E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.380 | TFLOPs: 35.34 | +7: iteration 9600/ 11269 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.47 | learning rate: 2.976E-05 | global batch size: 256 | lm loss: 3.317080E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.241 | TFLOPs: 35.33 | +7: iteration 9610/ 11269 | consumed samples: 2460160 | consumed tokens: 5038407680 | elapsed time per iteration (s): 0.47 | learning rate: 2.964E-05 | global batch size: 256 | lm loss: 3.323788E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 539.673 | TFLOPs: 35.29 | +7: iteration 9620/ 11269 | consumed samples: 2462720 | consumed tokens: 5043650560 | elapsed time per iteration (s): 0.47 | learning rate: 2.953E-05 | global batch size: 256 | lm loss: 3.322927E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.278 | TFLOPs: 35.33 | +7: iteration 9630/ 11269 | consumed samples: 2465280 | consumed tokens: 5048893440 | elapsed time per iteration (s): 0.47 | learning rate: 2.942E-05 | global batch size: 256 | lm loss: 3.315225E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.307 | TFLOPs: 35.34 | +7: iteration 9640/ 11269 | consumed samples: 2467840 | consumed tokens: 5054136320 | elapsed time per iteration (s): 0.47 | learning rate: 2.930E-05 | global batch size: 256 | lm loss: 3.308345E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.843 | TFLOPs: 35.31 | +7: iteration 9650/ 11269 | consumed samples: 2470400 | consumed tokens: 5059379200 | elapsed time per iteration (s): 0.47 | learning rate: 2.919E-05 | global batch size: 256 | lm loss: 3.323321E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.283 | TFLOPs: 35.33 | +7: iteration 9660/ 11269 | consumed samples: 2472960 | consumed tokens: 5064622080 | elapsed time per iteration (s): 0.47 | learning rate: 2.908E-05 | global batch size: 256 | lm loss: 3.310519E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.433 | TFLOPs: 35.34 | +7: iteration 9670/ 11269 | consumed samples: 2475520 | consumed tokens: 5069864960 | elapsed time per iteration (s): 0.47 | learning rate: 2.897E-05 | global batch size: 256 | lm loss: 3.305755E+00 | 
grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.448 | TFLOPs: 35.35 | +7: iteration 9680/ 11269 | consumed samples: 2478080 | consumed tokens: 5075107840 | elapsed time per iteration (s): 0.47 | learning rate: 2.886E-05 | global batch size: 256 | lm loss: 3.312355E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.416 | TFLOPs: 35.34 | +7: iteration 9690/ 11269 | consumed samples: 2480640 | consumed tokens: 5080350720 | elapsed time per iteration (s): 0.47 | learning rate: 2.875E-05 | global batch size: 256 | lm loss: 3.316987E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.462 | TFLOPs: 35.35 | +7: iteration 9700/ 11269 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.47 | learning rate: 2.864E-05 | global batch size: 256 | lm loss: 3.317546E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.439 | TFLOPs: 35.34 | +7: iteration 9710/ 11269 | consumed samples: 2485760 | consumed tokens: 5090836480 | elapsed time per iteration (s): 0.47 | learning rate: 2.853E-05 | global batch size: 256 | lm loss: 3.310954E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.673 | TFLOPs: 35.36 | +7: iteration 9720/ 11269 | consumed samples: 2488320 | consumed tokens: 5096079360 | elapsed time per iteration (s): 0.47 | learning rate: 2.843E-05 | global batch size: 256 | lm loss: 3.320952E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.869 | TFLOPs: 35.31 | +7: iteration 9730/ 11269 | consumed samples: 2490880 | consumed tokens: 5101322240 | elapsed time per iteration (s): 
0.47 | learning rate: 2.832E-05 | global batch size: 256 | lm loss: 3.312064E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.605 | TFLOPs: 35.36 | +7: iteration 9740/ 11269 | consumed samples: 2493440 | consumed tokens: 5106565120 | elapsed time per iteration (s): 0.47 | learning rate: 2.821E-05 | global batch size: 256 | lm loss: 3.307042E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.662 | TFLOPs: 35.36 | +7: iteration 9750/ 11269 | consumed samples: 2496000 | consumed tokens: 5111808000 | elapsed time per iteration (s): 0.47 | learning rate: 2.811E-05 | global batch size: 256 | lm loss: 3.302326E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.397 | TFLOPs: 35.34 | +7: iteration 9760/ 11269 | consumed samples: 2498560 | consumed tokens: 5117050880 | elapsed time per iteration (s): 0.47 | learning rate: 2.800E-05 | global batch size: 256 | lm loss: 3.320468E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.902 | TFLOPs: 35.31 | +7: iteration 9770/ 11269 | consumed samples: 2501120 | consumed tokens: 5122293760 | elapsed time per iteration (s): 0.47 | learning rate: 2.790E-05 | global batch size: 256 | lm loss: 3.316857E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.034 | TFLOPs: 35.32 | +7: iteration 9780/ 11269 | consumed samples: 2503680 | consumed tokens: 5127536640 | elapsed time per iteration (s): 0.47 | learning rate: 2.780E-05 | global batch size: 256 | lm loss: 3.309514E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.282 | TFLOPs: 35.33 | +7: iteration 9790/ 11269 | 
consumed samples: 2506240 | consumed tokens: 5132779520 | elapsed time per iteration (s): 0.47 | learning rate: 2.769E-05 | global batch size: 256 | lm loss: 3.310839E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.393 | TFLOPs: 35.34 | +7: iteration 9800/ 11269 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.47 | learning rate: 2.759E-05 | global batch size: 256 | lm loss: 3.309772E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.368 | TFLOPs: 35.34 | +7: iteration 9810/ 11269 | consumed samples: 2511360 | consumed tokens: 5143265280 | elapsed time per iteration (s): 0.47 | learning rate: 2.749E-05 | global batch size: 256 | lm loss: 3.312812E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.203 | TFLOPs: 35.33 | +7: iteration 9820/ 11269 | consumed samples: 2513920 | consumed tokens: 5148508160 | elapsed time per iteration (s): 0.47 | learning rate: 2.739E-05 | global batch size: 256 | lm loss: 3.314819E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.346 | TFLOPs: 35.34 | +7: iteration 9830/ 11269 | consumed samples: 2516480 | consumed tokens: 5153751040 | elapsed time per iteration (s): 0.47 | learning rate: 2.729E-05 | global batch size: 256 | lm loss: 3.305862E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.196 | TFLOPs: 35.33 | +7: iteration 9840/ 11269 | consumed samples: 2519040 | consumed tokens: 5158993920 | elapsed time per iteration (s): 0.47 | learning rate: 2.719E-05 | global batch size: 256 | lm loss: 3.300914E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 540.095 | TFLOPs: 35.32 | +7: iteration 9850/ 11269 | consumed samples: 2521600 | consumed tokens: 5164236800 | elapsed time per iteration (s): 0.47 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 3.316385E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.340 | TFLOPs: 35.34 | +7: iteration 9860/ 11269 | consumed samples: 2524160 | consumed tokens: 5169479680 | elapsed time per iteration (s): 0.47 | learning rate: 2.699E-05 | global batch size: 256 | lm loss: 3.305336E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.227 | TFLOPs: 35.33 | +7: iteration 9870/ 11269 | consumed samples: 2526720 | consumed tokens: 5174722560 | elapsed time per iteration (s): 0.47 | learning rate: 2.689E-05 | global batch size: 256 | lm loss: 3.317137E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.412 | TFLOPs: 35.34 | +7: iteration 9880/ 11269 | consumed samples: 2529280 | consumed tokens: 5179965440 | elapsed time per iteration (s): 0.47 | learning rate: 2.680E-05 | global batch size: 256 | lm loss: 3.328242E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.533 | TFLOPs: 35.35 | +7: iteration 9890/ 11269 | consumed samples: 2531840 | consumed tokens: 5185208320 | elapsed time per iteration (s): 0.47 | learning rate: 2.670E-05 | global batch size: 256 | lm loss: 3.309160E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.465 | TFLOPs: 35.35 | +7: iteration 9900/ 11269 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.47 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 3.303590E+00 | 
grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.417 | TFLOPs: 35.34 | +7: iteration 9910/ 11269 | consumed samples: 2536960 | consumed tokens: 5195694080 | elapsed time per iteration (s): 0.47 | learning rate: 2.651E-05 | global batch size: 256 | lm loss: 3.306400E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.239 | TFLOPs: 35.33 | +7: iteration 9920/ 11269 | consumed samples: 2539520 | consumed tokens: 5200936960 | elapsed time per iteration (s): 0.47 | learning rate: 2.642E-05 | global batch size: 256 | lm loss: 3.303472E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.690 | TFLOPs: 35.36 | +7: iteration 9930/ 11269 | consumed samples: 2542080 | consumed tokens: 5206179840 | elapsed time per iteration (s): 0.47 | learning rate: 2.632E-05 | global batch size: 256 | lm loss: 3.315769E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.885 | TFLOPs: 35.37 | +7: iteration 9940/ 11269 | consumed samples: 2544640 | consumed tokens: 5211422720 | elapsed time per iteration (s): 0.47 | learning rate: 2.623E-05 | global batch size: 256 | lm loss: 3.318786E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.731 | TFLOPs: 35.36 | +7: iteration 9950/ 11269 | consumed samples: 2547200 | consumed tokens: 5216665600 | elapsed time per iteration (s): 0.47 | learning rate: 2.614E-05 | global batch size: 256 | lm loss: 3.295103E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.573 | TFLOPs: 35.35 | +7: iteration 9960/ 11269 | consumed samples: 2549760 | consumed tokens: 5221908480 | elapsed time per iteration (s): 
0.47 | learning rate: 2.605E-05 | global batch size: 256 | lm loss: 3.300507E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.679 | TFLOPs: 35.36 | +7: iteration 9970/ 11269 | consumed samples: 2552320 | consumed tokens: 5227151360 | elapsed time per iteration (s): 0.47 | learning rate: 2.595E-05 | global batch size: 256 | lm loss: 3.303519E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.823 | TFLOPs: 35.37 | +7: iteration 9980/ 11269 | consumed samples: 2554880 | consumed tokens: 5232394240 | elapsed time per iteration (s): 0.47 | learning rate: 2.586E-05 | global batch size: 256 | lm loss: 3.304668E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.830 | TFLOPs: 35.37 | +7: iteration 9990/ 11269 | consumed samples: 2557440 | consumed tokens: 5237637120 | elapsed time per iteration (s): 0.47 | learning rate: 2.577E-05 | global batch size: 256 | lm loss: 3.317606E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.852 | TFLOPs: 35.37 | +0: [2023-03-15 23:19:12,427] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[2.568592118488235e-05, 2.568592118488235e-05, 2.568592118488235e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +7: iteration 10000/ 11269 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.47 | learning rate: 2.569E-05 | global batch size: 256 | lm loss: 3.307291E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.728 | TFLOPs: 35.36 | +0: steps: 10000 loss: 3.3268 iter time (s): 0.475 samples/sec: 538.820 +7: 
------------------------------------------------------------------------------------------------ +7: validation loss at iteration 10000 | lm loss value: 3.340298E+00 | lm loss PPL: 2.822753E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 10000 to checkpoints_280m5b9400m +0: [2023-03-15 23:19:12,605] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! +0: [2023-03-15 23:19:12,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:19:12,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:19:12,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:19:12,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:19:12,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:19:12,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:19:12,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:19:12,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:19:12,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_06-model_00-model_states.pt... 
+0: [2023-03-15 23:19:12,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:19:12,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:19:12,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:19:12,853] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:19:12,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:19:12,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:19:12,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:19:12,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:19:12,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:19:12,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:19:12,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:19:12,954] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_12-model_00-model_states.pt... 
+0: [2023-03-15 23:19:12,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:19:12,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:19:13,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:19:13,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:19:13,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:19:13,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:19:13,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:19:13,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:19:13,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:19:13,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:19:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:19:13,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_18-model_00-model_states.pt... 
+0: [2023-03-15 23:19:13,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:19:13,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:19:13,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:19:13,150] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:19:13,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:19:13,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:19:13,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:19:13,177] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step10000/mp_rank_00_model_states.pt +0: [2023-03-15 23:19:13,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/mp_rank_00_model_states.pt... +0: [2023-03-15 23:19:13,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/mp_rank_00_model_states.pt. +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:19:13,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:19:13,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-15 23:19:13,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:19:13,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:19:13,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+3: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:19:13,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+1: [2023-03-15 23:19:13,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+5: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+0: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+1: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-15 23:19:13,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+6: [2023-03-15 23:19:13,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-15 23:19:13,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+2: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +6: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:19:13,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 23:19:13,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+5: [2023-03-15 23:19:13,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +2: [2023-03-15 23:19:13,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:19:13,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 23:19:13,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +1: [2023-03-15 23:19:13,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:19:13,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 23:19:13,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+4: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,272] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:19:13,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 23:19:13,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +5: [2023-03-15 23:19:13,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+0: [2023-03-15 23:19:13,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-15 23:19:13,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: [2023-03-15 23:19:13,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:19:13,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:19:13,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +7: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:19:13,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +4: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:19:13,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 23:19:13,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:19:13,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +3: [2023-03-15 23:19:13,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:19:13,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 23:19:13,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
+0: [2023-03-15 23:19:13,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 23:19:13,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! +0: successfully saved checkpoint at iteration 10000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 751.66 +7: iteration 10010/ 11269 | consumed samples: 2562560 | consumed tokens: 5248122880 | elapsed time per iteration (s): 0.57 | learning rate: 2.560E-05 | global batch size: 256 | lm loss: 3.310117E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 451.045 | TFLOPs: 29.50 | +7: iteration 10020/ 11269 | consumed samples: 2565120 | consumed tokens: 5253365760 | elapsed time per iteration (s): 0.47 | learning rate: 2.551E-05 | global batch size: 256 | lm loss: 3.308623E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.977 | TFLOPs: 35.45 | +7: iteration 10030/ 11269 | consumed samples: 2567680 | consumed tokens: 5258608640 | elapsed time per iteration (s): 0.47 | learning rate: 2.542E-05 | global batch size: 256 | lm loss: 3.313069E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.703 | TFLOPs: 35.43 | +7: iteration 10040/ 11269 | consumed samples: 2570240 | consumed tokens: 5263851520 | elapsed time per iteration (s): 0.47 | learning rate: 2.534E-05 | global batch size: 256 | lm loss: 3.312185E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.599 | TFLOPs: 35.42 | +7: iteration 10050/ 11269 | consumed samples: 2572800 | consumed tokens: 5269094400 | elapsed time per iteration (s): 0.47 | learning rate: 2.525E-05 | global batch 
size: 256 | lm loss: 3.316109E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.373 | TFLOPs: 35.41 | +7: iteration 10060/ 11269 | consumed samples: 2575360 | consumed tokens: 5274337280 | elapsed time per iteration (s): 0.47 | learning rate: 2.517E-05 | global batch size: 256 | lm loss: 3.312875E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.582 | TFLOPs: 35.42 | +7: iteration 10070/ 11269 | consumed samples: 2577920 | consumed tokens: 5279580160 | elapsed time per iteration (s): 0.47 | learning rate: 2.508E-05 | global batch size: 256 | lm loss: 3.325964E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.485 | TFLOPs: 35.41 | +7: iteration 10080/ 11269 | consumed samples: 2580480 | consumed tokens: 5284823040 | elapsed time per iteration (s): 0.47 | learning rate: 2.500E-05 | global batch size: 256 | lm loss: 3.313785E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.009 | TFLOPs: 35.38 | +7: iteration 10090/ 11269 | consumed samples: 2583040 | consumed tokens: 5290065920 | elapsed time per iteration (s): 0.47 | learning rate: 2.492E-05 | global batch size: 256 | lm loss: 3.325539E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.296 | TFLOPs: 35.40 | +7: iteration 10100/ 11269 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.48 | learning rate: 2.483E-05 | global batch size: 256 | lm loss: 3.312859E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.861 | TFLOPs: 35.11 | +7: iteration 10110/ 11269 | consumed samples: 2588160 | consumed tokens: 
5300551680 | elapsed time per iteration (s): 0.47 | learning rate: 2.475E-05 | global batch size: 256 | lm loss: 3.296546E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.080 | TFLOPs: 35.39 | +7: iteration 10120/ 11269 | consumed samples: 2590720 | consumed tokens: 5305794560 | elapsed time per iteration (s): 0.47 | learning rate: 2.467E-05 | global batch size: 256 | lm loss: 3.320692E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.816 | TFLOPs: 35.37 | +7: iteration 10130/ 11269 | consumed samples: 2593280 | consumed tokens: 5311037440 | elapsed time per iteration (s): 0.47 | learning rate: 2.459E-05 | global batch size: 256 | lm loss: 3.289388E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.113 | TFLOPs: 35.39 | +7: iteration 10140/ 11269 | consumed samples: 2595840 | consumed tokens: 5316280320 | elapsed time per iteration (s): 0.47 | learning rate: 2.451E-05 | global batch size: 256 | lm loss: 3.302771E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.161 | TFLOPs: 35.39 | +7: iteration 10150/ 11269 | consumed samples: 2598400 | consumed tokens: 5321523200 | elapsed time per iteration (s): 0.47 | learning rate: 2.443E-05 | global batch size: 256 | lm loss: 3.295461E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.790 | TFLOPs: 35.37 | +7: iteration 10160/ 11269 | consumed samples: 2600960 | consumed tokens: 5326766080 | elapsed time per iteration (s): 0.47 | learning rate: 2.435E-05 | global batch size: 256 | lm loss: 3.311135E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.957 | 
TFLOPs: 35.38 | +7: iteration 10170/ 11269 | consumed samples: 2603520 | consumed tokens: 5332008960 | elapsed time per iteration (s): 0.47 | learning rate: 2.428E-05 | global batch size: 256 | lm loss: 3.303082E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.908 | TFLOPs: 35.38 | +7: iteration 10180/ 11269 | consumed samples: 2606080 | consumed tokens: 5337251840 | elapsed time per iteration (s): 0.47 | learning rate: 2.420E-05 | global batch size: 256 | lm loss: 3.313264E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.111 | TFLOPs: 35.39 | +7: iteration 10190/ 11269 | consumed samples: 2608640 | consumed tokens: 5342494720 | elapsed time per iteration (s): 0.47 | learning rate: 2.412E-05 | global batch size: 256 | lm loss: 3.295168E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.840 | TFLOPs: 35.37 | +7: iteration 10200/ 11269 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.47 | learning rate: 2.405E-05 | global batch size: 256 | lm loss: 3.304145E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.866 | TFLOPs: 35.37 | +7: iteration 10210/ 11269 | consumed samples: 2613760 | consumed tokens: 5352980480 | elapsed time per iteration (s): 0.47 | learning rate: 2.397E-05 | global batch size: 256 | lm loss: 3.318631E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.068 | TFLOPs: 35.39 | +7: iteration 10220/ 11269 | consumed samples: 2616320 | consumed tokens: 5358223360 | elapsed time per iteration (s): 0.47 | learning rate: 2.390E-05 | global batch size: 256 | lm loss: 3.312860E+00 | grad norm: 0.289 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.738 | TFLOPs: 35.36 | +7: iteration 10230/ 11269 | consumed samples: 2618880 | consumed tokens: 5363466240 | elapsed time per iteration (s): 0.47 | learning rate: 2.383E-05 | global batch size: 256 | lm loss: 3.295499E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.909 | TFLOPs: 35.38 | +7: iteration 10240/ 11269 | consumed samples: 2621440 | consumed tokens: 5368709120 | elapsed time per iteration (s): 0.47 | learning rate: 2.375E-05 | global batch size: 256 | lm loss: 3.308125E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.851 | TFLOPs: 35.37 | +7: iteration 10250/ 11269 | consumed samples: 2624000 | consumed tokens: 5373952000 | elapsed time per iteration (s): 0.47 | learning rate: 2.368E-05 | global batch size: 256 | lm loss: 3.305819E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.688 | TFLOPs: 35.36 | +7: iteration 10260/ 11269 | consumed samples: 2626560 | consumed tokens: 5379194880 | elapsed time per iteration (s): 0.47 | learning rate: 2.361E-05 | global batch size: 256 | lm loss: 3.300026E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.991 | TFLOPs: 35.32 | +7: iteration 10270/ 11269 | consumed samples: 2629120 | consumed tokens: 5384437760 | elapsed time per iteration (s): 0.48 | learning rate: 2.354E-05 | global batch size: 256 | lm loss: 3.294808E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 537.516 | TFLOPs: 35.15 | +7: iteration 10280/ 11269 | consumed samples: 2631680 | consumed tokens: 5389680640 | elapsed time per iteration (s): 0.47 | learning rate: 
2.347E-05 | global batch size: 256 | lm loss: 3.304939E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.777 | TFLOPs: 35.37 | +7: iteration 10290/ 11269 | consumed samples: 2634240 | consumed tokens: 5394923520 | elapsed time per iteration (s): 0.47 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 3.295316E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.950 | TFLOPs: 35.38 | +7: iteration 10300/ 11269 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.47 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 3.313411E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.730 | TFLOPs: 35.36 | +7: iteration 10310/ 11269 | consumed samples: 2639360 | consumed tokens: 5405409280 | elapsed time per iteration (s): 0.47 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 3.303754E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.174 | TFLOPs: 35.39 | +7: iteration 10320/ 11269 | consumed samples: 2641920 | consumed tokens: 5410652160 | elapsed time per iteration (s): 0.47 | learning rate: 2.319E-05 | global batch size: 256 | lm loss: 3.307568E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.560 | TFLOPs: 35.35 | +7: iteration 10330/ 11269 | consumed samples: 2644480 | consumed tokens: 5415895040 | elapsed time per iteration (s): 0.47 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 3.315317E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.862 | TFLOPs: 35.37 | +7: iteration 10340/ 11269 | consumed samples: 
2647040 | consumed tokens: 5421137920 | elapsed time per iteration (s): 0.47 | learning rate: 2.306E-05 | global batch size: 256 | lm loss: 3.295657E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.915 | TFLOPs: 35.38 | +7: iteration 10350/ 11269 | consumed samples: 2649600 | consumed tokens: 5426380800 | elapsed time per iteration (s): 0.47 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 3.286523E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.042 | TFLOPs: 35.38 | +7: iteration 10360/ 11269 | consumed samples: 2652160 | consumed tokens: 5431623680 | elapsed time per iteration (s): 0.47 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 3.302799E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.518 | TFLOPs: 35.35 | +7: iteration 10370/ 11269 | consumed samples: 2654720 | consumed tokens: 5436866560 | elapsed time per iteration (s): 0.47 | learning rate: 2.287E-05 | global batch size: 256 | lm loss: 3.309127E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.785 | TFLOPs: 35.37 | +7: iteration 10380/ 11269 | consumed samples: 2657280 | consumed tokens: 5442109440 | elapsed time per iteration (s): 0.47 | learning rate: 2.281E-05 | global batch size: 256 | lm loss: 3.296850E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.727 | TFLOPs: 35.36 | +7: iteration 10390/ 11269 | consumed samples: 2659840 | consumed tokens: 5447352320 | elapsed time per iteration (s): 0.47 | learning rate: 2.274E-05 | global batch size: 256 | lm loss: 3.283238E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 540.796 | TFLOPs: 35.37 | +7: iteration 10400/ 11269 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.47 | learning rate: 2.268E-05 | global batch size: 256 | lm loss: 3.308766E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.795 | TFLOPs: 35.37 | +7: iteration 10410/ 11269 | consumed samples: 2664960 | consumed tokens: 5457838080 | elapsed time per iteration (s): 0.47 | learning rate: 2.262E-05 | global batch size: 256 | lm loss: 3.303588E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.498 | TFLOPs: 35.35 | +7: iteration 10420/ 11269 | consumed samples: 2667520 | consumed tokens: 5463080960 | elapsed time per iteration (s): 0.47 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 3.298882E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.800 | TFLOPs: 35.37 | +7: iteration 10430/ 11269 | consumed samples: 2670080 | consumed tokens: 5468323840 | elapsed time per iteration (s): 0.47 | learning rate: 2.250E-05 | global batch size: 256 | lm loss: 3.303219E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.172 | TFLOPs: 35.39 | +7: iteration 10440/ 11269 | consumed samples: 2672640 | consumed tokens: 5473566720 | elapsed time per iteration (s): 0.47 | learning rate: 2.244E-05 | global batch size: 256 | lm loss: 3.306059E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.618 | TFLOPs: 35.36 | +7: iteration 10450/ 11269 | consumed samples: 2675200 | consumed tokens: 5478809600 | elapsed time per iteration (s): 0.47 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 3.300653E+00 | grad norm: 
0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.954 | TFLOPs: 35.38 | +7: iteration 10460/ 11269 | consumed samples: 2677760 | consumed tokens: 5484052480 | elapsed time per iteration (s): 0.47 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 3.296422E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.595 | TFLOPs: 35.35 | +7: iteration 10470/ 11269 | consumed samples: 2680320 | consumed tokens: 5489295360 | elapsed time per iteration (s): 0.47 | learning rate: 2.227E-05 | global batch size: 256 | lm loss: 3.305302E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.604 | TFLOPs: 35.36 | +7: iteration 10480/ 11269 | consumed samples: 2682880 | consumed tokens: 5494538240 | elapsed time per iteration (s): 0.47 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 3.297302E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.583 | TFLOPs: 35.35 | +7: iteration 10490/ 11269 | consumed samples: 2685440 | consumed tokens: 5499781120 | elapsed time per iteration (s): 0.47 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 3.300962E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.658 | TFLOPs: 35.36 | +7: iteration 10500/ 11269 | consumed samples: 2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.47 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 3.302254E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.729 | TFLOPs: 35.36 | +7: iteration 10510/ 11269 | consumed samples: 2690560 | consumed tokens: 5510266880 | elapsed time per iteration (s): 0.47 
| learning rate: 2.205E-05 | global batch size: 256 | lm loss: 3.291319E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.934 | TFLOPs: 35.38 | +7: iteration 10520/ 11269 | consumed samples: 2693120 | consumed tokens: 5515509760 | elapsed time per iteration (s): 0.47 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 3.289344E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.003 | TFLOPs: 35.38 | +7: iteration 10530/ 11269 | consumed samples: 2695680 | consumed tokens: 5520752640 | elapsed time per iteration (s): 0.47 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 3.306117E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.999 | TFLOPs: 35.38 | +7: iteration 10540/ 11269 | consumed samples: 2698240 | consumed tokens: 5525995520 | elapsed time per iteration (s): 0.47 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 3.315158E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.766 | TFLOPs: 35.37 | +7: iteration 10550/ 11269 | consumed samples: 2700800 | consumed tokens: 5531238400 | elapsed time per iteration (s): 0.47 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 3.286473E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.952 | TFLOPs: 35.38 | +7: iteration 10560/ 11269 | consumed samples: 2703360 | consumed tokens: 5536481280 | elapsed time per iteration (s): 0.47 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 3.297543E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.377 | TFLOPs: 35.34 | +7: iteration 10570/ 11269 | 
consumed samples: 2705920 | consumed tokens: 5541724160 | elapsed time per iteration (s): 0.47 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 3.311323E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.616 | TFLOPs: 35.36 | +7: iteration 10580/ 11269 | consumed samples: 2708480 | consumed tokens: 5546967040 | elapsed time per iteration (s): 0.47 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 3.293797E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.848 | TFLOPs: 35.37 | +7: iteration 10590/ 11269 | consumed samples: 2711040 | consumed tokens: 5552209920 | elapsed time per iteration (s): 0.47 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 3.308405E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.585 | TFLOPs: 35.35 | +7: iteration 10600/ 11269 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.47 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 3.293490E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.964 | TFLOPs: 35.38 | +7: iteration 10610/ 11269 | consumed samples: 2716160 | consumed tokens: 5562695680 | elapsed time per iteration (s): 0.47 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 3.298980E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.970 | TFLOPs: 35.38 | +7: iteration 10620/ 11269 | consumed samples: 2718720 | consumed tokens: 5567938560 | elapsed time per iteration (s): 0.47 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 3.305222E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 540.959 | TFLOPs: 35.38 | +7: iteration 10630/ 11269 | consumed samples: 2721280 | consumed tokens: 5573181440 | elapsed time per iteration (s): 0.47 | learning rate: 2.145E-05 | global batch size: 256 | lm loss: 3.303561E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.665 | TFLOPs: 35.36 | +7: iteration 10640/ 11269 | consumed samples: 2723840 | consumed tokens: 5578424320 | elapsed time per iteration (s): 0.47 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 3.306315E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.537 | TFLOPs: 35.35 | +7: iteration 10650/ 11269 | consumed samples: 2726400 | consumed tokens: 5583667200 | elapsed time per iteration (s): 0.47 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 3.311818E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.724 | TFLOPs: 35.36 | +7: iteration 10660/ 11269 | consumed samples: 2728960 | consumed tokens: 5588910080 | elapsed time per iteration (s): 0.47 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 3.306777E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.417 | TFLOPs: 35.41 | +7: iteration 10670/ 11269 | consumed samples: 2731520 | consumed tokens: 5594152960 | elapsed time per iteration (s): 0.47 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 3.292141E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.352 | TFLOPs: 35.34 | +7: iteration 10680/ 11269 | consumed samples: 2734080 | consumed tokens: 5599395840 | elapsed time per iteration (s): 0.48 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 
3.285120E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 536.454 | TFLOPs: 35.08 | +7: iteration 10690/ 11269 | consumed samples: 2736640 | consumed tokens: 5604638720 | elapsed time per iteration (s): 0.47 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 3.290388E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.660 | TFLOPs: 35.36 | +7: iteration 10700/ 11269 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.47 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 3.279569E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.553 | TFLOPs: 35.35 | +7: iteration 10710/ 11269 | consumed samples: 2741760 | consumed tokens: 5615124480 | elapsed time per iteration (s): 0.47 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 3.295605E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.507 | TFLOPs: 35.35 | +7: iteration 10720/ 11269 | consumed samples: 2744320 | consumed tokens: 5620367360 | elapsed time per iteration (s): 0.47 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 3.292328E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.655 | TFLOPs: 35.36 | +7: iteration 10730/ 11269 | consumed samples: 2746880 | consumed tokens: 5625610240 | elapsed time per iteration (s): 0.48 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 3.289630E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.159 | TFLOPs: 35.20 | +7: iteration 10740/ 11269 | consumed samples: 2749440 | consumed tokens: 5630853120 | elapsed 
time per iteration (s): 0.47 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 3.291000E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.674 | TFLOPs: 35.36 | +7: iteration 10750/ 11269 | consumed samples: 2752000 | consumed tokens: 5636096000 | elapsed time per iteration (s): 0.47 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 3.287538E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.613 | TFLOPs: 35.36 | +7: iteration 10760/ 11269 | consumed samples: 2754560 | consumed tokens: 5641338880 | elapsed time per iteration (s): 0.47 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 3.280148E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.394 | TFLOPs: 35.34 | +7: iteration 10770/ 11269 | consumed samples: 2757120 | consumed tokens: 5646581760 | elapsed time per iteration (s): 0.47 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 3.293938E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.559 | TFLOPs: 35.35 | +7: iteration 10780/ 11269 | consumed samples: 2759680 | consumed tokens: 5651824640 | elapsed time per iteration (s): 0.47 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 3.286836E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.629 | TFLOPs: 35.36 | +7: iteration 10790/ 11269 | consumed samples: 2762240 | consumed tokens: 5657067520 | elapsed time per iteration (s): 0.47 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 3.289599E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.556 | TFLOPs: 35.35 | +7: 
iteration 10800/ 11269 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.47 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 3.293338E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.885 | TFLOPs: 35.37 | +7: iteration 10810/ 11269 | consumed samples: 2767360 | consumed tokens: 5667553280 | elapsed time per iteration (s): 0.47 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 3.289097E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.139 | TFLOPs: 35.39 | +7: iteration 10820/ 11269 | consumed samples: 2769920 | consumed tokens: 5672796160 | elapsed time per iteration (s): 0.47 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 3.302684E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.839 | TFLOPs: 35.37 | +7: iteration 10830/ 11269 | consumed samples: 2772480 | consumed tokens: 5678039040 | elapsed time per iteration (s): 0.47 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 3.291480E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.923 | TFLOPs: 35.38 | +7: iteration 10840/ 11269 | consumed samples: 2775040 | consumed tokens: 5683281920 | elapsed time per iteration (s): 0.47 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 3.288485E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.843 | TFLOPs: 35.37 | +7: iteration 10850/ 11269 | consumed samples: 2777600 | consumed tokens: 5688524800 | elapsed time per iteration (s): 0.47 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 3.274915E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 540.642 | TFLOPs: 35.36 | +7: iteration 10860/ 11269 | consumed samples: 2780160 | consumed tokens: 5693767680 | elapsed time per iteration (s): 0.47 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 3.298071E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.565 | TFLOPs: 35.35 | +7: iteration 10870/ 11269 | consumed samples: 2782720 | consumed tokens: 5699010560 | elapsed time per iteration (s): 0.47 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 3.281837E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.730 | TFLOPs: 35.36 | +7: iteration 10880/ 11269 | consumed samples: 2785280 | consumed tokens: 5704253440 | elapsed time per iteration (s): 0.47 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 3.288931E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.722 | TFLOPs: 35.36 | +7: iteration 10890/ 11269 | consumed samples: 2787840 | consumed tokens: 5709496320 | elapsed time per iteration (s): 0.47 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 3.294603E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.957 | TFLOPs: 35.38 | +7: iteration 10900/ 11269 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.47 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 3.293974E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.275 | TFLOPs: 35.33 | +7: iteration 10910/ 11269 | consumed samples: 2792960 | consumed tokens: 5719982080 | elapsed time per iteration (s): 0.47 | learning rate: 2.046E-05 | global batch 
size: 256 | lm loss: 3.290797E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.913 | TFLOPs: 35.38 | +7: iteration 10920/ 11269 | consumed samples: 2795520 | consumed tokens: 5725224960 | elapsed time per iteration (s): 0.47 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 3.275294E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.667 | TFLOPs: 35.36 | +7: iteration 10930/ 11269 | consumed samples: 2798080 | consumed tokens: 5730467840 | elapsed time per iteration (s): 0.47 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 3.280841E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.583 | TFLOPs: 35.35 | +7: iteration 10940/ 11269 | consumed samples: 2800640 | consumed tokens: 5735710720 | elapsed time per iteration (s): 0.47 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 3.302108E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.547 | TFLOPs: 35.35 | +7: iteration 10950/ 11269 | consumed samples: 2803200 | consumed tokens: 5740953600 | elapsed time per iteration (s): 0.47 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 3.282794E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.658 | TFLOPs: 35.36 | +7: iteration 10960/ 11269 | consumed samples: 2805760 | consumed tokens: 5746196480 | elapsed time per iteration (s): 0.47 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 3.298049E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.865 | TFLOPs: 35.37 | +7: iteration 10970/ 11269 | consumed samples: 2808320 | consumed tokens: 
5751439360 | elapsed time per iteration (s): 0.47 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 3.299276E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.550 | TFLOPs: 35.35 | +7: iteration 10980/ 11269 | consumed samples: 2810880 | consumed tokens: 5756682240 | elapsed time per iteration (s): 0.47 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 3.288021E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.291 | TFLOPs: 35.33 | +7: iteration 10990/ 11269 | consumed samples: 2813440 | consumed tokens: 5761925120 | elapsed time per iteration (s): 0.47 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 3.284876E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.381 | TFLOPs: 35.34 | +7: iteration 11000/ 11269 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.47 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 3.287982E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.451 | TFLOPs: 35.35 | +7: ------------------------------------------------------------------------------------------------ +7: validation loss at iteration 11000 | lm loss value: 3.441182E+00 | lm loss PPL: 3.122385E+01 | +7: ------------------------------------------------------------------------------------------------ +0: saving checkpoint at iteration 11000 to checkpoints_280m5b9400m +0: [2023-03-15 23:27:07,013] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step11000 is begin to save! +0: [2023-03-15 23:27:07,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:27:07,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:27:07,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:27:07,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:27:07,157] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:27:07,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:27:07,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:27:07,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:27:07,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:27:07,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:27:07,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:27:07,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:27:07,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_08-model_00-model_states.pt... 
+0: [2023-03-15 23:27:07,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:27:07,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:27:07,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:27:07,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:27:07,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:27:07,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:27:07,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:27:07,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:27:07,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:27:07,376] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:27:07,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:27:07,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_14-model_00-model_states.pt... 
+0: [2023-03-15 23:27:07,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:27:07,427] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:27:07,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:27:07,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:27:07,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:27:07,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:27:07,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:27:07,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:27:07,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:27:07,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:27:07,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:27:07,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_20-model_00-model_states.pt... 
+0: [2023-03-15 23:27:07,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:27:07,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:27:07,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:27:07,575] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step11000/mp_rank_00_model_states.pt +0: [2023-03-15 23:27:07,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/mp_rank_00_model_states.pt... +0: [2023-03-15 23:27:07,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/mp_rank_00_model_states.pt. +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:27:07,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:27:07,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:27:07,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+7: [2023-03-15 23:27:07,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:27:07,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:27:07,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+3: [2023-03-15 23:27:07,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:27:07,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:27:07,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:27:07,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:27:07,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:27:07,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:27:07,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:27:07,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +6: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +1: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+0: [2023-03-15 23:27:07,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +5: [2023-03-15 23:27:07,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:27:07,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 23:27:07,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+6: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:27:07,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:27:07,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+3: [2023-03-15 23:27:07,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
+4: [2023-03-15 23:27:07,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +4: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:27:07,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +3: [2023-03-15 23:27:07,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:27:07,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 23:27:07,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: [2023-03-15 23:27:07,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 23:27:07,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:27:07,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +7: [2023-03-15 23:27:07,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:27:07,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 23:27:07,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! +0: successfully saved checkpoint at iteration 11000 to checkpoints_280m5b9400m +7: time (ms) | save-checkpoint: 706.26 +7: iteration 11010/ 11269 | consumed samples: 2818560 | consumed tokens: 5772410880 | elapsed time per iteration (s): 0.56 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 3.290878E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 457.014 | TFLOPs: 29.89 | +7: iteration 11020/ 11269 | consumed samples: 2821120 | consumed tokens: 5777653760 | elapsed time per iteration (s): 0.47 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 3.295529E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.055 | TFLOPs: 35.45 | +7: iteration 11030/ 11269 | consumed samples: 2823680 | consumed tokens: 5782896640 | elapsed time per iteration (s): 0.47 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 3.291371E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.203 | TFLOPs: 35.39 | +7: iteration 11040/ 11269 | consumed samples: 2826240 | consumed tokens: 5788139520 | elapsed time per iteration (s): 0.47 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 3.295230E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.232 | TFLOPs: 35.40 | +7: iteration 11050/ 11269 | consumed samples: 2828800 | consumed tokens: 5793382400 | elapsed time per iteration (s): 0.47 | learning rate: 2.017E-05 | global batch 
size: 256 | lm loss: 3.276016E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.159 | TFLOPs: 35.39 | +7: iteration 11060/ 11269 | consumed samples: 2831360 | consumed tokens: 5798625280 | elapsed time per iteration (s): 0.47 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 3.300319E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.176 | TFLOPs: 35.26 | +7: iteration 11070/ 11269 | consumed samples: 2833920 | consumed tokens: 5803868160 | elapsed time per iteration (s): 0.47 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 3.303376E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.022 | TFLOPs: 35.38 | +7: iteration 11080/ 11269 | consumed samples: 2836480 | consumed tokens: 5809111040 | elapsed time per iteration (s): 0.47 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 3.290928E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.267 | TFLOPs: 35.40 | +7: iteration 11090/ 11269 | consumed samples: 2839040 | consumed tokens: 5814353920 | elapsed time per iteration (s): 0.47 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 3.285753E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.041 | TFLOPs: 35.38 | +7: iteration 11100/ 11269 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.47 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 3.289551E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.470 | TFLOPs: 35.35 | +7: iteration 11110/ 11269 | consumed samples: 2844160 | consumed tokens: 
5824839680 | elapsed time per iteration (s): 0.47 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 3.287623E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.974 | TFLOPs: 35.38 | +7: iteration 11120/ 11269 | consumed samples: 2846720 | consumed tokens: 5830082560 | elapsed time per iteration (s): 0.47 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 3.320606E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.801 | TFLOPs: 35.37 | +7: iteration 11130/ 11269 | consumed samples: 2849280 | consumed tokens: 5835325440 | elapsed time per iteration (s): 0.47 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 3.293619E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.704 | TFLOPs: 35.36 | +7: iteration 11140/ 11269 | consumed samples: 2851840 | consumed tokens: 5840568320 | elapsed time per iteration (s): 0.47 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 3.289527E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.700 | TFLOPs: 35.36 | +7: iteration 11150/ 11269 | consumed samples: 2854400 | consumed tokens: 5845811200 | elapsed time per iteration (s): 0.48 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 3.280003E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.265 | TFLOPs: 35.20 | +7: iteration 11160/ 11269 | consumed samples: 2856960 | consumed tokens: 5851054080 | elapsed time per iteration (s): 0.47 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 3.285721E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.821 | 
TFLOPs: 35.37 | +7: iteration 11170/ 11269 | consumed samples: 2859520 | consumed tokens: 5856296960 | elapsed time per iteration (s): 0.47 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 3.283740E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.779 | TFLOPs: 35.37 | +7: iteration 11180/ 11269 | consumed samples: 2862080 | consumed tokens: 5861539840 | elapsed time per iteration (s): 0.47 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 3.289019E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.591 | TFLOPs: 35.35 | +7: iteration 11190/ 11269 | consumed samples: 2864640 | consumed tokens: 5866782720 | elapsed time per iteration (s): 0.47 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 3.292329E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.760 | TFLOPs: 35.37 | +7: iteration 11200/ 11269 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.47 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 3.297464E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.514 | TFLOPs: 35.35 | +7: iteration 11210/ 11269 | consumed samples: 2869760 | consumed tokens: 5877268480 | elapsed time per iteration (s): 0.47 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 3.283976E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.638 | TFLOPs: 35.36 | +7: iteration 11220/ 11269 | consumed samples: 2872320 | consumed tokens: 5882511360 | elapsed time per iteration (s): 0.47 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 3.293830E+00 | grad norm: 0.287 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.567 | TFLOPs: 35.35 | +7: iteration 11230/ 11269 | consumed samples: 2874880 | consumed tokens: 5887754240 | elapsed time per iteration (s): 0.47 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 3.297632E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.161 | TFLOPs: 35.33 | +7: iteration 11240/ 11269 | consumed samples: 2877440 | consumed tokens: 5892997120 | elapsed time per iteration (s): 0.47 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.290126E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.695 | TFLOPs: 35.36 | +7: iteration 11250/ 11269 | consumed samples: 2880000 | consumed tokens: 5898240000 | elapsed time per iteration (s): 0.47 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.289125E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.887 | TFLOPs: 35.37 | +7: iteration 11260/ 11269 | consumed samples: 2882560 | consumed tokens: 5903482880 | elapsed time per iteration (s): 0.47 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.292947E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.668 | TFLOPs: 35.36 | +0: [after training is done] datetime: 2023-03-15 23:29:15 +0: saving checkpoint at iteration 11269 to checkpoints_280m5b9400m +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.325924E+00 | lm loss PPL: 2.782471E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +0: [2023-03-15 
23:29:15,246] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step11269 is begin to save! +0: [2023-03-15 23:29:15,249] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:29:15,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:29:15,363] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:29:15,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:29:15,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:29:15,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:29:15,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:29:15,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:29:15,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:29:15,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:29:15,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+0: [2023-03-15 23:29:15,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:29:15,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:29:15,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:29:15,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:29:15,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:29:15,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:29:15,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:29:15,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:29:15,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:29:15,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:29:15,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:29:15,607] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-15 23:29:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:29:15,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:29:15,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:29:15,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:29:15,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:29:15,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:29:15,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:29:15,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:29:15,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:29:15,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:29:15,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:29:15,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-15 23:29:15,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:29:15,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:29:15,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:29:15,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:29:15,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:29:15,801] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt +0: [2023-03-15 23:29:15,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:29:15,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
+5: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:29:15,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:29:15,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:29:15,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:29:15,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:29:15,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:29:15,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:29:15,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+7: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +1: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+7: [2023-03-15 23:29:15,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+0: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:29:15,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+4: [2023-03-15 23:29:15,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +4: [2023-03-15 23:29:15,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:29:15,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt +4: [2023-03-15 23:29:15,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +7: [2023-03-15 23:29:15,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:29:15,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt +7: [2023-03-15 23:29:15,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:29:15,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt +2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +2: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:29:15,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:29:15,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:29:15,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +1: [2023-03-15 23:29:15,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +1: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! 
+1: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +1: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+3: [2023-03-15 23:29:15,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:29:15,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +3: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:29:15,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt +3: [2023-03-15 23:29:15,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:29:15,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:29:15,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:29:15,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt +6: [2023-03-15 23:29:15,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +6: [2023-03-15 23:29:15,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:29:15,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:29:15,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:29:15,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:29:15,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:29:15,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +5: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:29:15,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt +5: [2023-03-15 23:29:15,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +1: [2023-03-15 23:29:15,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:29:15,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-15 23:29:15,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: [2023-03-15 23:29:15,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-15 23:29:15,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11269 is ready now! +0: successfully saved checkpoint at iteration 11269 to checkpoints_280m5b9400m +END 3318400: Wed 15 Mar 2023 11:29:24 PM EET diff --git a/280m5b9400m/3318675.err b/280m5b9400m/3318675.err new file mode 100644 index 0000000000000000000000000000000000000000..911e9e97414565e8e8c666bab202a7dad07d88e8 --- /dev/null +++ b/280m5b9400m/3318675.err @@ -0,0 +1,1121 @@ +0: 2023-03-15 23:29:59.831568: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-15 23:29:59.831569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:29:59.831579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:29:59.831578: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:29:59.831566: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:29:59.831582: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-15 23:29:59.831585: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-15 23:29:59.831574: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894797: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894802: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-15 23:29:59.894804: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894808: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-15 23:29:59.894806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: 2023-03-15 23:29:59.894929: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-15 23:29:59.894944: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:29:59.894941: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:29:59.894952: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:29:59.894953: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:29:59.894950: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+6: 2023-03-15 23:29:59.894958: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +6: 2023-03-15 23:29:59.894963: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923141: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923146: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923165: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+2: 2023-03-15 23:29:59.923172: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923159: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923170: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923178: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +2: 2023-03-15 23:29:59.923158: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-15 23:29:59.960716: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960717: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960707: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960715: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960720: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+3: 2023-03-15 23:29:59.960712: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960711: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +3: 2023-03-15 23:29:59.960712: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037069: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037078: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-15 23:30:00.037080: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037071: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037072: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037078: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +7: 2023-03-15 23:30:00.037082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+7: 2023-03-15 23:30:00.037080: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075219: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075224: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075213: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+5: 2023-03-15 23:30:00.075213: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075218: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075233: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +5: 2023-03-15 23:30:00.075221: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 23:30:00.099082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099086: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099094: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+4: 2023-03-15 23:30:00.099076: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:00.099088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +4: 2023-03-15 23:30:01.755096: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755093: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755100: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755103: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755107: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755099: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:01.755499: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755500: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755503: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755508: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755507: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+4: 2023-03-15 23:30:01.755507: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755509: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:01.755513: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.757779: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.757794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:01.758171: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.758175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.758180: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.758181: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.758183: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-15 23:30:01.758185: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-15 23:30:01.758186: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758157: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 2023-03-15 23:30:01.758189: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758157: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758169: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758170: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758164: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:01.758592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758596: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758597: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758598: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758605: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+3: 2023-03-15 23:30:01.758605: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758607: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +3: 2023-03-15 23:30:01.758609: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760354: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760345: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:01.760792: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760793: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760797: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760800: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760801: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-15 23:30:01.760800: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-15 23:30:01.760806: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764007: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764014: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764015: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:01.764228: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764233: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764234: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764234: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 2023-03-15 23:30:01.764065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +6: 2023-03-15 23:30:01.764238: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764238: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764240: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +6: 2023-03-15 23:30:01.764241: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:01.764437: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764444: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764443: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764446: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764447: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764448: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764447: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +2: 2023-03-15 23:30:01.764451: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-15 23:30:01.896457: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896466: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896467: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896478: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:01.896904: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896906: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896909: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896912: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896913: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+5: 2023-03-15 23:30:01.896915: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896917: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +5: 2023-03-15 23:30:01.896921: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943501: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:01.943902: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943906: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943909: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943915: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943917: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+7: 2023-03-15 23:30:01.943919: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943918: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +7: 2023-03-15 23:30:01.943924: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +4: 2023-03-15 23:30:05.091132: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091130: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091139: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091141: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091144: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091150: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.091156: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093209: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093211: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093224: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-15 23:30:05.093216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093215: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093228: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:30:05.093232: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+4: 2023-03-15 23:30:05.093228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093237: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:30:05.093238: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:30:05.093239: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:30:05.093246: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +4: 2023-03-15 23:30:05.093274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +4: 2023-03-15 23:30:05.093288: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-15 23:30:05.285223: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285239: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285250: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.285252: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287212: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287213: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287218: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-15 23:30:05.300589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300708: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 2023-03-15 23:30:05.300595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:05.300704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300714: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300790: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300575: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300607: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300801: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.300588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.300609: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300729: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-15 23:30:05.300736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.300824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302597: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302600: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +2: 2023-03-15 23:30:05.302608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302613: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302616: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302618: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +2: 2023-03-15 23:30:05.302619: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-15 23:30:05.303134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303066: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303135: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303137: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303139: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303141: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303068: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303148: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303148: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-15 23:30:05.303070: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:30:05.303146: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:05.303082: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-15 23:30:05.303082: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303083: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303087: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:30:05.303147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-15 23:30:05.303086: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303086: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303161: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:30:05.303161: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:30:05.303163: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-15 23:30:05.303169: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303170: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 2023-03-15 23:30:05.303172: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-15 23:30:05.303185: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303117: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-15 23:30:05.303124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324733: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324737: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324739: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324743: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+5: 2023-03-15 23:30:05.324740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-15 23:30:05.287219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287230: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287231: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287230: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287233: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287233: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287234: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +6: 2023-03-15 23:30:05.287240: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+6: 2023-03-15 23:30:05.287245: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +6: 2023-03-15 23:30:05.287258: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 23:30:05.358709: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358712: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358724: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358720: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.358725: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324743: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324755: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324755: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324758: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324761: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +5: 2023-03-15 23:30:05.324761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +5: 2023-03-15 23:30:05.324778: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 23:30:05.381841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381857: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+3: 2023-03-15 23:30:05.381851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381861: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +3: 2023-03-15 23:30:05.381889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +3: 2023-03-15 23:30:05.381889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+7: 2023-03-15 23:30:05.435482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435478: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435488: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435492: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435496: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.435497: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437054: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437060: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437061: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437065: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437069: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437070: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437096: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +7: 2023-03-15 23:30:05.437108: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +7: 2023-03-15 23:30:05.437110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +3: Successfully preprocessed all matching files. +5: Successfully preprocessed all matching files. 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +7: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +5: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +3: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +6: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +2: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +4: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +7: Building extension module utils... +7: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +1: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: +6: +6: +6: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: +2: +2: +2: +7: Loading extension module utils... +0: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +3: Loading extension module utils... +7: Loading extension module utils...Loading extension module utils... +7: +7: Loading extension module utils... +7: Loading extension module utils...Loading extension module utils... +7: Loading extension module utils... +7: +5: Loading extension module utils... 
+7: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +5: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +1: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +4: Loading extension module utils... +5: Loading extension module utils... +5: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +6: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +2: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... 
+7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +7: No modifications detected for re-loaded extension module utils, skipping build step... +7: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils...Loading extension module utils... +0: +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: Loading extension module utils... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: +4: +4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... 
+4: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils... +4: +4: +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +4: +4: Loading extension module utils... +4: No modifications detected for re-loaded extension module utils, skipping build step... +4: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils...Loading extension module utils... +5: +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... +5: +5: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... 
+5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +5: No modifications detected for re-loaded extension module utils, skipping build step... +5: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +3: +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... 
+1: No modifications detected for re-loaded extension module utils, skipping build step... +3: No modifications detected for re-loaded extension module utils, skipping build step... +3: Loading extension module utils... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...Loading extension module utils... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... 
+6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +6: +6: Loading extension module utils...Loading extension module utils... +6: +6: No modifications detected for re-loaded extension module utils, skipping build step... +6: Loading extension module utils... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... 
+2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +2: +2: Loading extension module utils... +2: No modifications detected for re-loaded extension module utils, skipping build step... +2: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/280m5b9400m/3318675.out b/280m5b9400m/3318675.out new file mode 100644 index 0000000000000000000000000000000000000000..1ee6a94bc34db353287844078073086d4b45fd1c --- /dev/null +++ b/280m5b9400m/3318675.out @@ -0,0 +1,6435 @@ +Model parameters: d_model 1024 ffw_size 4096 kv_size 64 n_heads 16 n_layers 18 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 1024 --num-attention-heads 16 --kv-channels 64 --ffn-hidden-size 4096 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-280m5b9400mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 1000 
--eval-interval 1 --eval-only true --eval-iters 100 --tensorboard-dir tensorboard_280m5b9400mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_280m5b9400m --load checkpoints_280m5b9400m --train-weighted-split-paths-path train400m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3318675.json --zero-stage 0 +START 3318675: Wed 15 Mar 2023 11:29:40 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 64.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 58.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 50.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 52.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 57.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 47.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +3: +3: +3: ======================= ROCm System Management Interface ======================= +3: ================================= Concise Info ================================= +3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +3: 0 52.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 1 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 2 48.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 4 55.0c 100.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 6 48.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +3: 7 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +3: 
================================================================================ +3: ============================= End of ROCm SMI Log ============================== +7: +7: +7: ======================= ROCm System Management Interface ======================= +7: ================================= Concise Info ================================= +7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +7: 0 52.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 1 57.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 2 49.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 3 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 4 54.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 5 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: 6 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +7: 7 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +7: ================================================================================ +7: ============================= End of ROCm SMI Log ============================== +4: +4: +4: ======================= ROCm System Management Interface ======================= +4: ================================= Concise Info ================================= +4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +4: 0 57.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 1 59.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 2 50.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 3 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 4 58.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 5 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: 6 50.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +4: 7 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +4: ================================================================================ +4: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp 
AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 59.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 57.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 50.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 55.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 60.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +2: +2: +2: ======================= ROCm System Management Interface ======================= +2: ================================= Concise Info ================================= +2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +2: 0 61.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 1 60.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 2 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 3 54.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 4 48.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 5 56.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: 6 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +2: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +2: ================================================================================ +2: ============================= End of ROCm SMI Log ============================== +6: +6: +6: ======================= ROCm System Management Interface ======================= +6: ================================= Concise Info ================================= +6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +6: 0 55.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 1 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 2 52.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 3 57.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: 4 61.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 5 55.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 
+6: 6 51.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +6: 7 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +6: ================================================================================ +6: ============================= End of ROCm SMI Log ============================== +5: +5: +5: ======================= ROCm System Management Interface ======================= +5: ================================= Concise Info ================================= +5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +5: 0 54.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 1 57.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 2 50.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 4 49.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 5 54.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: 6 45.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +5: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +5: ================================================================================ +5: ============================= End of ROCm SMI Log ============================== +5: Launching on nid006714 (5/8), master nid006709 port 9999, GPUs 8, CUDA: True +2: Launching on nid006711 (2/8), master nid006709 port 9999, GPUs 8, CUDA: True +4: Launching on nid006713 (4/8), master nid006709 port 9999, GPUs 8, CUDA: True +7: Launching on nid006716 (7/8), master nid006709 port 9999, GPUs 8, CUDA: True +1: Launching on nid006710 (1/8), master nid006709 port 9999, GPUs 8, CUDA: True +6: Launching on nid006715 (6/8), master nid006709 port 9999, GPUs 8, CUDA: True +3: Launching on nid006712 (3/8), master nid006709 port 9999, GPUs 8, CUDA: True +0: Launching on nid006709 (0/8), master nid006709 port 9999, GPUs 8, CUDA: True +0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... 
+0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... 
mmap +0: data_parallel_size .............................. 64 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3318675.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... True +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 4096 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 
1024 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-280m5b9400mval +0: kv_channels ..................................... 64 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_280m5b9400m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. 
None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 16 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 18 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... 
None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_280m5b9400m +0: save_interval ................................... 1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 
1 +0: tensorboard_dir ................................. tensorboard_280m5b9400mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... 
[['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 64 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +7: > setting tensorboard ... +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-15 23:30:20,179] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... 
+0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.100 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python 
-L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of 
unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ scaled_masked_softmax_hip.o scaled_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 22.776 seconds +0: time to initialize megatron (seconds): 6.024 +0: [after megatron is initialized] datetime: 2023-03-15 23:30:46 +0: building GPT model ... +0: [2023-03-15 23:30:46,157] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-15 23:30:46,158] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-15 23:30:46,158] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, 
data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi +0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 +0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} +0: [2023-03-15 23:30:48,178] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method 
type:transformer +0: stage=0 layers=25 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: undo +0: 22: MixedFusedLayerNorm +0: 23: EmbeddingPipe +0: 24: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-15 23:30:48,371] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-15 23:30:48,372] [INFO] [utils.py:828:see_memory_usage] MA 0.53 GB Max_MA 0.53 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 23:30:48,372] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-15 23:30:48,374] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-15 23:31:01,622] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-15 23:31:01,622] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-15 23:31:01,622] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-15 23:31:01,628] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-15 23:31:01,629] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-15 23:31:01,748] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-15 23:31:01,749] [INFO] [utils.py:828:see_memory_usage] MA 0.52 GB Max_MA 0.53 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 23:31:01,749] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.11 GB, percent = 6.4% +7: ninja: no work to do. 
+7: Time to load utils op: 0.2889077663421631 seconds +0: Time to load utils op: 0.2107524871826172 seconds +7: Time to load utils op: 0.0005960464477539062 seconds +0: Time to load utils op: 0.10275816917419434 seconds +0: Time to load utils op: 0.10192465782165527 seconds +0: Time to load utils op: 0.10228967666625977 seconds +0: Time to load utils op: 0.10259246826171875 seconds +0: Time to load utils op: 0.10283017158508301 seconds +0: Time to load utils op: 0.102294921875 seconds +0: Time to load utils op: 0.10261964797973633 seconds +7: Time to load utils op: 0.10233664512634277 seconds +7: Time to load utils op: 0.10284781455993652 secondsTime to load utils op: 0.10219216346740723 secondsTime to load utils op: 0.1028599739074707 seconds +7: +7: +7: Time to load utils op: 0.10268115997314453 seconds +7: Time to load utils op: 0.10196781158447266 seconds +7: Time to load utils op: 0.10270261764526367 seconds +3: Time to load utils op: 0.1112675666809082 secondsTime to load utils op: 0.11114072799682617 seconds +3: +3: Time to load utils op: 0.1090538501739502 secondsTime to load utils op: 0.10745120048522949 secondsTime to load utils op: 0.1123654842376709 seconds +3: Time to load utils op: 0.11034584045410156 seconds +3: +3: +3: Time to load utils op: 0.10952448844909668 secondsTime to load utils op: 0.11214447021484375 seconds +3: +5: Time to load utils op: 0.10978937149047852 seconds +5: Time to load utils op: 0.10965275764465332 secondsTime to load utils op: 0.10755538940429688 seconds +5: +5: Time to load utils op: 0.10942506790161133 seconds +5: Time to load utils op: 0.10550498962402344 seconds +5: Time to load utils op: 0.10343074798583984 seconds +5: Time to load utils op: 0.10225152969360352 seconds +5: Time to load utils op: 0.10581684112548828 seconds +1: Time to load utils op: 0.11087989807128906 seconds +1: Time to load utils op: 0.1109156608581543 seconds +1: Time to load utils op: 0.11092615127563477 seconds +1: Time to load utils op: 
0.11093306541442871 seconds +1: Time to load utils op: 0.11094450950622559 secondsTime to load utils op: 0.1109457015991211 seconds +1: +1: Time to load utils op: 0.11095237731933594 seconds +1: Time to load utils op: 0.1109459400177002 seconds +4: Time to load utils op: 0.11014556884765625 secondsTime to load utils op: 0.11011672019958496 secondsTime to load utils op: 0.11011242866516113 seconds +4: +4: +4: Time to load utils op: 0.1101531982421875 seconds +4: Time to load utils op: 0.11010384559631348 seconds +4: Time to load utils op: 0.1101386547088623 seconds +4: Time to load utils op: 0.11013674736022949 secondsTime to load utils op: 0.1101691722869873 seconds +4: +6: Time to load utils op: 0.11074709892272949 seconds +6: Time to load utils op: 0.11077022552490234 seconds +6: Time to load utils op: 0.11076927185058594 seconds +6: Time to load utils op: 0.1107783317565918 seconds +6: Time to load utils op: 0.11078763008117676 seconds +6: Time to load utils op: 0.11079859733581543 secondsTime to load utils op: 0.11080026626586914 seconds +6: +6: Time to load utils op: 0.1066431999206543 seconds +7: Time to load utils op: 0.00035834312438964844 seconds +7: Time to load utils op: 0.0003535747528076172 seconds +7: Time to load utils op: 0.00044989585876464844 seconds +7: Time to load utils op: 0.0003814697265625 seconds +7: Time to load utils op: 0.00037741661071777344 seconds +7: Time to load utils op: 0.0003299713134765625 seconds +7: Time to load utils op: 0.0004000663757324219 seconds +2: Time to load utils op: 0.1125643253326416 secondsTime to load utils op: 0.11255884170532227 seconds +2: Time to load utils op: 0.11256170272827148 seconds +2: +2: Time to load utils op: 0.11256527900695801 seconds +2: Time to load utils op: 0.11257553100585938 seconds +2: Time to load utils op: 0.11259317398071289 seconds +2: Time to load utils op: 0.11259746551513672 seconds +2: Time to load utils op: 0.11260032653808594 seconds +0: Time to load utils op: 
0.0005970001220703125 secondsTime to load utils op: 0.0005469322204589844 seconds +0: Time to load utils op: 0.000522613525390625 seconds +0: +0: Time to load utils op: 0.0006334781646728516 seconds +0: Time to load utils op: 0.0005993843078613281 seconds +0: Time to load utils op: 0.0005738735198974609 secondsTime to load utils op: 0.0006060600280761719 seconds +0: +4: Time to load utils op: 0.0009882450103759766 seconds +4: Time to load utils op: 0.001157999038696289 seconds +4: Time to load utils op: 0.001379251480102539 seconds +4: Time to load utils op: 0.0013730525970458984 seconds +4: Time to load utils op: 0.0013360977172851562 seconds +4: Time to load utils op: 0.0013914108276367188 seconds +4: Time to load utils op: 0.0013322830200195312 seconds +4: Time to load utils op: 0.0013642311096191406 seconds +5: Time to load utils op: 0.00047278404235839844 secondsTime to load utils op: 0.0004971027374267578 seconds +5: +5: Time to load utils op: 0.0003764629364013672 seconds +5: Time to load utils op: 0.0003600120544433594 seconds +5: Time to load utils op: 0.0003638267517089844 seconds +5: Time to load utils op: 0.00035858154296875 seconds +5: Time to load utils op: 0.0003714561462402344 seconds +5: Time to load utils op: 0.0003528594970703125 seconds +1: Time to load utils op: 0.001001596450805664 seconds +3: Time to load utils op: 0.0004940032958984375 secondsTime to load utils op: 0.0003457069396972656 seconds +3: +1: Time to load utils op: 0.0011758804321289062 seconds +3: Time to load utils op: 0.0003743171691894531 seconds +1: Time to load utils op: 0.001094818115234375 seconds +3: Time to load utils op: 0.000377655029296875 seconds +3: Time to load utils op: 0.0003757476806640625 seconds +3: Time to load utils op: 0.0004062652587890625 seconds +3: Time to load utils op: 0.0003879070281982422 seconds +3: Time to load utils op: 0.0003578662872314453 seconds +1: Time to load utils op: 0.0013854503631591797 seconds +1: Time to load utils op: 
0.0012586116790771484 seconds +1: Time to load utils op: 0.0012729167938232422 seconds +1: Time to load utils op: 0.0012924671173095703 seconds +1: Time to load utils op: 0.0012760162353515625 seconds +6: Time to load utils op: 0.0004937648773193359 seconds +6: Time to load utils op: 0.0003898143768310547 seconds +6: Time to load utils op: 0.0003800392150878906 seconds +6: Time to load utils op: 0.00044536590576171875 seconds +6: Time to load utils op: 0.0004353523254394531 seconds +6: Time to load utils op: 0.0004968643188476562 seconds +6: Time to load utils op: 0.00042748451232910156 seconds +6: Time to load utils op: 0.00040602684020996094 seconds +0: [2023-03-15 23:31:02,087] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-15 23:31:02,087] [INFO] [utils.py:828:see_memory_usage] MA 0.52 GB Max_MA 0.52 GB CA 0.57 GB Max_CA 1 GB +0: [2023-03-15 23:31:02,088] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.23 GB, percent = 6.4% +2: Time to load utils op: 0.0008692741394042969 seconds +2: Time to load utils op: 0.0011820793151855469 seconds +2: Time to load utils op: 0.001127481460571289 seconds +2: Time to load utils op: 0.0011513233184814453 seconds +2: Time to load utils op: 0.0011782646179199219 seconds +2: Time to load utils op: 0.001165151596069336 secondsTime to load utils op: 0.0012063980102539062 seconds +2: +2: Time to load utils op: 0.0011916160583496094 seconds +0: [2023-03-15 23:31:02,206] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-15 23:31:02,207] [INFO] [utils.py:828:see_memory_usage] MA 1.14 GB Max_MA 1.14 GB CA 1.48 GB Max_CA 1 GB +0: [2023-03-15 23:31:02,207] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,313] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-15 23:31:02,314] [INFO] [utils.py:828:see_memory_usage] MA 1.14 GB Max_MA 1.14 GB CA 1.48 GB Max_CA 1 GB 
+0: [2023-03-15 23:31:02,314] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,421] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-15 23:31:02,422] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 23:31:02,422] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,526] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-15 23:31:02,527] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 23:31:02,527] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,634] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-15 23:31:02,635] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 23:31:02,635] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,740] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-15 23:31:02,740] [INFO] [utils.py:828:see_memory_usage] MA 1.58 GB Max_MA 1.58 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 23:31:02,740] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,857] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-15 23:31:02,858] [INFO] [utils.py:828:see_memory_usage] MA 1.62 GB Max_MA 1.62 GB CA 2.14 GB Max_CA 2 GB +0: [2023-03-15 23:31:02,858] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,963] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-15 23:31:02,964] [INFO] [utils.py:828:see_memory_usage] MA 1.62 GB Max_MA 1.62 GB CA 2.14 GB Max_CA 2 
GB +0: [2023-03-15 23:31:02,964] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% +0: [2023-03-15 23:31:02,964] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-15 23:31:02,965] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-15 23:31:02,965] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-15 23:31:02,965] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-15 23:31:02,965] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-15 23:31:02,965] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-15 23:31:02,965] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-15 23:31:02,965] [INFO] [config.py:1011:print] amp_enabled .................. False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] autotuning_config ............ 
{ +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-15 23:31:02,966] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] world_size ................... 64 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-15 23:31:02,967] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-15 23:31:02,967] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.00043010711669921875 seconds +0: [2023-03-15 23:31:02,968] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-15 23:31:03,029] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=280342528 (280.343M) TOTAL_PARAMS=280342528 (280.343M) UNIQUE_PARAMS=280342528 (280.343M) +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+2: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +0: [2023-03-15 23:31:03,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +5: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +4: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +2: [2023-03-15 23:31:03,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt... 
+7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +1: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +7: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +6: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. 
+2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +4: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/mp_rank_00_model_states.pt. +2: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +2: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +7: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +4: [2023-03-15 23:31:03,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +6: [2023-03-15 23:31:03,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +5: [2023-03-15 23:31:03,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +1: [2023-03-15 23:31:03,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +3: [2023-03-15 23:31:03,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt... +0: [2023-03-15 23:31:03,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +0: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +6: [2023-03-15 23:31:03,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +7: [2023-03-15 23:31:03,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +2: [2023-03-15 23:31:03,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +5: [2023-03-15 23:31:03,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +3: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +4: [2023-03-15 23:31:03,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_01-model_00-model_states.pt. +1: [2023-03-15 23:31:03,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +7: [2023-03-15 23:31:03,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +6: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +0: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +0: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +5: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +4: [2023-03-15 23:31:03,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +2: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +4: [2023-03-15 23:31:03,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +7: [2023-03-15 23:31:03,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt... +1: [2023-03-15 23:31:03,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +5: [2023-03-15 23:31:03,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +3: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +1: [2023-03-15 23:31:03,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_03-model_00-model_states.pt. +6: [2023-03-15 23:31:03,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +1: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +0: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +5: [2023-03-15 23:31:03,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +7: [2023-03-15 23:31:03,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +2: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +4: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +5: [2023-03-15 23:31:03,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +1: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +3: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +4: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +3: [2023-03-15 23:31:03,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +0: [2023-03-15 23:31:03,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt... +6: [2023-03-15 23:31:03,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_04-model_00-model_states.pt. +6: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +1: [2023-03-15 23:31:03,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +0: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +7: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +5: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +7: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +2: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +3: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +4: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +1: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +4: [2023-03-15 23:31:03,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +2: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +5: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +3: [2023-03-15 23:31:03,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt... +6: [2023-03-15 23:31:03,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_05-model_00-model_states.pt. +6: [2023-03-15 23:31:03,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +2: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +4: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +1: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +7: [2023-03-15 23:31:03,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +7: [2023-03-15 23:31:03,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +0: [2023-03-15 23:31:03,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +1: [2023-03-15 23:31:03,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +2: [2023-03-15 23:31:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +5: [2023-03-15 23:31:03,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +4: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +3: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt... +6: [2023-03-15 23:31:03,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_06-model_00-model_states.pt. +6: [2023-03-15 23:31:03,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +1: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +5: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +7: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +3: [2023-03-15 23:31:03,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +4: [2023-03-15 23:31:03,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +1: [2023-03-15 23:31:03,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +7: [2023-03-15 23:31:03,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +3: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +2: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +5: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt... +6: [2023-03-15 23:31:03,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_07-model_00-model_states.pt. +6: [2023-03-15 23:31:03,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +4: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +0: [2023-03-15 23:31:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +1: [2023-03-15 23:31:03,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +1: [2023-03-15 23:31:03,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +7: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt... +5: [2023-03-15 23:31:03,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +0: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +5: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +6: [2023-03-15 23:31:03,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +2: [2023-03-15 23:31:03,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_08-model_00-model_states.pt. +3: [2023-03-15 23:31:03,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +7: [2023-03-15 23:31:03,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +1: [2023-03-15 23:31:03,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +0: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +0: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +4: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +3: [2023-03-15 23:31:03,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +7: [2023-03-15 23:31:03,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +3: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +1: [2023-03-15 23:31:03,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt... +6: [2023-03-15 23:31:03,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_09-model_00-model_states.pt. +6: [2023-03-15 23:31:03,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +7: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +0: [2023-03-15 23:31:03,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +3: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +4: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +0: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +4: [2023-03-15 23:31:03,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +1: [2023-03-15 23:31:03,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +1: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +5: [2023-03-15 23:31:03,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt... +6: [2023-03-15 23:31:03,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +5: [2023-03-15 23:31:03,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +6: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +5: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +7: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +3: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +5: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +3: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +0: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +7: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_10-model_00-model_states.pt. +2: [2023-03-15 23:31:03,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +2: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +4: [2023-03-15 23:31:03,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +4: [2023-03-15 23:31:03,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +1: [2023-03-15 23:31:03,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +1: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt... +6: [2023-03-15 23:31:03,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +6: [2023-03-15 23:31:03,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +1: [2023-03-15 23:31:03,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +0: [2023-03-15 23:31:03,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +5: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +0: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +7: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +5: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +4: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +7: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +3: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_11-model_00-model_states.pt. +2: [2023-03-15 23:31:03,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +2: [2023-03-15 23:31:03,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +4: [2023-03-15 23:31:03,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +2: [2023-03-15 23:31:03,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +1: [2023-03-15 23:31:03,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt... +6: [2023-03-15 23:31:03,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +3: [2023-03-15 23:31:03,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_12-model_00-model_states.pt. +6: [2023-03-15 23:31:03,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +3: [2023-03-15 23:31:03,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +5: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +5: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +6: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +1: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +0: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +0: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +3: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +2: [2023-03-15 23:31:03,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt... +4: [2023-03-15 23:31:03,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +6: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +4: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +2: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_13-model_00-model_states.pt. +1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +1: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +3: [2023-03-15 23:31:03,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +5: [2023-03-15 23:31:03,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +5: [2023-03-15 23:31:03,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +7: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +3: [2023-03-15 23:31:03,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +7: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +0: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +6: [2023-03-15 23:31:03,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +2: [2023-03-15 23:31:03,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt... +4: [2023-03-15 23:31:03,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +6: [2023-03-15 23:31:03,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +4: [2023-03-15 23:31:03,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +2: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_14-model_00-model_states.pt. +1: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +5: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +5: [2023-03-15 23:31:03,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +3: [2023-03-15 23:31:03,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +1: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +1: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +3: [2023-03-15 23:31:03,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +6: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +7: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +4: [2023-03-15 23:31:03,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +7: [2023-03-15 23:31:03,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +0: [2023-03-15 23:31:03,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+0: [2023-03-15 23:31:03,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt... +0: [2023-03-15 23:31:03,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +4: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +6: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_15-model_00-model_states.pt. +2: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +5: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+5: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +0: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +2: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+4: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +4: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +3: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +6: [2023-03-15 23:31:03,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt... +1: [2023-03-15 23:31:03,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+4: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:03,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+6: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. 
+2: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +3: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +1: [2023-03-15 23:31:03,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +2: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+2: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +4: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +5: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:03,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+0: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +0: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_16-model_00-model_states.pt. +6: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+6: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+3: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:03,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+1: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:03,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:03,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:03,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+3: [2023-03-15 23:31:03,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +1: [2023-03-15 23:31:04,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:04,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:04,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +7: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +6: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +0: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +5: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +5: [2023-03-15 23:31:04,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt... +3: [2023-03-15 23:31:04,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +3: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +1: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +7: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+6: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +6: [2023-03-15 23:31:04,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +0: [2023-03-15 23:31:04,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+0: [2023-03-15 23:31:04,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_17-model_00-model_states.pt. +4: [2023-03-15 23:31:04,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +7: [2023-03-15 23:31:04,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +3: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +5: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +5: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +6: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +1: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +1: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt... +0: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +3: [2023-03-15 23:31:04,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +7: [2023-03-15 23:31:04,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +6: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +0: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +4: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_18-model_00-model_states.pt. +2: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +5: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +3: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +5: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +1: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +3: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +7: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +6: [2023-03-15 23:31:04,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +4: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +2: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt... +0: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +0: [2023-03-15 23:31:04,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +4: [2023-03-15 23:31:04,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +2: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +1: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_19-model_00-model_states.pt. +6: [2023-03-15 23:31:04,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+1: [2023-03-15 23:31:04,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +1: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +0: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +5: [2023-03-15 23:31:04,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+5: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+3: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+5: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +5: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +3: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+3: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+0: [2023-03-15 23:31:04,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +0: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +0: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +7: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
+1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +1: [2023-03-15 23:31:04,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +6: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +4: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +2: [2023-03-15 23:31:04,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +7: [2023-03-15 23:31:04,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... +5: [2023-03-15 23:31:04,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +4: [2023-03-15 23:31:04,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +4: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
+3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... +3: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
+0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +2: [2023-03-15 23:31:04,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_20-model_00-model_states.pt. +6: [2023-03-15 23:31:04,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+2: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +2: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +2: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
+7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... +7: [2023-03-15 23:31:04,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+6: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... 
+6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt... +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. 
+6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +6: [2023-03-15 23:31:04,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/layer_22-model_00-model_states.pt. +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... +4: [2023-03-15 23:31:04,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
+6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... +6: [2023-03-15 23:31:04,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
+2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... +2: [2023-03-15 23:31:04,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... +0: [2023-03-15 23:31:04,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:04,253] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 4 +1: [2023-03-15 23:31:04,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:31:04,253] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 10 +3: [2023-03-15 23:31:04,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,254] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 24 +1: [2023-03-15 23:31:04,256] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 10 +5: [2023-03-15 23:31:04,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,256] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 41 +3: [2023-03-15 23:31:04,256] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 24 +0: [2023-03-15 23:31:04,257] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 4 +1: [2023-03-15 23:31:04,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,257] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 8 +5: [2023-03-15 23:31:04,258] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 41 +7: [2023-03-15 23:31:04,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:04,260] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 56 +1: [2023-03-15 23:31:04,260] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 8 +5: [2023-03-15 23:31:04,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,263] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 56 +5: [2023-03-15 23:31:04,262] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 46 +5: [2023-03-15 23:31:04,265] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 46 +4: [2023-03-15 23:31:04,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,276] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 35 +0: [2023-03-15 23:31:04,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:04,277] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 2 +4: [2023-03-15 23:31:04,278] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 35 +7: [2023-03-15 23:31:04,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:04,278] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 61 +0: [2023-03-15 23:31:04,280] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 2 +7: [2023-03-15 23:31:04,281] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 61 +2: [2023-03-15 23:31:04,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,282] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 18 +6: [2023-03-15 23:31:04,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:04,284] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 53 +2: [2023-03-15 23:31:04,285] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 18 +1: [2023-03-15 23:31:04,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,286] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 11 +6: [2023-03-15 23:31:04,286] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 53 +1: [2023-03-15 23:31:04,288] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 11 +2: [2023-03-15 23:31:04,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:31:04,291] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 23 +2: [2023-03-15 23:31:04,293] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 23 +3: [2023-03-15 23:31:04,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,295] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 31 +4: [2023-03-15 23:31:04,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,295] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 33 +5: [2023-03-15 23:31:04,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:04,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+5: [2023-03-15 23:31:04,296] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 47 +0: [2023-03-15 23:31:04,296] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 6 +3: [2023-03-15 23:31:04,297] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 31 +4: [2023-03-15 23:31:04,298] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 33 +5: [2023-03-15 23:31:04,298] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 47 +0: [2023-03-15 23:31:04,299] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 6 +6: [2023-03-15 23:31:04,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,306] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 12 +6: [2023-03-15 23:31:04,305] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 48 +6: [2023-03-15 23:31:04,308] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 48 +1: [2023-03-15 23:31:04,308] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 12 +0: [2023-03-15 23:31:04,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:31:04,311] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 1 +7: [2023-03-15 23:31:04,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,313] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 58 +0: [2023-03-15 23:31:04,313] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 1 +7: [2023-03-15 23:31:04,315] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 58 +3: [2023-03-15 23:31:04,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,318] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 28 +3: [2023-03-15 23:31:04,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,318] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 26 +3: [2023-03-15 23:31:04,320] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 28 +3: [2023-03-15 23:31:04,321] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 26 +2: [2023-03-15 23:31:04,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
+2: [2023-03-15 23:31:04,324] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 22 +2: [2023-03-15 23:31:04,327] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 22 +1: [2023-03-15 23:31:04,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,328] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 13 +5: [2023-03-15 23:31:04,328] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 43 +5: [2023-03-15 23:31:04,330] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 43 +1: [2023-03-15 23:31:04,330] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 13 +4: [2023-03-15 23:31:04,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,331] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 36 +4: [2023-03-15 23:31:04,333] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 36 +6: [2023-03-15 23:31:04,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:31:04,330] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 55 +6: [2023-03-15 23:31:04,333] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 55 +2: [2023-03-15 23:31:04,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,336] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 19 +2: [2023-03-15 23:31:04,338] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 19 +5: [2023-03-15 23:31:04,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,341] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 44 +7: [2023-03-15 23:31:04,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,342] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 59 +5: [2023-03-15 23:31:04,343] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 44 +7: [2023-03-15 23:31:04,344] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 59 +1: [2023-03-15 23:31:04,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+1: [2023-03-15 23:31:04,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 9 +1: [2023-03-15 23:31:04,350] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 9 +4: [2023-03-15 23:31:04,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,352] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 37 +4: [2023-03-15 23:31:04,354] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 37 +0: [2023-03-15 23:31:04,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:04,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:04,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 52 +6: [2023-03-15 23:31:04,352] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 52 +0: [2023-03-15 23:31:04,355] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 3 +0: [2023-03-15 23:31:04,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:31:04,357] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 5 +0: [2023-03-15 23:31:04,357] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 3 +0: [2023-03-15 23:31:04,359] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 5 +2: [2023-03-15 23:31:04,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,360] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 16 +3: [2023-03-15 23:31:04,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,362] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 30 +3: [2023-03-15 23:31:04,362] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 25 +2: [2023-03-15 23:31:04,362] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 16 +3: [2023-03-15 23:31:04,365] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 25 +3: [2023-03-15 23:31:04,365] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 30 +6: [2023-03-15 23:31:04,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:31:04,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 50 +5: [2023-03-15 23:31:04,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,374] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 42 +5: [2023-03-15 23:31:04,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,376] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 45 +5: [2023-03-15 23:31:04,376] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 42 +4: [2023-03-15 23:31:04,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 39 +5: [2023-03-15 23:31:04,378] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 45 +6: [2023-03-15 23:31:04,374] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 50 +4: [2023-03-15 23:31:04,379] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 39 +0: [2023-03-15 23:31:04,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
+0: [2023-03-15 23:31:04,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 7 +2: [2023-03-15 23:31:04,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 21 +0: [2023-03-15 23:31:04,382] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 7 +2: [2023-03-15 23:31:04,383] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 21 +1: [2023-03-15 23:31:04,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,384] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 14 +1: [2023-03-15 23:31:04,386] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 14 +7: [2023-03-15 23:31:04,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,388] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 60 +7: [2023-03-15 23:31:04,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
+7: [2023-03-15 23:31:04,390] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 57 +7: [2023-03-15 23:31:04,390] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 60 +7: [2023-03-15 23:31:04,392] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 57 +5: [2023-03-15 23:31:04,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. +5: [2023-03-15 23:31:04,397] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 40 +0: [2023-03-15 23:31:04,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-15 23:31:04,398] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 0 +6: [2023-03-15 23:31:04,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:04,399] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 54 +5: [2023-03-15 23:31:04,399] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 40 +0: [2023-03-15 23:31:04,400] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 0 +0: could not find arguments in the checkpoint ... 
+0: checkpoint version 3.0 +6: [2023-03-15 23:31:04,402] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 54 +3: [2023-03-15 23:31:04,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,403] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 27 +3: [2023-03-15 23:31:04,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. +3: [2023-03-15 23:31:04,404] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 29 +3: [2023-03-15 23:31:04,405] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 27 +3: [2023-03-15 23:31:04,407] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 29 +7: [2023-03-15 23:31:04,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,409] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 63 +7: [2023-03-15 23:31:04,411] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 63 +4: [2023-03-15 23:31:04,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:04,415] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 34 +4: [2023-03-15 23:31:04,418] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 34 +2: [2023-03-15 23:31:04,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,419] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 17 +2: [2023-03-15 23:31:04,421] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 17 +1: [2023-03-15 23:31:04,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-15 23:31:04,427] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 15 +6: [2023-03-15 23:31:04,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. +6: [2023-03-15 23:31:04,429] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 51 +1: [2023-03-15 23:31:04,429] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 15 +6: [2023-03-15 23:31:04,431] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 51 +6: [2023-03-15 23:31:04,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
+6: [2023-03-15 23:31:04,438] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 49 +6: [2023-03-15 23:31:04,440] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 49 +2: [2023-03-15 23:31:04,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. +2: [2023-03-15 23:31:04,441] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 20 +4: [2023-03-15 23:31:04,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. +4: [2023-03-15 23:31:04,441] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 38 +2: [2023-03-15 23:31:04,443] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 20 +4: [2023-03-15 23:31:04,444] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 38 +7: [2023-03-15 23:31:04,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. +7: [2023-03-15 23:31:04,451] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 62 +7: [2023-03-15 23:31:04,453] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 62 +4: [2023-03-15 23:31:04,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+4: [2023-03-15 23:31:04,461] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 64 ZeRO state_dicts for rank 32 +4: [2023-03-15 23:31:04,464] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 64 zero partition checkpoints for rank 32 +0: successfully loaded checkpoint from checkpoints_280m5b9400m at iteration 0 +7: time (ms) | load-checkpoint: 1442.21 +0: estimated model parameters: 0.280342528 +0: estimated model parameters without embeddings: 0.22673408 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-15 23:31:04 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 25600 +0: test: 25600 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.007880 seconds +0: number of documents: 835726 +0: > dataset split: +0: train: +0: document indices in [0, 835726) total of 835726 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.016 seconds +0: total number of samples: 195101 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.032663 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_25600ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.010 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-15 23:31:18 +0: done with setup ... +0: training ... +7: time (ms) | model-and-optimizer-setup: 18649.74 | train/valid/test-data-iterators-setup: 13001.51 +0: [after training is done] datetime: 2023-03-15 23:31:18 +7: ----------------------------------------------------------------------------------------------------------------- +7: validation loss at the end of training for val data | lm loss value: 3.386692E+00 | lm loss PPL: 2.956797E+01 | +7: ----------------------------------------------------------------------------------------------------------------- +END 3318675: Wed 15 Mar 2023 11:31:42 PM EET diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..078596bf556fa01781027835c5b2a9b5724e1e90 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:27d224d234867410f9ec095c44c22350321871f215409c3627f8e7739a67146d +size 52568791 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b150114a6e6f194c2eb70a55248c11c44ef3159d --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:70a86caecc77f43d02a0709e623e2ce028e6fee3c824f012b71006d7002958c7 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..05b3c7ff84b01b424071c1367be399b94769336f --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1bef6289626ba26c5242f5702e6354efe7954e1eced1da37bd05d7c242194063 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1648f3ae4a6db651f4d3198ba60b7eb062ebaf0 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f1afa6010bba50934bc1c66711e84760daceefca189483efd26df5702e08d8b +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..32b4a376b5e5e1d79c431090913784e3cfa519ba --- /dev/null +++ 
b/280m5b9400m/global_step11269/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:415bf17a73e1d8dd00378bf8680ffd9a97ede3e2f47ec506563fe78faa5440fe +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..18d6a4e1b139bf9abe9f2c0495bcc06d790b56ad --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad19341766a90026beb31e7b0ef98d546a8a6be2378e48a5d04e100cd0c2e433 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca0b011dd665a8087ac6bb9b31fed9156442bc91 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:70ac172c7d5bb98c9147563c0b4fece02c39ee2071435ae3f8a183284b5bf931 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a29b9126fe36e23cc989d89a44f4c660b5a7cc9c --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e55206e0a8b3a2bc6f5ff29b710d714b164ff77359e6df0550108ea1f878b7f2 +size 52568738 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 
100644 index 0000000000000000000000000000000000000000..86840302d548a82b712c27b196f42b3ab8d4955e --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf0ae1548107094a25399f058b91412531d9715249ee7d97a0bdf2e3694c16a4 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d428f6229a9d117e53589c27841c973e0274b5d --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56a8b9af63cfa1935170df7bc19f397d4d6ca9dc2f2c092104825cedb2cf7752 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d436976fec4a62be294fa1870b4b273094ea27a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:733d1a0fccf3d495d20f31d2e88ce35d9f0b3dca11f37a30611f16f4bf334707 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b6319596cd357d790997e9c5222ca1e43704728e --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dce7e9c3d5ac2463b6af87ad219bad95044accdb6826a811d9402b3c08b5e500 +size 52568855 diff --git 
a/280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5b56711a98f39455ddb3d9490bdc578c414b51f8 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ab3bd263cfb02aec18a1862391fff3a8a970fb7ff89c1615111ab859ceb436f +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..06906848c31051dcca90da4cfaf8ea75a2ff2b7c --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd66eede03ccd2978169f112ab459678c85a300e08e72cc6789a6e466a297b97 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bc4dc90d67da80087ab888c739d6cced623f2ad8 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:687d1a416f062a869d0fa8b1b8b218b5a0e21532b5e464abe46f7d1924b2f2df +size 52568994 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b2dd5a59718274e1f077b307ab3743ff4e16f54f --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:62a8aef9c525e09272ed5e3debd8ffc3514e95a0f7f80d6658aeb73a7198a891 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4050777f0f26df88cd880e0cf1319e7426134b24 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6b6f4be33d103fceb54eaf3a73dfe5b51434ba121278a9c1c5c7eb3ef2511c1 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4370a5f27f42b4796a9f37a7943951023ad2c221 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2e2096234f9d70e654ed062c22528517cd43243f0d40aa1b0c20e2cbbfb796c0 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8dcd36984c1c103f8b6b9a4a739dc5fd845cd59e --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f2879a0b22a301458c23352e4ac2d4c06a20c7a538736f2c81b1c1fd01dd4c7 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d031328076f9a34796ae42113561c1668878025a --- 
/dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06998fb95eb66a310bd3ddd6d22dd41ff3795e9730adb1d79dfc98e6dcd84b2b +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7b05d1efbdbb2a934a2aea5ef70d3e2dfbcd4d38 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e18a5a2c66104e8fec49109b19f3cbd5a8bc28a6f0ddaa5fff7f6c29bd47c67c +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fca3be63bf03531dac4443c0ec4e807512193fdb --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9be8d758580ac80728ebea6695c5aa5ecabe026296a31493dd5d0ab43fb2f79 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b448d3b6aa3cb89dec160b7e4d7f80bf4dc638e --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:844ba0f78f2e01c1a5078f3848bd1274564f0389bf5b020416bbed228646cc6b +size 52568791 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..7ca3a895c26d02b5b2878bb92c7c08f2ef958d6f --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b561ec3446773c4c4c492083dab3eddb1911df76168091216ad5e8cc6375dcca +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34b66045b5f0659bba188fc43ea94eb3f8d1c58f --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9efc293fb4954d4edff3f96d590c04160e58a85835ceead4aaf11c3c6208017d +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c9e5f7364b25e6030529ac41e94a525aecaaee1 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d28005b95d837dc4e3fbf9227d2b51d3e8db2fffcdcc49bbbc3b9434a71c5140 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..95f7fc658b6aff48b76af4898a3f655eaaa728ed --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15df2e55984de8b7aefbdd674e28996ca3b78a8b82116824136419dfcceb13dd +size 52568866 diff --git 
a/280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..73c49b43af0302c37bb0d5ea40c6b97ec26b6200 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a5327d83377818a799e7f383b56d0316c73f114a0595128e0b5997fa1fed027 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3ae2df05f593d6542979665237e60c084cb38fea --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78af702de3a32b20f40504673d7a4864436bdfdd08f83c897c2cd01622980bd1 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4a14a2e5d5fe2b5f825e14bf8791066cf929579a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d1ce3a63b41fcdf74be7f74a833873f697133bd3ece947ad3d8b019bd9c5d54 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..158aaec282fbf6bd9ef28cb6a0f1dc1c205a1cc8 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:2049622ede979fa3ed3ade06077535e721fc54dc3ffece3ce90120f87a764f5d +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ceaa8f05353200406c3e5cf3f11e4c93a7247b8 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1762130fd0af576eb909619607869a8cacf3f0c4e07ac8661c2f6ea8c6d16403 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..653ba3f3655db8ddddcd9ead0e7562eec3c01de0 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb454148ae5ab34dc9bd776754ad102618dfc8b961e245d87a51bc848b0c2a84 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..76623b924533b682c9ccd18bc6720dba6bb52d01 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a30f8f12cd48b4df9a7ab734944fb71fbbaf19a764a31042167970d9c7048d03 +size 52568791 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a12c4f66e167f1bac0beaca793bd67517c3586e6 --- 
/dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d67cacb7430bfe1f4ed5b17eb7fe8befa605c6a3e8a7febad799098ec5eb332 +size 52568994 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61c232e751f40cd8131fad1f76f970e8189703fe --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:164aed09871cb80217de27190dd1dbf5c017841496b807b9dfcd6f8f714d828b +size 52568738 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e8f90924c39c055cd81bd77e6a26d113630d7c08 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f41e675ab1a6ec2ee97e05241ba7acb5dd5cfd3b2964624780b0f37403e135d4 +size 52568994 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f51e107209358539a33b5636b725e738a0b022a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e67fe6d8dfe4c47c669038b74fb019533bf109685051b4411fb88862c1093879 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..d91c9d7648de4a3c628a0f35f669b23b05ec68e7 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5abcc1bb9f2f1e56f18ad61d8855e3f1098f00b167b782561cab939dcbfc2c4 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1950e27b4c1068c095c10cb723066233a9481598 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0a9768763c359bdc0fc9897645316b80bb21237b03cd4dbc2acb0ae2ce904dd +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f2db7bc55e757687a65f3a61d468ee5a0a392235 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0144fc65b0b424408ad6340d256914426344b8a984dad79ae285b8338ecd1392 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3688fc0d12a1ee9f0f2f572bf199c82d36ac5ae8 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de0f3c4f015092b2a1d72dc6fb4587a288f85c397781e87bbb4c8e862f75af6d +size 52568994 diff --git 
a/280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..db481cffd61b556692f6900350f9ee74a04cfd8a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5f4aa67f14c8be83c520119d962448660b913f41cbfa61f7a2cf19d87b12f7f +size 52568610 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..028ab534078fc8e125b28c5d3dfc514f9658e40a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a18626f3e540e24fccaa1e8ab870c06283fc7311e77c04350c11a892a62a993b +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3b6773d09cacb905fa0f4eb2262ba1148352664 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95997bfeac14c167a214a36fe128684b1a9ebdc1ef3cdf0ae2b530c731e2c373 +size 52568855 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..65cc643874f11fbbc4ea82f035e957d81e049711 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:550963ccc0e297c59989c0381d900117d9859539e35f27aef930952850b45664 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ba1310faeb29ed1e0aae5c40beb89e651a58e35 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f697d5235d371d7eef06a7798331e91c97511f3d2f06cfb535b1829e29307f30 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6caecdedd73680437a403b38984a7443c8e85f6b --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a06d364bd4b0a5c2db49dad09dc94526f104a5be03efe31290e2dc90a4063be3 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..aae8a2cfca59b2b0f8ab142c219ef642df0fa127 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85a0a42ca2363332bfd7111281b679528966f6f89adabdb436f88f08122788b4 +size 52568738 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fd3af7374dd764c4fc9d2f99c986731443af195a --- 
/dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0608b4479846e954adc19a8fa459235888811fbd6d3321be89384bf84c133ea +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..10e6f1e6b880f171735040ba8559fbbd86a58c50 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6400b2d1849e5c76a5f04ed5be5f4cbe585ce24fc1e94cb36ba5489bdaadfce7 +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d997c91bbe4ecf0308fe8c70c6a7b7587682b129 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13f3dfa49d50f7e37560bce392ee60a7bae4f325b8dabee8b25b276c6b9b4735 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b4405c61678ed4e184fe304473e5e3164a54a8a6 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e03f297b0f9a3cd39220df09b67764dc355df4f7dd871c3b089dca7fedf69691 +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..bd129896ccb6d2499a7805eee975940a1999bb3d --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24fc25141727f98aeb8cbac41cffb4669c873895f827100307e5afe0b00dd4f1 +size 52568930 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e610984b2571e2a941c2caf626dfc9a877390ac5 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63475d9aad983a070b2bcf37cc7039caac5197b6838cf8be79d173ae13e937ed +size 52568866 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..521eab46f96bdfc6fd5fbbbdfb01bee7b302bc0f --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:feabb8376d7d64eeb5da6114acd16d7e702fd705f7f181fa72edc4b3b4981e99 +size 52568791 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1c089e2398c79b5670c15e7df8973c2f9da9c47 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d4763de7b972b93615a9125fb7274cc70d84bc2c285acc0fd3f98bce8a98c27 +size 52568866 diff --git 
a/280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ba1c506b667258616a1e5deb32f1ef0a0a38c066 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf69f5e8055cb3c9ad8643708f0675c777101a2cc2533945250a850ea6575d84 +size 52568994 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3ae4600e4fe4cc041d0b9105d32aa79f7922285b --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfe3c8efb3f9c232fc9d6fdd2a2e1257f93c274ea0d0c241667a257050b2057a +size 52568738 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e0517d750807331248a32a6a44cf3e7c247b8eac --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb5d4e8b991b4d1782c3ad44bef7f973ceec9125379585a9223b8dad4e108e8f +size 52568802 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f83689f09db6aecc943d239f0f894c0747b2185e --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:40c8e069be2e626aebef819c650b21b35d264afd4d9d8ab53f3ccff37b8fa4e6 +size 52568727 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0003e97d8320bf310152114b186ebcdbaaac1af0 --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b1ce5e8f4bb0e2f558451ea329ab0903d5b54896d08e3ddf20f45554e6fa49f8 +size 52568791 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..26034b24357f84ee741d037b6a7bb5f16e16b33a --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73cb919d2e3525958abb82a7bc7c941df0a6e9bff63dce94883f37855920b9c9 +size 52568855 diff --git a/280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..416e1d7752d3c83fb5daf1de07c0ccc6319bb01d --- /dev/null +++ b/280m5b9400m/global_step11269/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5b2b22140e30d384fa8af67b76cb2cedfd4b3256b9cb2b8b5785f4532701c1a +size 52568791 diff --git a/280m5b9400m/global_step11269/layer_01-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..18e710e15e93e0dfa5c87b87129c5a1f909ed66e --- /dev/null +++ 
b/280m5b9400m/global_step11269/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb7f1db40ab848f7c3d00a8fead1c3bb16837691bc83c0bb627d66d6e545a511 +size 107218179 diff --git a/280m5b9400m/global_step11269/layer_03-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ecdb5d3f282c17292cfe874a4761de5e1b065d8 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8f74836df91247ee1609ef3d7ed6b17e2881c91b5401648542dd13bc1c245b4 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_04-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e062981061299aa3136f8767506bd01c959c4d4f --- /dev/null +++ b/280m5b9400m/global_step11269/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d66733adb987478a9589166fd61b38e5ec160965adb715b1f374663b7251379c +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_05-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce6185bc593443080030f6662742f76119f703bc --- /dev/null +++ b/280m5b9400m/global_step11269/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b46acacfc889473bce4decf0c794f127c661fd68dc195462c3b87c0360cdc7bd +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_06-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_06-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a378aeedcb8a64d0052d89b10cdc381444ab7967 --- /dev/null +++ 
b/280m5b9400m/global_step11269/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c17027db2b7bf37a471c824d26ba06d2e993a6f712012852680a54e5cc200e0f +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_07-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e8a37fc95109ab44a9687f92d7aff44c8f8d300f --- /dev/null +++ b/280m5b9400m/global_step11269/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e375407d54f725af370e0377f22e3c99eed52d3cdec2e6fd5190f9a43c13154 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_08-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8297290d4adffbbacae540e901fe8112f6c280a7 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf1a177c4ae49cd46f42c3f7bb52fbf9054c10585ac3ec90f5dabd2a7ddce208 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_09-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f0ee79de412f4640af9c47045eb3853db3b1e959 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:12db06e2905fad5d568fe2e9d8ab7884dfd907979e2ab6a90063415395040f31 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_10-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_10-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8bbc25144c052032aff044d8498635c47ab9f02f --- /dev/null +++ 
b/280m5b9400m/global_step11269/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a7ee13e37cd45b7452c7b66c84cc0f8f45a32d371d4470680c4b1c13d0062d +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_11-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0a90d1121361723d6730a61491b94275ffd584d3 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56bc927106ebd6181ac4de8ca8801e9fce00dc570949cf52f527d415a3940e1f +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_12-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1e3386a41d829b61e681f2181a509807dff874cd --- /dev/null +++ b/280m5b9400m/global_step11269/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:675da94d59de8c5fb10f4b6f951fd733f82b2ae50175c14b5c14f4a2528651ea +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_13-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9e7e1ec5fba759be18eac13e392bacab14c6b6e4 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:165ef30f87ae9649324c5b5ffa83ba51921cd5d0fcf4fe606e162d22f9acba98 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_14-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..35bdff96d556f6d68d0a4bb4f87bea9399627d44 --- /dev/null +++ 
b/280m5b9400m/global_step11269/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e6decf145384895d0505222a670e2346be5b2cfd1a52a281b1bf0d57d5653ff +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_15-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..397fed86ef15f28f81006239ddfab70fa7d3eaaf --- /dev/null +++ b/280m5b9400m/global_step11269/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5096fb9af0c3cf4df58198fec064164736485fced15e46986ae9f1a376e2a751 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_16-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a66e1a8b66cab972805e6eeae4187a608a690912 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9dff8907ba6ef56040fca54150f6e6dcd69494ec2cc93f24c5a7fb201b579a8 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_17-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f866c2946ced2e19022693fdbf0ba6fd0abd84b --- /dev/null +++ b/280m5b9400m/global_step11269/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58b7411ccec9e3f2c0593a0a93753ad3f6ee1f0c4728c87b9a898ff0b062e880 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_18-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0c943f1d0274bba93353b6baf9d408d6f7eef65e --- /dev/null +++ 
b/280m5b9400m/global_step11269/layer_18-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48b7d0b2e2be834dd2df5089670902e4ca40234e8e4b3407f5dffed874f6a56b +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_19-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ec8b45116b11d82243565fa1e2060090d9ca495 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e9cbc0b2a9118ac03e7536ef178eac7fcf47c4ec3f6411705167176a0083406 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_20-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fed4ff29c61810c18a3cecca274efe408e4d1ff4 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d92e25ef63d805eaf9dd7a6d1b73dce8f45cb3f7840200873ddec919c82ffc3 +size 25196803 diff --git a/280m5b9400m/global_step11269/layer_22-model_00-model_states.pt b/280m5b9400m/global_step11269/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..29b0bcd07f3c7c5a40a9998a6f7214742d0336c3 --- /dev/null +++ b/280m5b9400m/global_step11269/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b67dbf4a9e7c85cf3550582578999af29f84d8d7ca1571bb4a0896df9533afa +size 5315 diff --git a/280m5b9400m/global_step11269/mp_rank_00_model_states.pt b/280m5b9400m/global_step11269/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bef52f150f1b91be8c9063839ce707765aa62e3d --- /dev/null +++ b/280m5b9400m/global_step11269/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:4f6f1dc4900f24061fd1b6de99afb0978d98740ef9bdc45ca3666cf5068303d3 +size 37747 diff --git a/280m5b9400m/sbatch_280m5b9400m.sh b/280m5b9400m/sbatch_280m5b9400m.sh new file mode 100644 index 0000000000000000000000000000000000000000..009f7399b3258607f8d0a3a1ec07fe079ff0faf4 --- /dev/null +++ b/280m5b9400m/sbatch_280m5b9400m.sh @@ -0,0 +1,163 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --exclusive=user +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=280m5b9400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_278M[@]}") 
+NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 5908231000 +# -> Samples: 2884878 +TRAIN_SAMPLES=2_884_878 +#2_884_878 +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 28_849 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + 
Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/280m5b9400m/sbatch_280m5b9400mval.sh b/280m5b9400m/sbatch_280m5b9400mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..2682d8deaa1a27cf1b69376a846078459f8243ec --- /dev/null +++ b/280m5b9400m/sbatch_280m5b9400mval.sh @@ -0,0 +1,168 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123 +#SBATCH --nodes=8 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --exclusive=user +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=280m5b9400mval +VARIANT_CKPT=280m5b9400m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train400m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_5B9_text_document" 
+VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=4 +GRADIENT_ACCUMULATION_STEPS=1 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_278M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=1000 + +# Tokens: 5908231000 +# -> Samples: 2884878 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-only true \ + --eval-iters 100 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p 
ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/280m5b9400m/tensorboard_280m5b9400m/events.out.tfevents.1678910310.nid006716.114432.0 b/280m5b9400m/tensorboard_280m5b9400m/events.out.tfevents.1678910310.nid006716.114432.0 new file mode 100644 index 0000000000000000000000000000000000000000..04726a903be8e0bbd49cab9b74d5ec578ffb8adc --- /dev/null +++ b/280m5b9400m/tensorboard_280m5b9400m/events.out.tfevents.1678910310.nid006716.114432.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee23098fddc285191583d9039cffdd3ab0868229b8c6f50e808adee769fae654 +size 20062188 diff --git a/280m5b9400m/tensorboard_280m5b9400mval/events.out.tfevents.1678915820.nid006716.1402.0 b/280m5b9400m/tensorboard_280m5b9400mval/events.out.tfevents.1678915820.nid006716.1402.0 new file mode 100644 index 0000000000000000000000000000000000000000..9744020f1410f560380010d50a434840c653739a --- /dev/null +++ b/280m5b9400m/tensorboard_280m5b9400mval/events.out.tfevents.1678915820.nid006716.1402.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b631f3b9e1c54b587df3db6ed5a3b1bf5129e312a5b259beea2ffe584cb4fa73 +size 980 diff --git a/2b8100m100m/3322183.err b/2b8100m100m/3322183.err new file mode 100644 index 0000000000000000000000000000000000000000..8698382d8cb28029329e87c9e6c9bf9e162bbb76 --- /dev/null +++ b/2b8100m100m/3322183.err @@ -0,0 +1,304 @@ +0: 2023-03-16 15:26:06.353444: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353459: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353443: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 15:26:06.353463: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353457: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353458: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 15:26:06.353457: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423231: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 15:26:06.423242: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423243: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423242: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423250: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 15:26:06.423252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:06.423249: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 15:26:07.996506: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996515: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996523: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996522: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:07.996930: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996935: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996940: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996940: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996946: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 15:26:07.996944: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996950: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:07.996953: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.085995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086001: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086010: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:08.086463: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086466: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086470: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086469: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086471: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 15:26:08.086473: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086474: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 15:26:08.086476: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 15:26:11.425910: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425907: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425918: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425918: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.425932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427522: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427540: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427542: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+1: 2023-03-16 15:26:11.427538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427547: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427556: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 15:26:11.427574: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 15:26:11.427591: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 15:26:11.444583: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.444594: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446806: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446810: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446818: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446823: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446824: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446827: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 15:26:11.446827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 15:26:11.446841: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_upper_triang_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_upper_triang_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module scaled_masked_softmax_cuda... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module scaled_masked_softmax_cuda... +0: Successfully preprocessed all matching files. +0: Detected CUDA files, patching ldflags +0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... +0: Building extension module fused_mix_prec_layer_norm_cuda... +0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) +0: Loading extension module fused_mix_prec_layer_norm_cuda... +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +0: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: Successfully preprocessed all matching files. +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch 
extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... 
+0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... 
+0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... 
+1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/2b8100m100m/3322183.out b/2b8100m100m/3322183.out new file mode 100644 index 0000000000000000000000000000000000000000..d69a3a73e515799fcb8a57a5c6d8eaa28cb51fc0 --- /dev/null +++ b/2b8100m100m/3322183.out @@ -0,0 +1,805 @@ +Model parameters: d_model 2560 ffw_size 10240 kv_size 128 n_heads 20 n_layers 34 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 34 --hidden-size 2560 --num-attention-heads 20 --kv-channels 128 --ffn-hidden-size 10240 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 16 --global-batch-size 256 --train-samples 48_828 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-2b8100m100m --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 48_828 --lr-warmup-samples 488 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 10000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_2b8100m100m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_2b8100m100m --load checkpoints_2b8100m100m 
--train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3322183.json --zero-stage 0 +START 3322183: Thu 16 Mar 2023 03:25:46 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 47.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 39.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 45.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 36.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 41.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 53.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 6 40.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +0: Launching on nid005169 (0/2), master nid005169 port 9999, GPUs 8, CUDA: True +1: Launching on nid005170 (1/2), master nid005169 
port 9999, GPUs 8, CUDA: True +0: using world size: 16, data-parallel-size: 16, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... True +0: checkpoint_in_cpu ............................... False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ 
False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 16 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3322183.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1000 +0: eval_iters ...................................... 1 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 10240 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 
0 +0: global_batch_size ............................... 256 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 2560 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-2b8100m100m +0: kv_channels ..................................... 128 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_2b8100m100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ 
False +0: loss_scale ...................................... 12.0 +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 48828 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 488 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 16 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... None +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... None +0: num_attention_heads ............................. 20 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 34 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... 
None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... False +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. None +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_2b8100m100m +0: save_interval ................................... 10000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... 
None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_2b8100m100m +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 48828 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... 
['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 16 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +1: > setting tensorboard ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... 
+0: [2023-03-16 15:26:27,953] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.099 seconds +0: > compiling and loading fused kernels ... +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o 
scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: ninja: no work to do. +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: ninja: no work to do. +0: >>> done with compiling and loading fused kernels. Compilation time: 26.137 seconds +0: time to initialize megatron (seconds): 30.863 +0: [after megatron is initialized] datetime: 2023-03-16 15:26:54 +0: building GPT model ... +0: [2023-03-16 15:26:54,996] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 15:26:54,996] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 15:26:54,997] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.31 GB, percent = 6.2% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15} +0: [2023-03-16 15:26:55,475] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=41 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 
4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: ParallelTransformerLayerPipe +0: 22: ParallelTransformerLayerPipe +0: 23: ParallelTransformerLayerPipe +0: 24: ParallelTransformerLayerPipe +0: 25: ParallelTransformerLayerPipe +0: 26: ParallelTransformerLayerPipe +0: 27: ParallelTransformerLayerPipe +0: 28: ParallelTransformerLayerPipe +0: 29: ParallelTransformerLayerPipe +0: 30: ParallelTransformerLayerPipe +0: 31: ParallelTransformerLayerPipe +0: 32: ParallelTransformerLayerPipe +0: 33: ParallelTransformerLayerPipe +0: 34: ParallelTransformerLayerPipe +0: 35: ParallelTransformerLayerPipe +0: 36: ParallelTransformerLayerPipe +0: 37: undo +0: 38: MixedFusedLayerNorm +0: 39: EmbeddingPipe +0: 40: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 15:26:55,741] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 15:26:55,742] [INFO] [utils.py:828:see_memory_usage] MA 5.26 GB Max_MA 5.26 GB CA 5.31 GB Max_CA 5 GB +0: [2023-03-16 15:26:55,742] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.35 GB, percent = 6.2% +0: setting training iterations to 190 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-16 15:26:55,745] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 15:27:02,628] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 15:27:02,629] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 15:27:02,629] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 15:27:02,647] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 15:27:02,648] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 15:27:02,765] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 15:27:02,766] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.27 GB CA 5.32 GB Max_CA 5 GB +0: [2023-03-16 15:27:02,766] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.03 GB, percent = 6.4% +0: ninja: no work to do. 
+0: Time to load utils op: 0.14981555938720703 seconds +0: Time to load utils op: 0.20407581329345703 seconds +0: Time to load utils op: 0.2037675380706787 seconds +0: Time to load utils op: 0.2036752700805664 seconds +0: Time to load utils op: 0.20459389686584473 seconds +0: Time to load utils op: 0.20436382293701172 seconds +0: Time to load utils op: 0.20408248901367188 seconds +0: Time to load utils op: 0.0007174015045166016 seconds +1: Time to load utils op: 0.21178174018859863 seconds +1: Time to load utils op: 0.21123671531677246 seconds +1: Time to load utils op: 0.21194171905517578 seconds +1: Time to load utils op: 0.21151328086853027 seconds +1: Time to load utils op: 0.21166419982910156 seconds +1: Time to load utils op: 0.21152496337890625 seconds +1: Time to load utils op: 0.2113497257232666 seconds +1: Time to load utils op: 0.21087169647216797 seconds +0: Time to load utils op: 0.10217547416687012 seconds +0: Time to load utils op: 0.00035953521728515625 seconds +0: Time to load utils op: 0.0004405975341796875 seconds +0: Time to load utils op: 0.00041294097900390625 seconds +0: Time to load utils op: 0.00039076805114746094 seconds +0: Time to load utils op: 0.0004162788391113281 seconds +0: Time to load utils op: 0.0004038810729980469 seconds +1: Time to load utils op: 0.0010042190551757812 seconds +1: Time to load utils op: 0.001140594482421875 seconds +1: Time to load utils op: 0.0011377334594726562 seconds +1: Time to load utils op: 0.0013499259948730469 seconds +1: Time to load utils op: 0.0012950897216796875 seconds +1: Time to load utils op: 0.0012574195861816406 seconds +1: Time to load utils op: 0.0013058185577392578 seconds +1: Time to load utils op: 0.0013606548309326172 seconds +0: [2023-03-16 15:27:02,997] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 15:27:02,997] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.25 GB CA 5.32 GB Max_CA 5 GB +0: [2023-03-16 15:27:02,998] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,113] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 15:27:03,113] [INFO] [utils.py:828:see_memory_usage] MA 10.96 GB Max_MA 10.96 GB CA 13.73 GB Max_CA 14 GB +0: [2023-03-16 15:27:03,113] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,216] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 15:27:03,217] [INFO] [utils.py:828:see_memory_usage] MA 10.96 GB Max_MA 10.96 GB CA 13.73 GB Max_CA 14 GB +0: [2023-03-16 15:27:03,217] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,321] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 15:27:03,322] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 15:27:03,322] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,422] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 15:27:03,423] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 15:27:03,423] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,528] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 15:27:03,529] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 15:27:03,529] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,629] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 15:27:03,630] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB 
+0: [2023-03-16 15:27:03,630] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,736] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 15:27:03,736] [INFO] [utils.py:828:see_memory_usage] MA 17.66 GB Max_MA 17.66 GB CA 22.98 GB Max_CA 23 GB +0: [2023-03-16 15:27:03,736] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,838] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 15:27:03,838] [INFO] [utils.py:828:see_memory_usage] MA 17.66 GB Max_MA 17.66 GB CA 22.98 GB Max_CA 23 GB +0: [2023-03-16 15:27:03,838] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.19 GB, percent = 6.4% +0: [2023-03-16 15:27:03,839] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 15:27:03,839] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 15:27:03,839] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 15:27:03,839] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 15:27:03,839] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] amp_enabled .................. 
False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] autotuning_config ............ { +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 15:27:03,840] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] train_batch_size ............. 256 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 16 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] world_size ................... 16 +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 15:27:03,841] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 15:27:03,842] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 15:27:03,842] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 16, +0: "train_batch_size": 256, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0003993511199951172 seconds +0: [2023-03-16 15:27:03,842] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=16 +0: [2023-03-16 15:27:03,896] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=41 [0, 41) STAGE_PARAMS=2809026560 (2809.027M) TOTAL_PARAMS=2809026560 (2809.027M) UNIQUE_PARAMS=2809026560 (2809.027M) +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: WARNING: could not find the metadata file checkpoints_2b8100m100m +0: will not load any checkpoints and will start from random +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +0: [2023-03-16 15:27:03,898] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,899] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: [2023-03-16 15:27:03,900] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +1: time (ms) | load-checkpoint: 1.00 +0: estimated model parameters: 2.80902656 +0: estimated model parameters without embeddings: 2.67500544 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 15:27:04 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 48828 +0: validation: 256 +0: test: 256 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.006950 seconds +0: number of documents: 208931 +0: > dataset split: +0: train: +0: document indices in [0, 208931) total of 208931 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.011 seconds +0: total number of samples: 97610 +0: total number of epochs: 2 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.047978 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_256ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_256ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_256ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.037 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... +0: [after dataloaders are built] datetime: 2023-03-16 15:27:15 +0: done with setup ... +0: training ... 
+0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: +1: time (ms) | model-and-optimizer-setup: 9261.39 | train/valid/test-data-iterators-setup: 11410.31 +0: [000-000] 2.8090B / 2.6750B +0: [before the start of training step] datetime: 2023-03-16 15:27:15 +0: [2023-03-16 15:27:16,154] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information +0: [2023-03-16 15:27:16,154] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False +0: [2023-03-16 15:27:16,154] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers +0: [2023-03-16 15:27:16,154] [INFO] [checkpointing.py:560:forward] ----Synchronization False +0: [2023-03-16 15:27:16,154] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False +0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 29754.34814453125 | max allocated: 45031.09765625 | reserved: 59296.0 | max reserved: 59296.0 +1: iteration 10/ 190 | consumed samples: 2560 | consumed tokens: 5242880 | elapsed time per iteration (s): 13.50 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 1.102291E+01 | grad norm: 3.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 18.963 | TFLOPs: 60.72 | +1: iteration 20/ 190 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 12.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 8.132835E+00 | grad norm: 2.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.603 | TFLOPs: 65.97 | +1: iteration 30/ 190 | consumed samples: 7680 | consumed tokens: 15728640 | elapsed time per iteration (s): 12.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 7.794791E+00 | grad norm: 1.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.579 | 
TFLOPs: 65.89 | +1: iteration 40/ 190 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 12.44 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 7.705334E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.575 | TFLOPs: 65.88 | +1: iteration 50/ 190 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (s): 12.45 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 7.650288E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.566 | TFLOPs: 65.85 | +1: iteration 60/ 190 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 12.47 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 7.601243E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.534 | TFLOPs: 65.75 | +1: iteration 70/ 190 | consumed samples: 17920 | consumed tokens: 36700160 | elapsed time per iteration (s): 12.52 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 7.491938E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.441 | TFLOPs: 65.45 | +1: iteration 80/ 190 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 12.57 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 7.370172E+00 | grad norm: 2.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.365 | TFLOPs: 65.21 | +1: iteration 90/ 190 | consumed samples: 23040 | consumed tokens: 47185920 | elapsed time per iteration (s): 12.57 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 7.262464E+00 | grad norm: 0.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 20.363 | TFLOPs: 65.20 | +1: iteration 100/ 190 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 12.59 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 7.182502E+00 | grad norm: 1.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.331 | TFLOPs: 65.10 | +1: iteration 110/ 190 | consumed samples: 28160 | consumed tokens: 57671680 | elapsed time per iteration (s): 12.59 | learning rate: 8.969E-05 | global batch size: 256 | lm loss: 7.093162E+00 | grad norm: 0.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.333 | TFLOPs: 65.11 | +1: iteration 120/ 190 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 12.60 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 7.032133E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.319 | TFLOPs: 65.06 | +1: iteration 130/ 190 | consumed samples: 33280 | consumed tokens: 68157440 | elapsed time per iteration (s): 12.60 | learning rate: 6.217E-05 | global batch size: 256 | lm loss: 6.954518E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.311 | TFLOPs: 65.04 | +1: iteration 140/ 190 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 12.61 | learning rate: 5.020E-05 | global batch size: 256 | lm loss: 6.878767E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.308 | TFLOPs: 65.03 | +1: iteration 150/ 190 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (s): 12.61 | learning rate: 3.989E-05 | global batch size: 256 | lm loss: 6.830427E+00 | grad norm: 0.402 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.308 | TFLOPs: 65.03 | +1: iteration 160/ 190 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 12.62 | learning rate: 3.151E-05 | global batch size: 256 | lm loss: 6.807468E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.285 | TFLOPs: 64.95 | +1: iteration 170/ 190 | consumed samples: 43520 | consumed tokens: 89128960 | elapsed time per iteration (s): 12.63 | learning rate: 2.530E-05 | global batch size: 256 | lm loss: 6.775826E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.267 | TFLOPs: 64.90 | +1: iteration 180/ 190 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 12.62 | learning rate: 2.143E-05 | global batch size: 256 | lm loss: 6.742275E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.286 | TFLOPs: 64.96 | +1: iteration 190/ 190 | consumed samples: 48640 | consumed tokens: 99614720 | elapsed time per iteration (s): 12.62 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 6.733065E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 20.287 | TFLOPs: 64.96 | +0: [after training is done] datetime: 2023-03-16 16:07:10 +0: saving checkpoint at iteration 190 to checkpoints_2b8100m100m +1: ----------------------------------------------------------------------------------------------------------------- +1: validation loss at the end of training for val data | lm loss value: 6.680873E+00 | lm loss PPL: 7.970145E+02 | +1: ----------------------------------------------------------------------------------------------------------------- +0: [2023-03-16 16:07:13,562] [INFO] 
[logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step190 is begin to save! +0: [2023-03-16 16:07:13,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 16:07:13,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 16:07:13,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 16:07:14,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 16:07:14,117] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 16:07:14,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 16:07:14,286] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 16:07:14,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 16:07:14,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 16:07:14,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 16:07:14,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 16:07:14,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 16:07:14,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 16:07:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 16:07:14,972] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 16:07:15,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 16:07:15,136] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 16:07:15,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 16:07:15,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 16:07:15,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 16:07:15,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 16:07:15,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 16:07:15,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 16:07:15,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 16:07:15,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 16:07:15,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 16:07:15,990] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 16:07:16,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 16:07:16,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 16:07:16,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 16:07:16,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 16:07:16,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 16:07:16,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 16:07:16,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 16:07:16,670] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 16:07:16,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 16:07:16,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 16:07:17,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 16:07:17,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 16:07:17,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 16:07:17,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 16:07:17,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 16:07:17,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 16:07:17,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 16:07:17,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 16:07:17,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 16:07:17,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+0: [2023-03-16 16:07:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 16:07:17,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 16:07:18,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 16:07:18,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 16:07:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 16:07:18,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 16:07:18,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 16:07:18,366] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 16:07:18,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 16:07:18,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 16:07:18,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 16:07:18,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+0: [2023-03-16 16:07:18,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 16:07:18,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 16:07:19,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 16:07:19,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 16:07:19,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 16:07:19,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 16:07:19,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 16:07:19,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 16:07:19,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 16:07:19,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 16:07:19,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 16:07:19,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+0: [2023-03-16 16:07:19,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 16:07:19,723] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt +0: [2023-03-16 16:07:19,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 16:07:19,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
+1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +1: [2023-03-16 16:07:19,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... +0: [2023-03-16 16:07:22,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 16:07:22,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:22,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:22,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:22,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:22,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:22,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:22,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:22,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:22,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:22,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:22,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:22,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! 
+1: [2023-03-16 16:07:22,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:22,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:22,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:22,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:22,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:22,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:23,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:23,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:23,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:23,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:23,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:23,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 16:07:23,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:23,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:23,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:23,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:23,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:24,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:24,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:24,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:24,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-16 16:07:24,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:24,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +1: [2023-03-16 16:07:24,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 16:07:24,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt +1: [2023-03-16 16:07:24,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:24,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:24,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:24,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:24,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:24,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:24,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! +0: [2023-03-16 16:07:24,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 16:07:24,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt +0: [2023-03-16 16:07:24,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step190 is ready now! 
+0: successfully saved checkpoint at iteration 190 to checkpoints_2b8100m100m +END 3322183: Thu 16 Mar 2023 04:07:31 PM EET diff --git a/2b8100m100m/3324313.err b/2b8100m100m/3324313.err new file mode 100644 index 0000000000000000000000000000000000000000..28847fd3e5f41d734bb419d851e64f203b8c72b8 --- /dev/null +++ b/2b8100m100m/3324313.err @@ -0,0 +1,389 @@ +1: 2023-03-16 18:50:51.425844: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425852: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425852: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425858: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+1: 2023-03-16 18:50:51.425862: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425856: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425855: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +1: 2023-03-16 18:50:51.425864: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615391: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:50:51.615402: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615391: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615401: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615405: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615408: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+0: 2023-03-16 18:50:51.615412: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:51.615406: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +0: 2023-03-16 18:50:55.411586: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411608: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411665: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.411698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:50:55.430031: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430057: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430081: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430096: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430111: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+0: 2023-03-16 18:50:55.430125: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430129: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:50:55.430141: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.435419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435464: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435483: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435473: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.435489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:50:55.436248: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436277: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436275: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436288: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436302: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+1: 2023-03-16 18:50:55.436315: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436327: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +1: 2023-03-16 18:50:55.436358: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +0: 2023-03-16 18:51:23.222449: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 2023-03-16 18:51:23.222587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222527: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.222560: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +0: 2023-03-16 18:51:23.222600: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222635: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222672: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.222686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 2023-03-16 18:51:23.225643: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225642: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225664: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225668: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225670: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +1: 2023-03-16 18:51:23.225669: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +1: 2023-03-16 18:51:23.225684: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224622: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224624: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224632: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+0: 2023-03-16 18:51:23.224621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224641: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224640: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224696: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +0: 2023-03-16 18:51:23.224704: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: 2023-03-16 18:51:23.224711: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +0: Successfully preprocessed all matching files. 
+0: Detected CUDA files, patching ldflags
+0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
+0: Building extension module scaled_upper_triang_masked_softmax_cuda...
+0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+0: Loading extension module scaled_upper_triang_masked_softmax_cuda...
+0: Successfully preprocessed all matching files.
+0: Detected CUDA files, patching ldflags
+0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
+0: Building extension module scaled_masked_softmax_cuda...
+0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+0: Loading extension module scaled_masked_softmax_cuda...
+0: Successfully preprocessed all matching files.
+0: Detected CUDA files, patching ldflags
+0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
+0: Building extension module fused_mix_prec_layer_norm_cuda...
+0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+0: Loading extension module fused_mix_prec_layer_norm_cuda...
+0: Successfully preprocessed all matching files.
+0: Successfully preprocessed all matching files.
+0: Successfully preprocessed all matching files.
+0: Successfully preprocessed all matching files.
+0: Successfully preprocessed all matching files.
+0: Successfully preprocessed all matching files.
+1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +1: warnings.warn( +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +0: warnings.warn( +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: +1: +0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... +0: Building extension module utils... +0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +0: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... +1: Loading extension module utils... 
+0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +0: +0: Loading extension module utils... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +0: +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: +1: +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +1: +1: Loading extension module utils... +1: No modifications detected for re-loaded extension module utils, skipping build step... +1: Loading extension module utils... +0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +0: No modifications detected for re-loaded extension module utils, skipping build step... +0: Loading extension module utils... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings +0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +0: Inconsistency detected by ld.so: dl-open.c: 881: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed! +0: Traceback (most recent call last): +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +0: main() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +0: return f(*args, **kwargs) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 178, in pretrain +0: timers('train/valid/test-data-iterators-setup').stop() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/global_vars.py", line 290, in stop +0: torch.cuda.synchronize() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 566, in synchronize +0: return torch._C._cuda_synchronize() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler +0: _error_if_any_worker_fails() +0: RuntimeError: 
DataLoader worker (pid 65702) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace. +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62493 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62494 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62496 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62497 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62498 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62499 closing signal SIGTERM +0: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 62500 closing signal SIGTERM +0: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 62495) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +0: ERROR:torch.distributed.elastic.agent.server.api:Error waiting on exit barrier. 
Elapsed: 305.9714357852936 seconds +0: Traceback (most recent call last): +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 906, in _exit_barrier +0: store_util.barrier( +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py", line 78, in barrier +0: synchronize(store, data, rank, world_size, key_prefix, barrier_timeout) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py", line 64, in synchronize +0: agent_data = get_all(store, rank, key_prefix, world_size) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py", line 34, in get_all +0: data = store.get(f"{prefix}{idx}") +0: RuntimeError: Socket Timeout +0: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_x07qs21o/none_bft7aw0g/attempt_0/2/error.json) +0: Traceback (most recent call last): +0: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +0: return _run_code(code, main_globals, None, +0: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +0: exec(code, run_globals) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +0: main() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +0: return f(*args, **kwargs) +0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +0: run(args) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +0: elastic_launch( +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ +0: return launch_agent(self._config, self._entrypoint, list(args)) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +0: raise ChildFailedError( +0: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +0: ============================================================ +0: Megatron-DeepSpeed/pretrain_gpt.py FAILED +0: ------------------------------------------------------------ +0: Failures: +0: +0: ------------------------------------------------------------ +0: Root Cause (first observed failure): +0: [0]: +0: time : 2023-03-16_18:53:56 +0: host : nid007373 +0: rank : 2 (local_rank: 2) +0: exitcode : 1 (pid: 62495) +0: error_file: /tmp/torchelastic_x07qs21o/none_bft7aw0g/attempt_0/2/error.json +0: traceback : Traceback (most recent call last): +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +0: return f(*args, **kwargs) +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 178, in pretrain +0: 
timers('train/valid/test-data-iterators-setup').stop() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/global_vars.py", line 290, in stop +0: torch.cuda.synchronize() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 566, in synchronize +0: return torch._C._cuda_synchronize() +0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler +0: _error_if_any_worker_fails() +0: RuntimeError: DataLoader worker (pid 65702) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace. +0: +0: ============================================================ +srun: error: nid007373: task 0: Exited with exit code 1 +srun: launch/slurm: _step_signal: Terminating StepId=3324313.0 +srun: error: nid007374: task 1: Terminated +srun: Force Terminated StepId=3324313.0 diff --git a/2b8100m100m/3324313.out b/2b8100m100m/3324313.out new file mode 100644 index 0000000000000000000000000000000000000000..df5f35d2b094ec73dc82c38cbade643e138c88da --- /dev/null +++ b/2b8100m100m/3324313.out @@ -0,0 +1,3048 @@ +Model parameters: d_model 2560 ffw_size 10240 kv_size 128 n_heads 20 n_layers 34 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 34 --hidden-size 2560 --num-attention-heads 20 --kv-channels 128 --ffn-hidden-size 10240 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 64 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-2b8100m100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 
1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --override-lr-scheduler --no-load-optim --reset-progress --log-interval 10 --save-interval 1000 --eval-interval 1 --eval-iters 100 --tensorboard-dir tensorboard_2b8100m100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_2b8100m100m --load checkpoints_2b8100m100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3324313.json --zero-stage 0 +START 3324313: Thu 16 Mar 2023 06:50:27 PM EET +0: +0: +0: ======================= ROCm System Management Interface ======================= +0: ================================= Concise Info ================================= +0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +0: 0 47.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 2 35.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 4 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: 6 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +0: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +0: ================================================================================ +0: ============================= End of ROCm SMI Log ============================== +1: +1: +1: ======================= ROCm System Management Interface ======================= +1: ================================= Concise Info ================================= +1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +1: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 2 36.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: 4 47.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 5 52.0c N/A 800Mhz 1600Mhz 0% 
auto 0.0W 0% 0% +1: 6 38.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +1: ================================================================================ +1: ============================= End of ROCm SMI Log ============================== +1: Launching on nid007374 (1/2), master nid007373 port 9999, GPUs 8, CUDA: True +0: Launching on nid007373 (0/2), master nid007373 port 9999, GPUs 8, CUDA: True +0: using world size: 16, data-parallel-size: 16, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 +0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. +0: using torch.bfloat16 for parameters ... +0: ------------------------ arguments ------------------------ +0: abort_on_unmet_fused_kernel_constraints ......... False +0: accumulate_allreduce_grads_in_fp32 .............. True +0: adam_beta1 ...................................... 0.9 +0: adam_beta2 ...................................... 0.999 +0: adam_eps ........................................ 1e-08 +0: adlr_autoresume ................................. False +0: adlr_autoresume_interval ........................ 1000 +0: apply_query_key_layer_scaling ................... True +0: apply_residual_connection_post_layernorm ........ False +0: attention_dropout ............................... 0.1 +0: attention_softmax_in_fp32 ....................... False +0: bert_binary_head ................................ True +0: bert_load ....................................... None +0: bf16 ............................................ True +0: bias_dropout_fusion ............................. True +0: bias_gelu_fusion ................................ True +0: biencoder_projection_dim ........................ 0 +0: biencoder_shared_query_context_model ............ False +0: block_data_path ................................. None +0: checkpoint_activations .......................... False +0: checkpoint_in_cpu ............................... 
False +0: checkpoint_num_layers ........................... 1 +0: clip_grad ....................................... 1.0 +0: codecarbon_dir .................................. None +0: consumed_train_samples .......................... 0 +0: consumed_train_tokens ........................... 0 +0: consumed_valid_samples .......................... 0 +0: contigious_checkpointing ........................ False +0: cpu_optimizer ................................... False +0: cpu_torch_adam .................................. False +0: curriculum_learning ............................. False +0: data_impl ....................................... mmap +0: data_parallel_size .............................. 16 +0: data_path ....................................... None +0: dataloader_type ................................. single +0: DDP_impl ........................................ local +0: decoder_seq_length .............................. None +0: deepscale ....................................... False +0: deepscale_config ................................ None +0: deepspeed ....................................... True +0: deepspeed_activation_checkpointing .............. False +0: deepspeed_config ................................ ds_configs/3324313.json +0: deepspeed_mpi ................................... False +0: distribute_checkpointed_activations ............. False +0: distributed_backend ............................. nccl +0: embed_layernorm ................................. False +0: embedding_path .................................. None +0: encoder_seq_length .............................. 2048 +0: eod_mask_loss ................................... False +0: eval_interval ................................... 1 +0: eval_iters ...................................... 100 +0: eval_only ....................................... None +0: evidence_data_path .............................. None +0: exit_duration_in_mins ........................... 
None +0: exit_interval ................................... None +0: ffn_hidden_size ................................. 10240 +0: finetune ........................................ False +0: fp16 ............................................ False +0: fp16_lm_cross_entropy ........................... False +0: fp32_residual_connection ........................ False +0: gigaflos_no_embeds .............................. 0 +0: global_batch_size ............................... 64 +0: glu_activation .................................. None +0: hidden_dropout .................................. 0.1 +0: hidden_size ..................................... 2560 +0: hysteresis ...................................... 2 +0: ict_head_size ................................... None +0: ict_load ........................................ None +0: img_dim ......................................... 224 +0: indexer_batch_size .............................. 128 +0: indexer_log_interval ............................ 1000 +0: inference ....................................... False +0: init_method_std ................................. 0.02 +0: init_method_xavier_uniform ...................... False +0: initial_loss_scale .............................. 4294967296 +0: kill_switch_path ................................ kill-switch-2b8100m100mval +0: kv_channels ..................................... 128 +0: layer_norm_fusion ............................... True +0: layernorm_epsilon ............................... 1e-05 +0: lazy_mpu_init ................................... None +0: load ............................................ checkpoints_2b8100m100m +0: local_rank ...................................... None +0: log_batch_size_to_tensorboard ................... True +0: log_interval .................................... 10 +0: log_learning_rate_to_tensorboard ................ True +0: log_level ....................................... None +0: log_level_replica ............................... 
None +0: log_loss_scale_to_tensorboard ................... True +0: log_num_zeros_in_grad ........................... False +0: log_params_norm ................................. False +0: log_path ........................................ None +0: log_timers_to_tensorboard ....................... True +0: log_validation_ppl_to_tensorboard ............... True +0: loss_on_targets_only ............................ False +0: loss_scale ...................................... None +0: loss_scale_window ............................... 1000 +0: lr .............................................. 0.0002 +0: lr_decay_iters .................................. None +0: lr_decay_samples ................................ 1 +0: lr_decay_style .................................. cosine +0: lr_decay_tokens ................................. None +0: lr_warmup_fraction .............................. None +0: lr_warmup_iters ................................. 0 +0: lr_warmup_samples ............................... 0 +0: make_vocab_size_divisible_by .................... 128 +0: mask_prob ....................................... 0.15 +0: masked_softmax_fusion ........................... True +0: max_position_embeddings ......................... 2048 +0: mean_noise_span_length .......................... None +0: memory_centric_tiled_linear ..................... False +0: merge_file ...................................... gpt2/merges.txt +0: micro_batch_size ................................ 4 +0: min_loss_scale .................................. 1.0 +0: min_lr .......................................... 2e-05 +0: mmap_warmup ..................................... False +0: no_load_optim ................................... True +0: no_load_rng ..................................... None +0: no_save_optim ................................... None +0: no_save_rng ..................................... None +0: noise_density ................................... 
None +0: num_attention_heads ............................. 20 +0: num_channels .................................... 3 +0: num_classes ..................................... 1000 +0: num_layers ...................................... 34 +0: num_layers_per_virtual_pipeline_stage ........... None +0: num_workers ..................................... 2 +0: onnx_safe ....................................... None +0: openai_gelu ..................................... False +0: optimizer ....................................... adam +0: optimizer_fusion ................................ True +0: override_lr_scheduler ........................... True +0: pad_vocab_size_to ............................... None +0: params_dtype .................................... torch.bfloat16 +0: partition_activations ........................... False +0: patch_dim ....................................... 16 +0: pipeline_model_parallel_size .................... 1 +0: position_embedding_type ......................... PositionEmbeddingType.absolute +0: pp_partition_method ............................. None +0: profile_backward ................................ False +0: query_in_block_prob ............................. 0.1 +0: rampup_batch_size ............................... None +0: rank ............................................ 0 +0: remote_device ................................... none +0: reset_attention_mask ............................ False +0: reset_position_ids .............................. False +0: reset_progress .................................. True +0: retriever_report_topk_accuracies ................ [] +0: retriever_score_scaling ......................... False +0: retriever_seq_length ............................ 256 +0: reweight_loss_based_on_position_frequency ....... False +0: sample_rate ..................................... 1.0 +0: save ............................................ checkpoints_2b8100m100m +0: save_interval ................................... 
1000 +0: scatter_gather_tensors_in_pipeline .............. True +0: scattered_embeddings ............................ False +0: seed ............................................ 1234 +0: seq_length ...................................... 2048 +0: sgd_momentum .................................... 0.9 +0: short_seq_prob .................................. 0.1 +0: skip_train_iteration_range ...................... None +0: split ........................................... None +0: split_transformers .............................. False +0: sync_tp_duplicated_parameters ................... False +0: synchronize_each_layer .......................... False +0: tensor_model_parallel_size ...................... 1 +0: tensorboard_dir ................................. tensorboard_2b8100m100mval +0: tensorboard_log_interval ........................ 1 +0: tensorboard_queue_size .......................... 5 +0: test_weighted_split_paths ....................... None +0: test_weighted_split_paths_path .................. None +0: tile_factor ..................................... 1 +0: titles_data_path ................................ None +0: tokenizer_name_or_path .......................... None +0: tokenizer_type .................................. GPT2BPETokenizer +0: train_iters ..................................... None +0: train_samples ................................... 1 +0: train_tokens .................................... None +0: train_weighted_split_names ...................... ['train'] +0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] +0: train_weighted_split_paths_path ................. None +0: train_weighted_split_splits ..................... [['0:1']] +0: train_weighted_split_weights .................... [['1.0']] +0: universal_checkpoint ............................ False +0: use_bnb_optimizer ............................... 
False +0: use_checkpoint_lr_scheduler ..................... False +0: use_contiguous_buffers_in_ddp ................... True +0: use_cpu_initialization .......................... None +0: use_one_sent_docs ............................... False +0: use_pin_memory .................................. False +0: valid_num_workers ............................... 2 +0: valid_weighted_split_names ...................... ['validation'] +0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] +0: valid_weighted_split_paths_path ................. None +0: valid_weighted_split_splits ..................... [['0:1']] +0: valid_weighted_split_weights .................... [['1.0']] +0: virtual_pipeline_model_parallel_size ............ None +0: vocab_extra_ids ................................. 0 +0: vocab_file ...................................... gpt2/vocab.json +0: weight_decay .................................... 0.1 +0: world_size ...................................... 16 +0: zero_allgather_bucket_size ...................... 0.0 +0: zero_contigious_gradients ....................... False +0: zero_reduce_bucket_size ......................... 0.0 +0: zero_reduce_scatter ............................. False +0: zero_stage ...................................... 0 +0: -------------------- end of arguments --------------------- +0: setting number of micro-batches to constant 1 +0: > building GPT2BPETokenizer tokenizer ... +0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) +0: DeepSpeed general environment info: +0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] +0: torch version .................... 1.13.0+rocm5.2 +0: torch cuda version ............... None +0: torch hip version ................ 5.2.21151-afdc89f8 +0: nvcc version ..................... 
None +0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] +0: deepspeed info ................... 0.7.5, unknown, unknown +0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 +0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** +0: > initializing torch distributed ... +0: [2023-03-16 18:52:40,817] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +1: > setting tensorboard ... +0: > initializing tensor model parallel with size 1 +0: > initializing pipeline model parallel with size 1 +0: > setting random seeds to 1234 ... +0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 +0: > compiling dataset index builder ... +0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: make: Nothing to be done for 'default'. +0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' +0: >>> done with dataset index builder. Compilation time: 0.118 seconds +0: > compiling and loading fused kernels ... 
+0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 87 +0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 63 +0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] +0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] +0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] +0: Total number of unsupported CUDA function calls: 0 +0: +0: +0: Total number of replaced kernel launches: 67 +0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so 
+0: >>> done with compiling and loading fused kernels. Compilation time: 26.368 seconds +0: time to initialize megatron (seconds): -11.973 +0: [after megatron is initialized] datetime: 2023-03-16 18:53:08 +0: building GPT model ... +0: [2023-03-16 18:53:08,160] [INFO] [utils.py:827:see_memory_usage] Before Building Model +0: [2023-03-16 18:53:08,160] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +0: [2023-03-16 18:53:08,160] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.3 GB, percent = 6.0% +0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15} +0: [2023-03-16 18:53:08,643] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer +0: stage=0 layers=41 +0: 0: _to_float16 +0: 1: EmbeddingPipe +0: 2: +0: 3: ParallelTransformerLayerPipe +0: 4: ParallelTransformerLayerPipe +0: 5: ParallelTransformerLayerPipe +0: 6: ParallelTransformerLayerPipe +0: 7: ParallelTransformerLayerPipe +0: 8: ParallelTransformerLayerPipe +0: 9: ParallelTransformerLayerPipe +0: 10: ParallelTransformerLayerPipe +0: 11: ParallelTransformerLayerPipe +0: 12: ParallelTransformerLayerPipe +0: 13: ParallelTransformerLayerPipe +0: 14: ParallelTransformerLayerPipe +0: 15: ParallelTransformerLayerPipe +0: 16: 
ParallelTransformerLayerPipe +0: 17: ParallelTransformerLayerPipe +0: 18: ParallelTransformerLayerPipe +0: 19: ParallelTransformerLayerPipe +0: 20: ParallelTransformerLayerPipe +0: 21: ParallelTransformerLayerPipe +0: 22: ParallelTransformerLayerPipe +0: 23: ParallelTransformerLayerPipe +0: 24: ParallelTransformerLayerPipe +0: 25: ParallelTransformerLayerPipe +0: 26: ParallelTransformerLayerPipe +0: 27: ParallelTransformerLayerPipe +0: 28: ParallelTransformerLayerPipe +0: 29: ParallelTransformerLayerPipe +0: 30: ParallelTransformerLayerPipe +0: 31: ParallelTransformerLayerPipe +0: 32: ParallelTransformerLayerPipe +0: 33: ParallelTransformerLayerPipe +0: 34: ParallelTransformerLayerPipe +0: 35: ParallelTransformerLayerPipe +0: 36: ParallelTransformerLayerPipe +0: 37: undo +0: 38: MixedFusedLayerNorm +0: 39: EmbeddingPipe +0: 40: float16_to_fp32 +0: loss: CrossEntropy +0: [2023-03-16 18:53:08,977] [INFO] [utils.py:827:see_memory_usage] After Building Model +0: [2023-03-16 18:53:08,977] [INFO] [utils.py:828:see_memory_usage] MA 5.26 GB Max_MA 5.26 GB CA 5.31 GB Max_CA 5 GB +0: [2023-03-16 18:53:08,978] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.34 GB, percent = 6.0% +0: setting training iterations to 0 +0: > learning rate decay style: cosine +0: DeepSpeed is enabled. 
+0: [2023-03-16 18:53:08,981] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown +0: [2023-03-16 18:53:19,167] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +0: [2023-03-16 18:53:19,167] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +0: [2023-03-16 18:53:19,167] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer +0: [2023-03-16 18:53:19,187] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam +0: [2023-03-16 18:53:19,187] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer +0: [2023-03-16 18:53:19,308] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer +0: [2023-03-16 18:53:19,308] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.27 GB CA 5.32 GB Max_CA 5 GB +0: [2023-03-16 18:53:19,308] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.01 GB, percent = 6.2% +0: ninja: no work to do. 
+0: Time to load utils op: 0.12478852272033691 seconds +0: Time to load utils op: 0.24249887466430664 secondsTime to load utils op: 0.24191784858703613 seconds +0: Time to load utils op: 0.2425546646118164 seconds +0: Time to load utils op: 0.24234628677368164 seconds +0: Time to load utils op: 0.24261116981506348 seconds +0: +0: Time to load utils op: 0.24233317375183105 secondsTime to load utils op: 0.24265503883361816 seconds +0: +1: Time to load utils op: 0.23778748512268066 seconds +1: Time to load utils op: 0.23779582977294922 secondsTime to load utils op: 0.23778843879699707 secondsTime to load utils op: 0.2377941608428955 seconds +1: +1: +1: Time to load utils op: 0.2378098964691162 secondsTime to load utils op: 0.23781061172485352 seconds +1: Time to load utils op: 0.23781633377075195 seconds +1: +1: Time to load utils op: 0.2378239631652832 seconds +0: [2023-03-16 18:53:19,544] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 +0: [2023-03-16 18:53:19,545] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.25 GB CA 5.32 GB Max_CA 5 GB +0: [2023-03-16 18:53:19,545] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.02 GB, percent = 6.2% +0: Time to load utils op: 0.0006015300750732422 secondsTime to load utils op: 0.0005679130554199219 seconds +0: +0: Time to load utils op: 0.0005857944488525391 seconds +0: Time to load utils op: 0.0005743503570556641 seconds +0: Time to load utils op: 0.00055694580078125 seconds +0: Time to load utils op: 0.0004904270172119141 seconds +0: Time to load utils op: 0.00066375732421875 seconds +1: Time to load utils op: 0.0008466243743896484 seconds +1: Time to load utils op: 0.0013000965118408203 seconds +1: Time to load utils op: 0.0012829303741455078 seconds +1: Time to load utils op: 0.0011763572692871094 seconds +1: Time to load utils op: 0.0011892318725585938 seconds +1: Time to load utils op: 0.0011539459228515625 seconds +1: Time to load utils op: 0.0011832714080810547 seconds 
+1: Time to load utils op: 0.0012388229370117188 seconds +0: [2023-03-16 18:53:19,700] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 +0: [2023-03-16 18:53:19,701] [INFO] [utils.py:828:see_memory_usage] MA 10.96 GB Max_MA 10.96 GB CA 13.73 GB Max_CA 14 GB +0: [2023-03-16 18:53:19,701] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:19,809] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 +0: [2023-03-16 18:53:19,809] [INFO] [utils.py:828:see_memory_usage] MA 10.96 GB Max_MA 10.96 GB CA 13.73 GB Max_CA 14 GB +0: [2023-03-16 18:53:19,809] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:19,915] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 +0: [2023-03-16 18:53:19,915] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 18:53:19,916] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,018] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 +0: [2023-03-16 18:53:20,018] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 18:53:20,018] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,124] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 +0: [2023-03-16 18:53:20,124] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 18:53:20,125] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,225] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer +0: [2023-03-16 18:53:20,226] [INFO] [utils.py:828:see_memory_usage] MA 16.35 GB Max_MA 16.35 GB CA 21.67 GB Max_CA 22 GB +0: [2023-03-16 
18:53:20,226] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,332] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer +0: [2023-03-16 18:53:20,333] [INFO] [utils.py:828:see_memory_usage] MA 17.66 GB Max_MA 17.66 GB CA 22.98 GB Max_CA 23 GB +0: [2023-03-16 18:53:20,333] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,434] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer +0: [2023-03-16 18:53:20,435] [INFO] [utils.py:828:see_memory_usage] MA 17.66 GB Max_MA 17.66 GB CA 22.98 GB Max_CA 23 GB +0: [2023-03-16 18:53:20,435] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.18 GB, percent = 6.2% +0: [2023-03-16 18:53:20,435] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +0: [2023-03-16 18:53:20,435] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler +0: [2023-03-16 18:53:20,435] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +0: [2023-03-16 18:53:20,435] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +0: [2023-03-16 18:53:20,436] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: +0: [2023-03-16 18:53:20,436] [INFO] [config.py:1011:print] activation_checkpointing_config { +0: "partition_activations": false, +0: "contiguous_memory_optimization": false, +0: "cpu_checkpointing": false, +0: "number_checkpoints": null, +0: "synchronize_checkpoint_boundary": false, +0: "profile": false +0: } +0: [2023-03-16 18:53:20,436] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] amp_enabled .................. 
False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] amp_params ................... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] autotuning_config ............ { +0: "enabled": false, +0: "start_step": null, +0: "end_step": null, +0: "metric_path": null, +0: "arg_mappings": null, +0: "metric": "throughput", +0: "model_info": null, +0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", +0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", +0: "overwrite": true, +0: "fast": true, +0: "start_profile_step": 3, +0: "end_profile_step": 5, +0: "tuner_type": "gridsearch", +0: "tuner_early_stopping": 5, +0: "tuner_num_trials": 50, +0: "model_info_path": null, +0: "mp_size": 1, +0: "max_train_batch_size": null, +0: "min_train_batch_size": 1, +0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, +0: "min_train_micro_batch_size_per_gpu": 1, +0: "num_tuning_micro_batch_sizes": 3 +0: } +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] bfloat16_enabled ............. True +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] comms_config ................. +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] communication_data_type ...... None +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa +0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] curriculum_enabled ........... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] curriculum_params ............ False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] dataloader_drop_last ......... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] disable_allgather ............ False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] dump_state ................... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] elasticity_enabled ........... False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] flops_profiler_config ........ { +0: "enabled": false, +0: "profile_step": 1, +0: "module_depth": -1, +0: "top_modules": 1, +0: "detailed": true, +0: "output_file": null +0: } +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] fp16_auto_cast ............... None +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] fp16_enabled ................. False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] global_rank .................. 0 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 +0: [2023-03-16 18:53:20,437] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] load_universal_checkpoint .... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] loss_scale ................... 1.0 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] memory_breakdown ............. False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] monitor_config ............... 
+0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] nebula_config ................ { +0: "enabled": false, +0: "persistent_storage_path": null, +0: "persistent_time_interval": 100, +0: "num_of_version_in_retention": 2, +0: "enable_nebula_load": true, +0: "load_path": null +0: } +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] optimizer_name ............... None +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] optimizer_params ............. None +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] pld_enabled .................. False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] pld_params ................... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] prescale_gradients ........... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] scheduler_name ............... None +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] scheduler_params ............. None +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] sparse_attention ............. None +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] steps_per_print .............. 2000 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] train_batch_size ............. 64 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] use_node_local_storage ....... False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] wall_clock_breakdown ......... 
False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] world_size ................... 16 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] zero_enabled ................. False +0: [2023-03-16 18:53:20,438] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 +0: [2023-03-16 18:53:20,438] [INFO] [config.py:996:print_user_config] json = { +0: "train_micro_batch_size_per_gpu": 4, +0: "train_batch_size": 64, +0: "gradient_clipping": 1.0, +0: "zero_optimization": { +0: "stage": 0 +0: }, +0: "bf16": { +0: "enabled": true +0: }, +0: "steps_per_print": 2.000000e+03, +0: "wall_clock_breakdown": false +0: } +0: Time to load utils op: 0.0006129741668701172 seconds +0: [2023-03-16 18:53:20,439] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 +0: [2023-03-16 18:53:20,490] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=41 [0, 41) STAGE_PARAMS=2809026560 (2809.027M) TOTAL_PARAMS=2809026560 (2809.027M) UNIQUE_PARAMS=2809026560 (2809.027M) +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+0: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +0: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +1: [2023-03-16 18:53:20,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:20,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:20,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:20,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+0: [2023-03-16 18:53:20,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +0: [2023-03-16 18:53:20,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +1: [2023-03-16 18:53:21,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+0: [2023-03-16 18:53:21,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +0: [2023-03-16 18:53:21,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +1: [2023-03-16 18:53:21,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +1: [2023-03-16 18:53:21,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +0: [2023-03-16 18:53:21,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+0: [2023-03-16 18:53:21,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +1: [2023-03-16 18:53:21,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +0: [2023-03-16 18:53:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +0: [2023-03-16 18:53:21,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +1: [2023-03-16 18:53:21,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+1: [2023-03-16 18:53:21,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +1: [2023-03-16 18:53:21,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +0: [2023-03-16 18:53:21,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 18:53:21,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:21,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+0: [2023-03-16 18:53:21,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:21,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +1: [2023-03-16 18:53:22,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +0: [2023-03-16 18:53:22,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +1: [2023-03-16 18:53:22,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +0: [2023-03-16 18:53:22,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+0: [2023-03-16 18:53:22,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +0: [2023-03-16 18:53:22,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +1: [2023-03-16 18:53:22,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +1: [2023-03-16 18:53:22,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +0: [2023-03-16 18:53:22,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:22,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:22,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+1: [2023-03-16 18:53:22,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +0: [2023-03-16 18:53:23,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +1: [2023-03-16 18:53:23,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +1: [2023-03-16 18:53:23,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +0: [2023-03-16 18:53:23,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +0: [2023-03-16 18:53:23,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +1: [2023-03-16 18:53:23,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+1: [2023-03-16 18:53:23,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +0: [2023-03-16 18:53:23,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +1: [2023-03-16 18:53:23,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+0: [2023-03-16 18:53:23,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+1: [2023-03-16 18:53:23,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +1: [2023-03-16 18:53:23,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +0: [2023-03-16 18:53:23,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:23,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:23,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +1: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +0: [2023-03-16 18:53:24,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+1: [2023-03-16 18:53:24,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +0: [2023-03-16 18:53:24,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +1: [2023-03-16 18:53:24,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +1: [2023-03-16 18:53:24,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +0: [2023-03-16 18:53:24,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +0: [2023-03-16 18:53:24,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +1: [2023-03-16 18:53:24,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+1: [2023-03-16 18:53:24,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+0: [2023-03-16 18:53:24,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +1: [2023-03-16 18:53:24,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:24,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 18:53:24,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:24,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +0: [2023-03-16 18:53:25,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +1: [2023-03-16 18:53:25,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+1: [2023-03-16 18:53:25,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +0: [2023-03-16 18:53:25,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +1: [2023-03-16 18:53:25,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+1: [2023-03-16 18:53:25,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+0: [2023-03-16 18:53:25,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +1: [2023-03-16 18:53:25,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +1: [2023-03-16 18:53:25,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:25,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +0: [2023-03-16 18:53:25,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:25,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +0: [2023-03-16 18:53:26,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +1: [2023-03-16 18:53:26,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+1: [2023-03-16 18:53:26,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +1: [2023-03-16 18:53:26,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +0: [2023-03-16 18:53:26,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +0: [2023-03-16 18:53:26,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +1: [2023-03-16 18:53:26,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+1: [2023-03-16 18:53:26,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +0: [2023-03-16 18:53:26,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +1: [2023-03-16 18:53:26,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +0: [2023-03-16 18:53:26,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +1: [2023-03-16 18:53:26,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +1: [2023-03-16 18:53:26,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +0: [2023-03-16 18:53:26,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:26,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+0: [2023-03-16 18:53:26,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:26,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +1: [2023-03-16 18:53:27,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +0: [2023-03-16 18:53:27,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +0: [2023-03-16 18:53:27,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +1: [2023-03-16 18:53:27,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+0: [2023-03-16 18:53:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+1: [2023-03-16 18:53:27,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +1: [2023-03-16 18:53:27,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +0: [2023-03-16 18:53:27,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+0: [2023-03-16 18:53:27,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +1: [2023-03-16 18:53:27,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +0: [2023-03-16 18:53:27,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 18:53:27,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:27,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:27,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:27,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:27,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:27,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 18:53:27,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:27,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +0: [2023-03-16 18:53:28,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +1: [2023-03-16 18:53:28,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +0: [2023-03-16 18:53:28,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +1: [2023-03-16 18:53:28,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +1: [2023-03-16 18:53:28,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +0: [2023-03-16 18:53:28,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +1: [2023-03-16 18:53:28,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +0: [2023-03-16 18:53:28,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+0: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+1: [2023-03-16 18:53:28,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +0: [2023-03-16 18:53:28,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+0: [2023-03-16 18:53:28,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +1: [2023-03-16 18:53:28,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:28,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +1: [2023-03-16 18:53:29,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+1: [2023-03-16 18:53:29,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +0: [2023-03-16 18:53:29,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 18:53:29,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+1: [2023-03-16 18:53:29,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +0: [2023-03-16 18:53:29,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +1: [2023-03-16 18:53:29,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +1: [2023-03-16 18:53:29,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+1: [2023-03-16 18:53:29,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +0: [2023-03-16 18:53:29,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+0: [2023-03-16 18:53:29,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +1: [2023-03-16 18:53:29,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +0: [2023-03-16 18:53:29,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:29,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:29,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+1: [2023-03-16 18:53:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +0: [2023-03-16 18:53:29,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +1: [2023-03-16 18:53:29,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:29,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+1: [2023-03-16 18:53:29,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:29,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:29,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+1: [2023-03-16 18:53:30,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+1: [2023-03-16 18:53:30,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+0: [2023-03-16 18:53:30,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +0: [2023-03-16 18:53:30,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +1: [2023-03-16 18:53:30,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+1: [2023-03-16 18:53:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+1: [2023-03-16 18:53:30,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +0: [2023-03-16 18:53:30,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +1: [2023-03-16 18:53:30,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+0: [2023-03-16 18:53:30,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+0: [2023-03-16 18:53:30,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +0: [2023-03-16 18:53:30,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+1: [2023-03-16 18:53:30,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +1: [2023-03-16 18:53:30,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+1: [2023-03-16 18:53:30,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +1: [2023-03-16 18:53:30,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +0: [2023-03-16 18:53:30,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:30,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+1: [2023-03-16 18:53:30,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:30,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+0: [2023-03-16 18:53:31,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +0: [2023-03-16 18:53:31,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +1: [2023-03-16 18:53:31,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +1: [2023-03-16 18:53:31,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +0: [2023-03-16 18:53:31,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+0: [2023-03-16 18:53:31,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +0: [2023-03-16 18:53:31,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+1: [2023-03-16 18:53:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +1: [2023-03-16 18:53:31,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +0: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +1: [2023-03-16 18:53:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:31,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+0: [2023-03-16 18:53:31,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+0: [2023-03-16 18:53:31,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:31,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:31,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+1: [2023-03-16 18:53:32,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +1: [2023-03-16 18:53:32,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +0: [2023-03-16 18:53:32,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+0: [2023-03-16 18:53:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +0: [2023-03-16 18:53:32,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +1: [2023-03-16 18:53:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+0: [2023-03-16 18:53:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+1: [2023-03-16 18:53:32,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+1: [2023-03-16 18:53:32,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +0: [2023-03-16 18:53:32,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +1: [2023-03-16 18:53:32,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+1: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +1: [2023-03-16 18:53:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +0: [2023-03-16 18:53:32,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+1: [2023-03-16 18:53:32,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +0: [2023-03-16 18:53:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +1: [2023-03-16 18:53:32,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+1: [2023-03-16 18:53:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +0: [2023-03-16 18:53:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +1: [2023-03-16 18:53:32,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:32,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+0: [2023-03-16 18:53:32,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:32,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+1: [2023-03-16 18:53:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +0: [2023-03-16 18:53:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +1: [2023-03-16 18:53:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+1: [2023-03-16 18:53:33,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +0: [2023-03-16 18:53:33,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +1: [2023-03-16 18:53:33,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+1: [2023-03-16 18:53:33,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+0: [2023-03-16 18:53:33,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+1: [2023-03-16 18:53:33,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+1: [2023-03-16 18:53:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +0: [2023-03-16 18:53:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +1: [2023-03-16 18:53:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+1: [2023-03-16 18:53:33,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +1: [2023-03-16 18:53:33,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +0: [2023-03-16 18:53:33,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+1: [2023-03-16 18:53:33,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:33,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+0: [2023-03-16 18:53:33,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:33,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +0: [2023-03-16 18:53:34,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +1: [2023-03-16 18:53:34,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+1: [2023-03-16 18:53:34,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +0: [2023-03-16 18:53:34,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +1: [2023-03-16 18:53:34,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+0: [2023-03-16 18:53:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+0: [2023-03-16 18:53:34,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+1: [2023-03-16 18:53:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +1: [2023-03-16 18:53:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +0: [2023-03-16 18:53:34,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+0: [2023-03-16 18:53:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +1: [2023-03-16 18:53:34,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +0: [2023-03-16 18:53:34,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+0: [2023-03-16 18:53:34,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:34,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+1: [2023-03-16 18:53:34,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +0: [2023-03-16 18:53:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+0: [2023-03-16 18:53:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +1: [2023-03-16 18:53:34,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+0: [2023-03-16 18:53:35,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +0: [2023-03-16 18:53:35,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +1: [2023-03-16 18:53:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+1: [2023-03-16 18:53:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+0: [2023-03-16 18:53:35,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+1: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+1: [2023-03-16 18:53:35,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+0: [2023-03-16 18:53:35,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +1: [2023-03-16 18:53:35,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+1: [2023-03-16 18:53:35,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:35,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:35,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+1: [2023-03-16 18:53:35,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +0: [2023-03-16 18:53:35,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +1: [2023-03-16 18:53:35,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 18:53:35,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+1: [2023-03-16 18:53:35,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+1: [2023-03-16 18:53:35,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +1: [2023-03-16 18:53:35,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:35,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:35,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... +1: [2023-03-16 18:53:35,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 18:53:35,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +1: [2023-03-16 18:53:35,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+0: [2023-03-16 18:53:35,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+0: [2023-03-16 18:53:35,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: [2023-03-16 18:53:35,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +0: [2023-03-16 18:53:35,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +0: > overriding learning rate value to 0.0002 +0: > overriding minimum learning rate value to 2e-05 +0: > overriding warmup iterations value to 0 +0: > overriding total number of iterations value to 1 +0: > overriding decay style value to cosine +0: [2023-03-16 18:53:35,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
+0: [2023-03-16 18:53:35,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:35,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... +0: [2023-03-16 18:53:41,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:41,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 0 +1: [2023-03-16 18:53:41,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:41,831] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 9 +0: [2023-03-16 18:53:41,877] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 0 +1: [2023-03-16 18:53:41,883] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 9 +0: could not find arguments in the checkpoint ... +0: checkpoint version 3.0 +0: [2023-03-16 18:53:42,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 18:53:42,172] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 5 +1: [2023-03-16 18:53:42,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 12 +0: [2023-03-16 18:53:42,228] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 5 +1: [2023-03-16 18:53:42,243] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 12 +0: [2023-03-16 18:53:42,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,268] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 7 +1: [2023-03-16 18:53:42,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,283] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 8 +1: [2023-03-16 18:53:42,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 18:53:42,310] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 13 +0: [2023-03-16 18:53:42,327] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 7 +1: [2023-03-16 18:53:42,340] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 8 +1: [2023-03-16 18:53:42,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 13 +1: [2023-03-16 18:53:42,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 15 +0: [2023-03-16 18:53:42,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,397] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 2 +1: [2023-03-16 18:53:42,434] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 15 +0: [2023-03-16 18:53:42,463] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 2 +1: [2023-03-16 18:53:42,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:42,481] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 10 +1: [2023-03-16 18:53:42,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+1: [2023-03-16 18:53:42,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 14 +1: [2023-03-16 18:53:42,542] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 10 +1: [2023-03-16 18:53:42,574] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 14 +0: [2023-03-16 18:53:42,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,640] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 3 +0: [2023-03-16 18:53:42,703] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 3 +0: [2023-03-16 18:53:42,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,754] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 1 +0: [2023-03-16 18:53:42,820] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 1 +0: [2023-03-16 18:53:42,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. +0: [2023-03-16 18:53:42,868] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 4 +0: [2023-03-16 18:53:42,930] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 4 +0: [2023-03-16 18:53:42,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+0: [2023-03-16 18:53:42,945] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 6 +0: [2023-03-16 18:53:43,008] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 6 +1: [2023-03-16 18:53:45,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. +1: [2023-03-16 18:53:45,421] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 11 +1: [2023-03-16 18:53:45,484] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 16 zero partition checkpoints for rank 11 +0: successfully loaded checkpoint from checkpoints_2b8100m100m at iteration 0 +1: time (ms) | load-checkpoint: 25063.14 +0: estimated model parameters: 2.80902656 +0: estimated model parameters without embeddings: 2.67500544 +0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 18:53:45 +0: > building train, validation, and test datasets ... +0: > datasets target sizes (minimum size): +0: train: 1 +0: validation: 6400 +0: test: 6400 +0: > building train, validation, and test datasets for GPT ... +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... 
+0: > finished creating indexed dataset in 0.007372 seconds +0: number of documents: 208931 +0: > dataset split: +0: train: +0: document indices in [0, 208931) total of 208931 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.010 seconds +0: total number of samples: 48805 +0: total number of epochs: 1 +0: > building dataset index ... +0: reading sizes... +0: reading pointers... +0: reading document index... +0: creating numpy buffer of mmap... +0: creating memory view of numpy buffer... +0: > finished creating indexed dataset in 0.030295 seconds +0: number of documents: 364608 +0: > dataset split: +0: validation: +0: document indices in [0, 364608) total of 364608 documents +0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_doc_idx.npy +0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_sample_idx.npy +0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_6400ns_2048sl_1234s_shuffle_idx.npy +0: loaded indexed file in 0.009 seconds +0: total number of samples: 84978 +0: total number of epochs: 1 +0: > finished creating GPT datasets ... 
diff --git a/2b8100m100m/3324669.err b/2b8100m100m/3324669.err new file mode 100644 index 0000000000000000000000000000000000000000..90afc1d91ee10950eebd1f64a3249751c8416032 --- /dev/null +++ b/2b8100m100m/3324669.err @@ -0,0 +1,7337 @@ + 8: 2023-03-16 19:19:52.520733: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520748: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.520996: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520775: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520800: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 19:19:52.521085: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520782: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520822: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520827: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: 2023-03-16 19:19:52.521289: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 19:19:52.521313: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521121: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521124: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 19:19:52.520777: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 19:19:52.521330: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 19:19:52.521344: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 19:19:52.521310: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521134: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521140: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 19:19:52.521374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 19:19:52.521239: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 19:19:52.521105: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 19:19:52.521115: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521213: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 19:19:52.521226: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521232: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: 2023-03-16 19:19:52.521042: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521037: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: 2023-03-16 19:19:52.521192: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 19:19:52.521201: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521217: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 19:19:52.521402: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 19:19:52.521420: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 19:19:52.521439: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: 2023-03-16 19:19:52.521412: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 19:19:52.521409: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 19:19:52.521431: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 19:19:52.521103: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521389: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521408: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: 2023-03-16 19:19:52.521219: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 19:19:52.521222: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521338: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521119: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521412: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 19:19:52.521243: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 19:19:52.521251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521334: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 19:19:52.521454: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 19:19:52.521467: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 19:19:52.521474: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: 2023-03-16 19:19:52.521431: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521432: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 19:19:52.521443: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: 2023-03-16 19:19:52.521745: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 19:19:52.521748: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 19:19:52.521395: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 19:19:52.521400: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521621: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 19:19:52.521648: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 19:19:52.521466: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 19:19:52.521420: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 19:19:52.521478: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: 2023-03-16 19:19:52.521476: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 19:19:52.521473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 19:19:52.521451: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 19:19:52.521575: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521799: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521820: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521841: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 19:19:52.521855: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521914: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521916: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 19:19:52.521938: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522022: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 19:19:52.522040: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: 2023-03-16 19:19:52.521956: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522062: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 19:19:52.522064: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522068: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 19:19:52.522076: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522344: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 19:19:52.522356: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522382: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522392: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522383: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 19:19:52.522399: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 19:19:52.522382: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522458: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522462: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522483: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522502: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 19:19:52.522516: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522525: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 19:19:52.522539: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 19:19:52.523437: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 19:19:52.523445: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 19:19:52.523475: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523292: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523303: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 19:19:52.523312: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: 2023-03-16 19:19:52.523473: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 19:19:52.523488: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 19:19:52.523494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523330: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 19:19:52.523501: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 19:19:52.523515: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523367: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523366: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 19:19:52.523379: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 19:19:52.523361: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523802: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523818: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523822: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523876: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 19:19:52.523904: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523916: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523933: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 19:19:52.523938: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 4: 2023-03-16 19:20:04.759306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.759293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 19:20:04.759565: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.759367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.759375: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.759379: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 19:20:04.759589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.759348: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759945: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 19:20:04.759397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759614: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 19:20:04.759421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759969: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759987: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above 
cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.759796: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 19:20:04.759642: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.759995: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 19:20:04.759664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:04.760322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.760025: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 19:20:04.760037: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.759811: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 19:20:04.760340: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 19:20:04.759673: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:04.760053: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 19:20:04.760089: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 19:20:04.760091: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 19:20:04.760354: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 19:20:04.760363: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 19:20:04.760368: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.759826: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.759831: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 19:20:04.760385: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 19:20:04.760389: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.760104: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 19:20:04.760416: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 19:20:04.759843: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.759853: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.760120: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 19:20:04.759848: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.760123: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 19:20:04.759856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:04.760136: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.760140: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.760151: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 19:20:04.760156: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 19:20:04.760701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760732: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760755: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760733: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760771: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.760765: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:04.761219: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761239: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761245: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761252: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761261: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 19:20:04.761268: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761273: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 19:20:04.761279: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 19:20:04.762241: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762261: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762514: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 19:20:04.762270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762530: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 19:20:04.762292: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762539: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 19:20:04.762294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762551: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 19:20:04.762306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762302: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:04.762559: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 19:20:04.762567: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 19:20:04.762576: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 19:20:04.762580: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 19:20:04.762472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:04.762619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 19:20:04.762491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762500: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762743: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762757: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 19:20:04.762759: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762766: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762526: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 19:20:04.762775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] 
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762528: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 19:20:04.762819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.762808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762632: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762776: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 19:20:04.762787: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 19:20:04.762792: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 19:20:04.762795: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:04.762796: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.762799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762643: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762851: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.762824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.762815: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.762665: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762857: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.762790: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.762819: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.763049: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 19:20:04.763051: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 19:20:04.763054: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 19:20:04.762862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:04.763057: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 19:20:04.763057: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 19:20:04.763060: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 19:20:04.763062: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.762808: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.762818: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 19:20:04.763070: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.762862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.762819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.762820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762855: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:04.762933: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.763269: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 19:20:04.763274: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 19:20:04.763276: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:04.762823: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762842: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 19:20:04.763279: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 19:20:04.763280: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 19:20:04.762963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:04.763286: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 19:20:04.763291: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 19:20:04.763288: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:04.762847: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.762871: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 19:20:04.762982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:04.763108: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763101: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763104: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763114: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:04.762846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 19:20:04.763117: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763118: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763124: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 19:20:04.763129: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 19:20:04.762985: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:04.763241: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 19:20:04.762951: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 19:20:04.763246: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 19:20:04.763249: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 19:20:04.763250: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 19:20:04.763251: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 19:20:04.763253: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:04.763254: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 19:20:04.763258: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.762986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:04.762992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 
'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763314: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 19:20:04.762996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:04.763360: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763385: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763377: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763397: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763394: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763411: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 19:20:04.763414: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:04.763418: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 19:20:04.763338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763334: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763357: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763369: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763370: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:04.763785: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763798: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763809: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 19:20:04.763813: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763821: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 19:20:04.763826: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766355: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766374: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:04.766733: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766736: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766742: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766743: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766745: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 19:20:04.766747: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766748: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 19:20:04.766752: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777200: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777211: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777224: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777244: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:04.777788: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777808: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777815: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777826: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777835: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 19:20:04.777844: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777848: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 19:20:04.777860: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 19:20:04.779062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779083: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779346: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 1: 2023-03-16 19:20:04.779095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779364: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 1: 2023-03-16 19:20:04.779126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779112: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779376: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 1: 2023-03-16 19:20:04.779140: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779129: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:04.779390: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 19:20:04.779400: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 19:20:04.779402: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 19:20:04.779409: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 19:20:04.779416: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 19:20:04.789994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790465: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 19:20:04.790023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790476: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 19:20:04.790041: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790045: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790488: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 19:20:04.790492: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 19:20:04.790053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:04.790509: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 19:20:04.790508: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 19:20:04.790517: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 19:20:04.790523: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 19:20:34.518394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518381: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 19:20:34.518631: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 19:20:34.518608: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.518394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518648: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518447: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518412: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518626: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518660: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518409: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518637: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518669: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518486: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 19:20:34.518836: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518656: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518677: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518422: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518679: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 19:20:34.518434: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 19:20:34.518856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518666: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.518518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 19:20:34.518865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518846: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.518696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.518889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.518671: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.518887: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.518904: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.518907: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 19:20:34.518863: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.518912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 19:20:34.518880: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518876: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518891: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.518919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524831: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.524883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533250: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533296: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533311: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533328: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.533346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556582: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556584: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556698: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556706: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556596: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 19:20:34.556699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556597: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 19:20:34.556601: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 19:20:34.556701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556604: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 19:20:34.556607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 19:20:34.556607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 19:20:34.556639: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 19:20:34.556653: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 19:20:34.556703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 19:20:34.556716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556716: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556724: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 19:20:34.556723: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 19:20:34.557178: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557184: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 19:20:34.557341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557184: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 19:20:34.557346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557194: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557186: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 19:20:34.557349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557200: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 19:20:34.557201: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 19:20:34.557205: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 19:20:34.557205: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 19:20:34.557209: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557243: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 19:20:34.557348: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.557357: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 19:20:34.557247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 19:20:34.557351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 19:20:34.557257: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 19:20:34.557261: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 3: 2023-03-16 19:20:34.557351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.557365: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557366: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557369: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557370: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557373: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557374: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 19:20:34.557398: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 19:20:34.557413: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557633: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557634: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557641: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557650: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 19:20:34.557662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557669: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557669: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557669: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557671: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 19:20:34.557672: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 19:20:34.557928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557933: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558113: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 2023-03-16 19:20:34.557935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557937: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557939: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 19:20:34.558113: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 19:20:34.557944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 19:20:34.557950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 19:20:34.557950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 19:20:34.558121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 2023-03-16 19:20:34.557953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 19:20:34.557955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 19:20:34.558120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 19:20:34.557994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558122: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558122: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558132: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558132: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558137: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558141: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558142: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558141: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558143: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 19:20:34.558177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 19:20:34.558191: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558623: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558632: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558635: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 19:20:34.558629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 19:20:34.558639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558642: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 19:20:34.558648: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.571980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572007: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572074: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.572146: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573308: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573319: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573341: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.573394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.573965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574083: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574086: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574090: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.574109: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 2023-03-16 19:20:34.574440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 2023-03-16 19:20:34.574442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 2023-03-16 19:20:34.574443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.574613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574447: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574457: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574457: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 19:20:34.574461: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574463: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574465: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574466: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574467: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574475: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 19:20:34.574475: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 19:20:34.574764: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 19:20:34.574495: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574788: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574795: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574807: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574814: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.574838: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575121: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575164: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575152: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575187: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575194: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575206: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.575208: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575420: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575424: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575423: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.575630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 19:20:34.575431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575432: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.575645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575430: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.575660: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 19:20:34.575444: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 19:20:34.575444: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 19:20:34.575446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 19:20:34.575446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 19:20:34.575666: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 19:20:34.575449: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 19:20:34.575451: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 19:20:34.575451: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.575676: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.575680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.575686: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.575799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 2023-03-16 19:20:34.575662: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575691: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575700: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575706: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575714: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.575740: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576257: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 19:20:34.576269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576271: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 19:20:34.576598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576294: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 19:20:34.576295: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 19:20:34.576297: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 19:20:34.576299: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 19:20:34.576300: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 19:20:34.576302: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576330: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 19:20:34.576596: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 19:20:34.576351: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 19:20:34.576603: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 19:20:34.576610: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 19:20:34.576628: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 19:20:34.576636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 19:20:34.576637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 19:20:34.576679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 19:20:34.576698: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 19:20:34.576708: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 19:20:34.577013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577029: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577032: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577038: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577042: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577043: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577045: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 19:20:34.577038: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 19:20:34.577056: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 19:20:34.577057: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 19:20:34.577344: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577343: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577354: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577363: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577366: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577367: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577371: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577371: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577391: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577392: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 19:20:34.577409: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 19:20:34.577410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 19:20:34.577687: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 19:20:34.577587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.577690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.577689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.577590: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577689: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.577589: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577693: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.577593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577695: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.577591: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.577707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 19:20:34.577707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 19:20:34.577592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 19:20:34.577711: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 19:20:34.577711: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.577712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 19:20:34.577715: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577590: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 19:20:34.577764: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 19:20:34.577781: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 19:20:34.577599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 19:20:34.577608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577609: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577610: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577610: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577612: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577614: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 19:20:34.577615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_upper_triang_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_upper_triang_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module fused_mix_prec_layer_norm_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module fused_mix_prec_layer_norm_cuda... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 1: Building extension module utils... + 1: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: + 2: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: + 4: + 4: + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: + 6: + 6: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: + 7: + 7: + 7: + 7: + 7: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: + 8: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 9: + 9: + 9: + 9: + 9: + 9: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: +10: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: +11: +11: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: +12: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: +14: +14: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: +15: +15: +15: +15: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 3: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 3: Building extension module utils... + 3: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) + 3: Loading extension module utils... + 0: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 0: Loading extension module utils... + 1: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... + 6: Loading extension module utils... + 5: Loading extension module utils... 
+ 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +13: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +13: Loading extension module utils... + 7: Loading extension module utils... +13: Loading extension module utils... + 7: Loading extension module utils... +13: Loading extension module utils... + 7: Loading extension module utils... +13: Loading extension module utils... + 8: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 9: Loading extension module utils... + 8: Loading extension module utils... + 9: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 8: Loading extension module utils... + 9: Loading extension module utils... + 8: Loading extension module utils... +10: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +11: Loading extension module utils... +10: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +12: Loading extension module utils... 
+11: Loading extension module utils... +12: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +12: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 1: + 1: + 1: Loading extension module utils...Loading extension module utils... + 1: + 1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 1: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Loading extension module utils...Loading extension module utils... + 1: + 1: +12: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... 
+14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... + 3: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +15: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... + 0: Loading extension module utils... +13: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 3: + 3: + 3: Loading extension module utils...Loading extension module utils... + 3: + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... 
+ 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: + 0: + 0: + 0: + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 0: + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 0: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 0: + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 2: Loading extension module utils... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +13: +13: Loading extension module utils...Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... +13: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... 
+13: +13: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +13: +13: +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 6: + 6: + 6: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 6: + 6: + 2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 2: + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 2: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 5: + 5: + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 6: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 6: Loading extension module utils... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 7: + 7: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 7: + 7: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 7: + 7: Loading extension module utils... 
+ 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 5: + 5: Loading extension module utils...Loading extension module utils... + 5: + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 5: + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... 
+11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +11: +11: Loading extension module utils...Loading extension module utils... +11: +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +10: +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+14: +14: +14: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +14: +14: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... 
+12: +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +14: +14: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +15: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +15: +15: Loading extension module utils...Loading extension module utils... +15: +15: No modifications detected for re-loaded extension module utils, skipping build step... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +14: +14: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +15: +15: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 0: Loading extension module utils... +10: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: main() +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: Traceback (most recent call last): +10: main() +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: Traceback (most recent call last): +10: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: main() +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: main() +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: Traceback (most recent call last): + 3: Traceback (most recent call last): + 3: Traceback (most recent call last): + 3: Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: main() + 3: Traceback (most recent call last): + 3: Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: main()main() +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: +10: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
346, in wrapper + 3: Traceback (most recent call last): +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 3: main()main()main() + 3: + 3: + 3: main()main() File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: + 3: +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: main() + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: main() +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: return f(*args, **kwargs)return f(*args, **kwargs)return f(*args, **kwargs) +10: +10: + 3: Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: main() + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: return f(*args, **kwargs) +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: return f(*args, **kwargs) return f(*args, **kwargs) + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: + 3: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step,return f(*args, **kwargs) +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +10: +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: main() + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +10: + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +10: +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: success = self._load_zero_checkpoint( +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: success = self._load_zero_checkpoint( +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: self.optimizer.load_state_dict(self.optimizer.load_state_dict( +10: + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: self.optimizer.load_state_dict( +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint 
+ 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: 
self._load_legacy_checkpoint(state_dict_list, +10: self._load_legacy_checkpoint(state_dict_list, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +10: +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: success = self._load_zero_checkpoint( +10: current_rank_sd = state_dict_list[dp_rank] + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: IndexError: list index out of range + 3: success = self._load_zero_checkpoint( +10: current_rank_sd = state_dict_list[dp_rank] + 3: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: + 3: +10: IndexError: list index out of range + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank] +10: + 3: success = self._load_zero_checkpoint( +10: current_rank_sd = state_dict_list[dp_rank] +10: IndexErrorIndexError : : current_rank_sd = state_dict_list[dp_rank]list index out of rangecurrent_rank_sd = state_dict_list[dp_rank]list index out of rangeIndexError +10: +10: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: +10: : list index out of range +10: IndexErrorIndexError: : list index out of rangelist index out of range +10: +10: current_rank_sd = state_dict_list[dp_rank] +10: IndexError: list index out of range + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self.optimizer.load_state_dict(self.optimizer.load_state_dict( + 3: + 3: self.optimizer.load_state_dict( + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self.optimizer.load_state_dict(self.optimizer.load_state_dict( + 3: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, + 3: + 3: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in 
_load_legacy_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: self._load_legacy_checkpoint(state_dict_list, + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: IndexError: list index out of range + 3: current_rank_sd = state_dict_list[dp_rank] + 3: current_rank_sd = state_dict_list[dp_rank] + 3: current_rank_sd = state_dict_list[dp_rank]IndexError + 3: IndexError: : list index out of range current_rank_sd = state_dict_list[dp_rank] + 3: list index out of rangecurrent_rank_sd = state_dict_list[dp_rank] + 3: IndexError + 3: + 3: : list index out of range + 3: IndexErrorIndexError: : list index out of rangelist index out of range + 3: + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range +11: Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", 
line 235, in +11: Traceback (most recent call last): +11: Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: Traceback (most recent call last): +11: Traceback (most recent call last): + 6: Traceback (most recent call last): + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: Traceback (most recent call last): +11: Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +11: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: main() +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: main()main()main() +11: +11: + 6: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: return f(*args, **kwargs) +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: return f(*args, **kwargs)return f(*args, **kwargs)return f(*args, **kwargs) +11: +11: +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: +11: +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer 
+11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: +11: +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( +11: +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in 
_load_zero_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: self.optimizer.load_state_dict( +11: self.optimizer.load_state_dict(self.optimizer.load_state_dict( +11: +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in 
load_state_dict +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range +11: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank] +11: +11: IndexError IndexError: current_rank_sd = state_dict_list[dp_rank] : +11: list index out of rangecurrent_rank_sd = state_dict_list[dp_rank]list index out of range +11: +11: +11: IndexError: list index out of rangeIndexError +11: : list index out of range +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError : current_rank_sd = state_dict_list[dp_rank]list index out of range +11: +11: IndexError: list index out of range +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range + 6: Traceback (most recent call last): + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: main() + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, 
forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 7: Traceback (most recent call last): + 7: Traceback (most recent call last): + 7: Traceback (most recent call last): + 7: Traceback (most recent call last): + 7: Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 7: main()main() + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: main()main() + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: main() main() + 7: main() + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in 
wrapper + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: main() + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: return f(*args, **kwargs)return f(*args, **kwargs) + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: return f(*args, **kwargs) + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: return f(*args, **kwargs)return f(*args, **kwargs) + 7: + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", 
line 278, in load_checkpoint + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self.optimizer.load_state_dict( + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: self.optimizer.load_state_dict( + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, + 7: + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: self._load_legacy_checkpoint(state_dict_list, + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: self._load_legacy_checkpoint(state_dict_list, + 7: current_rank_sd = state_dict_list[dp_rank] File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: self._load_legacy_checkpoint(state_dict_list, + 7: + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: IndexError: list index out of range + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexError: list index out of range + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexErrorcurrent_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank]: + 7: + 7: list index out of rangecurrent_rank_sd = state_dict_list[dp_rank] + 7: + 7: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank]IndexErrorIndexError + 7: + 7: : : list index out of rangelist index out of rangeIndexError + 7: + 7: : list index out of rangeIndexErrorIndexError + 7: : : list index 
out of rangelist index out of range + 7: +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: Traceback (most recent call last): + 2: Traceback (most recent call last): + 2: Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: Traceback (most recent call last): + 2: Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", 
line 235, in + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: main()main()main() + 2: + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 2: main() + 2: main() File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: main() + 2: + 2: main() File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: return f(*args, **kwargs) return f(*args, **kwargs) + 2: return f(*args, **kwargs) + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: main() + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,return f(*args, **kwargs) + 2: return f(*args, **kwargs) + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: + 2: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: return f(*args, **kwargs) + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,return f(*args, **kwargs) + 2: + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: self._load_legacy_checkpoint(state_dict_list,self.optimizer.load_state_dict( +12: +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: self._load_legacy_checkpoint(state_dict_list, + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, + 2: + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: current_rank_sd = state_dict_list[dp_rank] + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: IndexErrorlist index out of range: + 2: list index out of range + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: current_rank_sd = state_dict_list[dp_rank] + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: IndexErrorlist index out of range + 2: : list index out of range + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of rangecurrent_rank_sd = state_dict_list[dp_rank] + 2: + 2: IndexError: list index out of range + 6: Traceback (most recent call last): + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: main() + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 6: Traceback (most recent call last): + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: main() + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: Traceback (most recent call last): + 6: Traceback (most recent call last): + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: main() + 6: main() + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: self.optimizer.load_state_dict(Traceback (most recent call last): + 6: + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: main() + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( + 6: + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: success = 
self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: current_rank_sd = state_dict_list[dp_rank] +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: IndexError: list index out of range +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: Traceback (most recent call last): + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: main() +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: success = self._load_zero_checkpoint( +12: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 
6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: success = self._load_zero_checkpoint( +12: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +12: model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", 
line 2773, in _load_zero_checkpoint +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list index out of range +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +14: main() +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +14: self.optimizer.load_state_dict(main() +14: +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in 
_load_legacy_checkpoint +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of range +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of range +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = 
self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of range +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call last): +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: Traceback (most recent call 
last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: main() +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: main() +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: main() +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: main() +13: main()main() File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: main() +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: main() +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: 
return f(*args, **kwargs) +13: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: +13: return f(*args, **kwargs) +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: return f(*args, **kwargs) +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: model, 
optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: return f(*args, **kwargs) +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
+13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank]self.optimizer.load_state_dict( +14: +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: IndexError: list index out of range +13: success = self._load_zero_checkpoint( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of 
range +13: success = self._load_zero_checkpoint( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( +13: +13: success = self._load_zero_checkpoint(success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: success = self._load_zero_checkpoint( +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: self.optimizer.load_state_dict( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: self.optimizer.load_state_dict( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in 
load_state_dict +13: self._load_legacy_checkpoint(state_dict_list, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: self._load_legacy_checkpoint(state_dict_list, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexError: list index out of range +13: self.optimizer.load_state_dict( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: self.optimizer.load_state_dict( +13: self.optimizer.load_state_dict( File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: self.optimizer.load_state_dict( self.optimizer.load_state_dict( +13: self.optimizer.load_state_dict( +13: +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexError: list index out of range +13: 
self._load_legacy_checkpoint(state_dict_list, +13: self._load_legacy_checkpoint(state_dict_list, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: self._load_legacy_checkpoint(state_dict_list, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: self._load_legacy_checkpoint(state_dict_list, +13: self._load_legacy_checkpoint(state_dict_list, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: +13: self._load_legacy_checkpoint(state_dict_list, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: current_rank_sd = state_dict_list[dp_rank] +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexErrorIndexError: : list index out of rangelist index out of range +13: +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexError: list index out of range +13: current_rank_sd = state_dict_list[dp_rank] +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexError : current_rank_sd = state_dict_list[dp_rank]list index out of range +13: IndexError +13: : list index out of range +13: IndexError: list index out of range +14: Traceback (most 
recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +14: self.optimizer.load_state_dict( +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError : return f(*args, **kwargs)list index out of range +14: +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of range + 8: Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: Traceback (most recent call last): + 8: Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: Traceback (most recent call last): + 8: Traceback (most recent call last): + 8: Traceback (most recent call last): + 8: Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: Traceback (most recent call last): + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: main() + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: main()main() + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: main()main() + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: main()main() + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: main() + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: return f(*args, 
**kwargs)return f(*args, **kwargs) + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: return f(*args, **kwargs)return f(*args, **kwargs) + 8: + 8: return f(*args, **kwargs) + 8: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: IndexError: list index out of range + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self.optimizer.load_state_dict( + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self.optimizer.load_state_dict(self.optimizer.load_state_dict( + 8: + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: 
self._load_legacy_checkpoint(state_dict_list, + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: self._load_legacy_checkpoint(state_dict_list, + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: current_rank_sd = state_dict_list[dp_rank] + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: IndexError: list index out of range + 8: current_rank_sd = state_dict_list[dp_rank] + 8: current_rank_sd = state_dict_list[dp_rank] + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: IndexError: list index out of range + 8: IndexError: list index out of range +15: Traceback (most recent call last): +15: File
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: Traceback (most recent call last): +15: Traceback (most recent call last): +15: Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: main() +15: main() File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: main()main() +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper 
+15: main()main() +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: main() +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs)return f(*args, **kwargs) +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: return f(*args, **kwargs)return f(*args, **kwargs) +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: return f(*args, **kwargs)return f(*args, **kwargs) +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: main() +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", 
line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 
2601, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: success = self._load_zero_checkpoint( +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: success = self._load_zero_checkpoint( +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: success = self._load_zero_checkpoint( +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict(self.optimizer.load_state_dict( +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, +15: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: self._load_legacy_checkpoint(state_dict_list, +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: current_rank_sd = 
state_dict_list[dp_rank] +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: IndexError: list index out of range +15: current_rank_sd = state_dict_list[dp_rank] +15: current_rank_sd = state_dict_list[dp_rank] +15: current_rank_sd = state_dict_list[dp_rank] +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: IndexError: list index out of range +15: IndexError: list index out of range +15: IndexError: list index out of range +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range + 9: Traceback (most recent call last): + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: Traceback (most recent call last): + 9: Traceback (most recent call last): + 9: Traceback (most recent call last): + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: Traceback (most recent call last): + 9: Traceback (most recent call last): + 9: Traceback (most recent call last): + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: main() + 9: Traceback (most recent call last): + 9: File
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 9: main() + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: main()main()main() + 9: + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: main()main() + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: main() + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: return f(*args, **kwargs) + 9: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: return f(*args, **kwargs)return f(*args, **kwargs)return f(*args, **kwargs) + 9: + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: return f(*args, **kwargs) + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)self.optimizer.load_state_dict( + 9: + 9: + 9: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: list index out of range + 9: success = self._load_zero_checkpoint( success = self._load_zero_checkpoint(success = self._load_zero_checkpoint( + 9: success = self._load_zero_checkpoint( + 9: + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: success = self._load_zero_checkpoint( File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexErrorself.optimizer.load_state_dict(: + 9: list index out of range + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self.optimizer.load_state_dict( + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, + 9: + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: current_rank_sd = state_dict_list[dp_rank]IndexError + 9: : list index out of range + 9: IndexError: list index out of rangecurrent_rank_sd = state_dict_list[dp_rank] + 9: + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: IndexErrorlist index out of range + 9: : list index out of range + 9: current_rank_sd = state_dict_list[dp_rank] + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: list index out of range + 9: IndexError: list index out of range + 4: Traceback (most recent call last): + 4: Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: Traceback (most recent call last): + 4: Traceback (most recent call last): + 4: Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: main()main() + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: main()main()main() + 4: + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: return f(*args, **kwargs)return f(*args, **kwargs) + 4: + 4: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: return f(*args, **kwargs) File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: main() + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: pretrain(train_valid_test_datasets_provider, model_provider, 
forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states)loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: + 4: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: main() + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: main() + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: success = self._load_zero_checkpoint( + 4: success = self._load_zero_checkpoint( + 4: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: self.optimizer.load_state_dict(self.optimizer.load_state_dict( + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank] + 4: + 4: IndexErrorIndexError: : list index out of rangelist index out of range + 4: + 4: self._load_legacy_checkpoint(state_dict_list, + 4: self._load_legacy_checkpoint(state_dict_list, File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank] + 4: + 4: success = self._load_zero_checkpoint(IndexError + 4: IndexError: : list index out of rangelist index out of range File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: + 4: + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: current_rank_sd = state_dict_list[dp_rank] + 4: IndexError: list index out of range + 4: current_rank_sd = state_dict_list[dp_rank] + 4: IndexError: list index out of range + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: self._load_legacy_checkpoint(state_dict_list, + 4: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: current_rank_sd = state_dict_list[dp_rank] + 4: IndexError: list index out of range + 4: current_rank_sd = state_dict_list[dp_rank] + 4: IndexError: list index out of range + 5: Traceback (most recent call last): + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: Traceback (most recent call last): + 5: Traceback (most recent call last): + 5: Traceback (most recent call last): + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: Traceback (most recent call last): + 5: Traceback (most recent call last): + 5: Traceback (most recent call last): + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: Traceback (most recent call last): + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 5: main() + 5: 
File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: main() + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: main()main() + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: main()main()main() + 5: + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: main() + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in 
main + 5: return f(*args, **kwargs)return f(*args, **kwargs) + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: return f(*args, **kwargs)return f(*args, **kwargs)return f(*args, **kwargs) + 5: + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: pretrain(train_valid_test_datasets_provider, model_provider, 
forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 5: + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 
278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: success = 
self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, 
in _load_legacy_checkpoint + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: self.optimizer.load_state_dict(self.optimizer.load_state_dict( + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self._load_legacy_checkpoint(state_dict_list,self._load_legacy_checkpoint(state_dict_list, + 5: + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: current_rank_sd = state_dict_list[dp_rank]current_rank_sd = state_dict_list[dp_rank] + 5: + 5: IndexErrorIndexError: : list index out of rangelist index out of range + 5: + 5: current_rank_sd = state_dict_list[dp_rank] + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: IndexError: list index out of range + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113191 closing signal SIGTERM +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113192 closing signal SIGTERM +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113193 closing signal SIGTERM +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113194 closing signal SIGTERM +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113195 closing signal SIGTERM +10: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 113197 closing signal SIGTERM + 9: 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57026 closing signal SIGTERM + 9: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57027 closing signal SIGTERM + 9: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57031 closing signal SIGTERM + 9: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57032 closing signal SIGTERM + 9: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57033 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129059 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129060 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129061 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129062 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129063 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74701 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74702 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129064 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74704 closing signal SIGTERM +14: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 129065 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74705 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74706 closing signal SIGTERM +13: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 74707 closing signal SIGTERM + 7: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 63342 closing signal SIGTERM + 7: 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 63344 closing signal SIGTERM + 7: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 63346 closing signal SIGTERM + 7: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 63347 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66304 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66305 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66307 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66308 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66309 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76486 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76488 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66310 closing signal SIGTERM + 4: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 66311 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76489 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76490 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76491 closing signal SIGTERM +12: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 76493 closing signal SIGTERM +11: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 115129 closing signal SIGTERM +11: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 115131 closing signal SIGTERM +11: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 115135 closing signal SIGTERM +11: 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 115136 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61442 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61443 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61444 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61445 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61447 closing signal SIGTERM + 6: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61448 closing signal SIGTERM + 2: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 83969 closing signal SIGTERM + 2: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 83970 closing signal SIGTERM + 2: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 83973 closing signal SIGTERM + 2: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 83982 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61059 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61060 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61062 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61063 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61065 closing signal SIGTERM + 5: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61066 closing signal SIGTERM +13: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 74700) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 7: 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 63343) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +10: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 113190) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 9: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 57028) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +14: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 7 (pid: 129066) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 4: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 66306) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +11: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 115130) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 2: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 83971) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 5: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 61061) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 6: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 4 (pid: 61446) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +12: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 76487) of binary: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +15: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 119556) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 3: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 67278) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 8: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 54235) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 
0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in 
_load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: 
self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 
0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 0: main() + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 235, in + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 5 (pid: 71776) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python + 1: 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 50849) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python +14: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_rkabsjz5/none_hqjrzdx7/attempt_0/7/error.json) +14: Traceback (most recent call last): +14: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +14: return _run_code(code, main_globals, None, +14: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +14: exec(code, run_globals) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 4: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_0g2c9g02/none_aa0m5pvh/attempt_0/2/error.json) + 4: Traceback (most recent call last): + 4: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 4: return _run_code(code, main_globals, None, + 4: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 4: exec(code, run_globals) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +10: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_i9drix3s/none_ltsgq1vj/attempt_0/0/error.json) + 5: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_zl4l0ptg/none_fvx9qwzu/attempt_0/2/error.json) +10: Traceback (most recent call last): +10: File 
"/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 5: Traceback (most recent call last): + 5: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +14: main() +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_5d7fa605/none_9t4kvh7p/attempt_0/1/error.json) +10: return _run_code(code, main_globals, None, +10: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 5: return _run_code(code, main_globals, None, +13: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_imbn6oxf/none_qm5eerir/attempt_0/0/error.json) +12: Traceback (most recent call last): +12: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +10: exec(code, run_globals) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 5: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +13: Traceback (most recent call last): +13: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +12: return _run_code(code, main_globals, None, +12: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 5: exec(code, run_globals) + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +13: return _run_code(code, main_globals, None, +13: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +12: exec(code, run_globals) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +13: exec(code, run_globals) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 4: main() + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: run(args) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +14: elastic_launch( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ +10: main() +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_m82gyhsh/none_80p3hzqk/attempt_0/4/error.json) +13: main() +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: run(args) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 6: Traceback (most recent call last): + 6: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 6: return _run_code(code, main_globals, None, + 6: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 6: exec(code, run_globals) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 4: elastic_launch( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ +14: return launch_agent(self._config, self._entrypoint, list(args)) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +12: main() +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +10: run(args) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +14: raise ChildFailedError( + 5: main() + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +14: ============================================================ +14: Megatron-DeepSpeed/pretrain_gpt.py FAILED +14: ------------------------------------------------------------ +14: Failures: +14: +14: ------------------------------------------------------------ +14: Root Cause (first observed failure): +14: [0]: +14: time : 2023-03-16_19:22:59 +14: host : nid005298 +14: rank : 119 (local_rank: 7) +14: exitcode : 1 (pid: 129066) +14: error_file: /tmp/torchelastic_rkabsjz5/none_hqjrzdx7/attempt_0/7/error.json +14: traceback : Traceback (most recent call last): +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +14: return f(*args, **kwargs) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +14: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +14: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +14: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +14: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +14: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +14: success = self._load_zero_checkpoint( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +14: self.optimizer.load_state_dict( +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +14: self._load_legacy_checkpoint(state_dict_list, +14: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +14: current_rank_sd = state_dict_list[dp_rank] +14: IndexError: list index out of range +14: +14: ============================================================ + 4: return launch_agent(self._config, self._entrypoint, list(args)) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +13: run(args) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +10: elastic_launch( +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +13: elastic_launch( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 4: raise ChildFailedError( + 6: main() + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: run(args) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 4: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 4: ============================================================ + 4: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 4: ------------------------------------------------------------ + 4: Failures: + 4: + 4: ------------------------------------------------------------ + 4: Root Cause (first observed failure): + 4: [0]: + 4: time : 2023-03-16_19:22:59 + 4: host : nid005288 + 4: rank : 34 (local_rank: 2) + 4: exitcode : 1 (pid: 66306) + 4: error_file: /tmp/torchelastic_0g2c9g02/none_aa0m5pvh/attempt_0/2/error.json + 4: traceback : Traceback (most recent call last): + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 4: return f(*args, **kwargs) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 4: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 4: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 4: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 4: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 4: success = self._load_zero_checkpoint( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 4: self.optimizer.load_state_dict( + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 4: self._load_legacy_checkpoint(state_dict_list, + 4: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 4: current_rank_sd = state_dict_list[dp_rank] + 4: IndexError: list index out of range + 4: + 4: ============================================================ +10: return launch_agent(self._config, self._entrypoint, list(args)) +10: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 5: run(args) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +13: return launch_agent(self._config, self._entrypoint, list(args)) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +12: elastic_launch( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 5: elastic_launch( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ +13: raise ChildFailedError( +10: raise ChildFailedError( +13: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +13: ============================================================ +13: Megatron-DeepSpeed/pretrain_gpt.py FAILED +13: ------------------------------------------------------------ +13: Failures: +13: [1]: +13: time : 2023-03-16_19:22:59 +13: host : nid005297 +13: rank : 107 (local_rank: 3) +13: exitcode : 1 (pid: 74703) +13: error_file: /tmp/torchelastic_imbn6oxf/none_qm5eerir/attempt_0/3/error.json +13: traceback : Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: return f(*args, **kwargs) +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: return launch_agent(self._config, self._entrypoint, list(args)) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +10: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +10: ============================================================ +10: Megatron-DeepSpeed/pretrain_gpt.py FAILED +10: ------------------------------------------------------------ +10: Failures: +10: [1]: +10: time : 2023-03-16_19:22:58 +10: host : nid005294 +10: rank : 86 (local_rank: 6) +10: exitcode : 1 (pid: 113196) +10: error_file: /tmp/torchelastic_i9drix3s/none_ltsgq1vj/attempt_0/6/error.json +10: traceback : Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: run(args) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +13: 
args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: success = self._load_zero_checkpoint( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: self.optimizer.load_state_dict( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: self._load_legacy_checkpoint(state_dict_list, +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: current_rank_sd = state_dict_list[dp_rank] +13: IndexError: list index out of range +13: +13: ------------------------------------------------------------ +13: Root Cause (first observed failure): +13: [0]: +13: time : 2023-03-16_19:22:59 +13: host : nid005297 +13: rank : 104 (local_rank: 0) +13: exitcode : 1 (pid: 74700) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +10: current_rank_sd = state_dict_list[dp_rank] +10: IndexError: list index out of range +10: +10: ------------------------------------------------------------ +10: Root Cause (first observed failure): +10: [0]: +10: time : 2023-03-16_19:22:58 +10: host : nid005294 +10: rank : 80 (local_rank: 0) +10: exitcode : 1 (pid: 113190) +10: error_file: /tmp/torchelastic_i9drix3s/none_ltsgq1vj/attempt_0/0/error.json +10: traceback : Traceback (most recent call last): +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +10: return f(*args, **kwargs) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +10: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +10: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_2tl40wpq/none_q6xt2_mo/attempt_0/1/error.json) + 7: Traceback (most recent call last): + 7: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main +10: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +13: error_file: /tmp/torchelastic_imbn6oxf/none_qm5eerir/attempt_0/0/error.json +13: traceback : Traceback (most recent call last): +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +13: return f(*args, **kwargs) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +13: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +13: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer 
+12: raise ChildFailedError( + 7: return _run_code(code, main_globals, None, + 7: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +10: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +10: success = self._load_zero_checkpoint( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +10: self.optimizer.load_state_dict( +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +10: self._load_legacy_checkpoint(state_dict_list, +10: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +10: current_rank_sd = state_dict_list[dp_rank] + 6: elastic_launch( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 5: return launch_agent(self._config, self._entrypoint, list(args)) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +13: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +12: ============================================================ +12: 
Megatron-DeepSpeed/pretrain_gpt.py FAILED +12: ------------------------------------------------------------ +12: Failures: +12: [1]: +12: time : 2023-03-16_19:22:59 +12: host : nid005296 +12: rank : 102 (local_rank: 6) +12: exitcode : 1 (pid: 76492) +12: error_file: /tmp/torchelastic_5d7fa605/none_9t4kvh7p/attempt_0/6/error.json +12: traceback : Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: exec(code, run_globals) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in +10: IndexError: list index out of range +10: +10: ============================================================ +13: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +13: success = self._load_zero_checkpoint( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +13: self.optimizer.load_state_dict( +13: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +13: self._load_legacy_checkpoint(state_dict_list, +13: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +13: current_rank_sd = state_dict_list[dp_rank] +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: success = self._load_zero_checkpoint( +13: IndexError: list index out of range +13: +13: ============================================================ +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] +12: IndexError: list 
index out of range +12: +12: ------------------------------------------------------------ +12: Root Cause (first observed failure): +12: [0]: +12: time : 2023-03-16_19:22:59 +12: host : nid005296 +12: rank : 97 (local_rank: 1) +12: exitcode : 1 (pid: 76487) +12: error_file: /tmp/torchelastic_5d7fa605/none_9t4kvh7p/attempt_0/1/error.json +12: traceback : Traceback (most recent call last): +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +12: return f(*args, **kwargs) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +12: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +12: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: raise ChildFailedError( +12: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 5: ============================================================ + 5: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 5: ------------------------------------------------------------ + 5: Failures: + 5: [1]: + 5: time : 2023-03-16_19:22:59 + 5: host : nid005289 + 5: rank : 45 (local_rank: 5) + 5: exitcode : 1 (pid: 61064) + 5: error_file: /tmp/torchelastic_zl4l0ptg/none_fvx9qwzu/attempt_0/5/error.json + 5: traceback : Traceback (most recent 
call last): + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +12: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +12: success = self._load_zero_checkpoint( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +12: self.optimizer.load_state_dict( +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +12: self._load_legacy_checkpoint(state_dict_list, +12: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +12: current_rank_sd = state_dict_list[dp_rank] + 6: return launch_agent(self._config, self._entrypoint, list(args)) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: success = self._load_zero_checkpoint( +12: IndexError: list index out of range +12: +12: ============================================================ + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: + 5: ------------------------------------------------------------ + 5: Root Cause (first observed failure): + 5: [0]: + 5: time : 2023-03-16_19:22:59 + 5: host : nid005289 + 5: rank : 42 (local_rank: 2) + 5: exitcode : 1 (pid: 61061) + 5: error_file: /tmp/torchelastic_zl4l0ptg/none_fvx9qwzu/attempt_0/2/error.json + 5: traceback : Traceback (most recent call last): + 5: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 5: return f(*args, **kwargs) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 5: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 5: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_2p2ma9h4/none_u61yk70y/attempt_0/2/error.json) + 5: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 5: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 5: success = self._load_zero_checkpoint( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 5: self.optimizer.load_state_dict( + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 5: 
self._load_legacy_checkpoint(state_dict_list, + 5: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 5: current_rank_sd = state_dict_list[dp_rank] + 5: IndexError: list index out of range + 5: + 5: ============================================================ + 9: Traceback (most recent call last): + 9: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 9: return _run_code(code, main_globals, None, + 9: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 6: raise ChildFailedError( + 9: exec(code, run_globals) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 6: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 6: ============================================================ + 6: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 6: ------------------------------------------------------------ + 6: Failures: + 6: [1]: + 6: time : 2023-03-16_19:22:59 + 6: host : nid005290 + 6: rank : 55 (local_rank: 7) + 6: exitcode : 1 (pid: 61449) + 6: error_file: /tmp/torchelastic_m82gyhsh/none_80p3hzqk/attempt_0/7/error.json + 6: traceback : Traceback (most recent call last): + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain 
+ 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 6: IndexError: list index out of range + 6: + 6: ------------------------------------------------------------ + 6: Root Cause (first observed failure): + 6: [0]: + 6: time : 2023-03-16_19:22:59 + 6: host : nid005290 + 6: rank : 52 (local_rank: 4) + 6: exitcode : 1 (pid: 61446) + 6: error_file: /tmp/torchelastic_m82gyhsh/none_80p3hzqk/attempt_0/4/error.json + 6: traceback : Traceback (most recent call last): + 6: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: return f(*args, **kwargs) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 6: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 6: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 6: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 6: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 6: success = self._load_zero_checkpoint( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 6: self.optimizer.load_state_dict( + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 6: self._load_legacy_checkpoint(state_dict_list, + 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in 
_load_legacy_checkpoint + 6: current_rank_sd = state_dict_list[dp_rank] + 7: main() + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 6: IndexError: list index out of range + 6: + 6: ============================================================ +11: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_2_s491an/none_csg0lvzu/attempt_0/1/error.json) +11: Traceback (most recent call last): +11: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +11: return _run_code(code, main_globals, None, +11: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code +11: exec(code, run_globals) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 9: main() + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: run(args) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 7: elastic_launch( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, 
in __call__ +11: main() +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: run(args) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 2: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_61sahj7m/none_zb3emndf/attempt_0/2/error.json) + 2: Traceback (most recent call last): +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 2: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 7: return launch_agent(self._config, self._entrypoint, list(args)) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 2: return _run_code(code, main_globals, None, + 2: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 2: exec(code, run_globals) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 9: elastic_launch( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 7: raise ChildFailedError( + 7: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 7: ============================================================ + 7: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 7: ------------------------------------------------------------ + 7: Failures: + 7: [1]: + 7: time 
: 2023-03-16_19:22:59 + 7: host : nid005291 + 7: rank : 59 (local_rank: 3) + 7: exitcode : 1 (pid: 63345) + 7: error_file: /tmp/torchelastic_2tl40wpq/none_q6xt2_mo/attempt_0/3/error.json + 7: traceback : Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: success = self._load_zero_checkpoint( +11: run(args) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: self.optimizer.load_state_dict( + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexError: list index out of range + 7: + 7: [2]: + 7: time : 2023-03-16_19:22:59 + 7: host : nid005291 + 7: rank : 62 (local_rank: 6) + 7: exitcode : 1 (pid: 63348) + 7: error_file: /tmp/torchelastic_2tl40wpq/none_q6xt2_mo/attempt_0/6/error.json + 7: traceback : Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: 
loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: self.optimizer.load_state_dict( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexError: list index out of range + 7: + 7: [3]: + 7: time : 2023-03-16_19:22:59 + 7: host : nid005291 + 9: return launch_agent(self._config, self._entrypoint, list(args)) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 7: rank : 63 (local_rank: 7) + 7: exitcode : 1 (pid: 63349) + 7: error_file: /tmp/torchelastic_2tl40wpq/none_q6xt2_mo/attempt_0/7/error.json + 7: traceback : Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, 
forward_step, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: self.optimizer.load_state_dict( +11: elastic_launch( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexError: list index out of range + 7: + 7: ------------------------------------------------------------ + 7: Root Cause (first observed failure): + 
7: [0]: + 7: time : 2023-03-16_19:22:59 + 7: host : nid005291 + 7: rank : 57 (local_rank: 1) + 7: exitcode : 1 (pid: 63343) + 7: error_file: /tmp/torchelastic_2tl40wpq/none_q6xt2_mo/attempt_0/1/error.json + 7: traceback : Traceback (most recent call last): + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 7: return f(*args, **kwargs) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 7: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 7: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 7: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 7: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: raise ChildFailedError( + 7: success = self._load_zero_checkpoint( + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 7: self.optimizer.load_state_dict( + 7: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 7: self._load_legacy_checkpoint(state_dict_list, + 7: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 7: current_rank_sd = state_dict_list[dp_rank] + 7: IndexError: list index out of range + 7: + 7: ============================================================ + 2: main() + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 9: ============================================================ + 9: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 9: ------------------------------------------------------------ + 9: Failures: + 9: [1]: + 9: time : 2023-03-16_19:22:59 + 9: host : nid005293 + 9: rank : 75 (local_rank: 3) + 9: exitcode : 1 (pid: 57029) + 9: error_file: /tmp/torchelastic_2p2ma9h4/none_u61yk70y/attempt_0/3/error.json + 9: traceback : Traceback (most recent call last): + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: list index out of range + 9: + 9: [2]: + 9: time : 2023-03-16_19:22:59 + 9: host : nid005293 + 9: rank : 76 (local_rank: 4) + 9: exitcode : 1 (pid: 57030) + 9: error_file: /tmp/torchelastic_2p2ma9h4/none_u61yk70y/attempt_0/4/error.json + 9: traceback : Traceback (most recent call last): + 2: return f(*args, **kwargs) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: return launch_agent(self._config, self._entrypoint, list(args)) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", 
line 396, in load_state_dict + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: list index out of range + 9: + 9: ------------------------------------------------------------ + 9: Root Cause (first observed failure): + 9: [0]: + 9: time : 2023-03-16_19:22:59 + 9: host : nid005293 + 9: rank : 74 (local_rank: 2) + 9: exitcode : 1 (pid: 57028) + 9: error_file: /tmp/torchelastic_2p2ma9h4/none_u61yk70y/attempt_0/2/error.json + 9: traceback : Traceback (most recent call last): + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 9: return f(*args, **kwargs) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 9: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 9: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 9: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 9: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 9: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 9: success = self._load_zero_checkpoint( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 9: self.optimizer.load_state_dict( + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 9: self._load_legacy_checkpoint(state_dict_list, + 9: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 9: current_rank_sd = state_dict_list[dp_rank] + 9: IndexError: list index out of range + 9: + 9: ============================================================ +11: raise ChildFailedError( + 2: run(args) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +11: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +11: ============================================================ +11: Megatron-DeepSpeed/pretrain_gpt.py FAILED +11: ------------------------------------------------------------ +11: Failures: +11: [1]: +11: time : 2023-03-16_19:22:59 +11: host : nid005295 +11: rank : 91 (local_rank: 3) +11: exitcode : 1 (pid: 115132) +11: error_file: /tmp/torchelastic_2_s491an/none_csg0lvzu/attempt_0/3/error.json +11: traceback : Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: return f(*args, **kwargs) +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range +11: +11: [2]: +11: time : 2023-03-16_19:22:59 +11: host : nid005295 +11: rank : 92 (local_rank: 4) +11: exitcode : 1 
(pid: 115133) +11: error_file: /tmp/torchelastic_2_s491an/none_csg0lvzu/attempt_0/4/error.json +11: traceback : Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self._load_legacy_checkpoint(state_dict_list, +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range +11: +11: [3]: +11: time : 2023-03-16_19:22:59 +11: host : nid005295 + 2: elastic_launch( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ +11: rank : 93 (local_rank: 5) +11: exitcode : 1 (pid: 115134) +11: error_file: /tmp/torchelastic_2_s491an/none_csg0lvzu/attempt_0/5/error.json +11: traceback : Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range +11: +11: ------------------------------------------------------------ +11: Root Cause (first observed failure): +11: [0]: +11: time : 2023-03-16_19:22:59 +11: host : nid005295 +11: rank : 89 (local_rank: 1) +11: exitcode : 1 (pid: 115130) +11: error_file: /tmp/torchelastic_2_s491an/none_csg0lvzu/attempt_0/1/error.json +11: traceback : Traceback (most recent call last): +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +11: return f(*args, **kwargs) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +11: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +11: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +11: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +11: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +11: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +11: success = self._load_zero_checkpoint( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +11: self.optimizer.load_state_dict( +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +11: self._load_legacy_checkpoint(state_dict_list, +11: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +11: current_rank_sd = state_dict_list[dp_rank] +11: IndexError: list index out of range +11: +11: ============================================================ + 2: return launch_agent(self._config, self._entrypoint, list(args)) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 2: raise ChildFailedError( + 2: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 2: ============================================================ + 2: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 2: 
------------------------------------------------------------ + 2: Failures: + 2: [1]: + 2: time : 2023-03-16_19:22:59 + 2: host : nid005286 + 2: rank : 19 (local_rank: 3) + 2: exitcode : 1 (pid: 83972) + 2: error_file: /tmp/torchelastic_61sahj7m/none_zb3emndf/attempt_0/3/error.json + 2: traceback : Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: return f(*args, **kwargs) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: self.optimizer.load_state_dict( + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: + 2: [2]: + 2: time : 2023-03-16_19:22:59 + 2: host : nid005286 + 2: rank : 21 (local_rank: 5) + 2: exitcode : 1 (pid: 83977) + 2: error_file: /tmp/torchelastic_61sahj7m/none_zb3emndf/attempt_0/5/error.json + 2: traceback : Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: return f(*args, **kwargs) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: + 2: [3]: + 2: time : 2023-03-16_19:22:59 + 2: host : nid005286 + 2: rank : 22 (local_rank: 6) + 2: exitcode : 1 (pid: 83978) + 2: error_file: /tmp/torchelastic_61sahj7m/none_zb3emndf/attempt_0/6/error.json + 2: traceback : Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: return f(*args, **kwargs) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", 
line 450, in setup_model_and_optimizer + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: + 2: ------------------------------------------------------------ + 2: Root Cause (first observed failure): + 2: [0]: + 2: time : 2023-03-16_19:22:59 + 2: host : nid005286 + 2: rank : 18 (local_rank: 2) + 2: exitcode : 1 (pid: 83971) + 2: error_file: /tmp/torchelastic_61sahj7m/none_zb3emndf/attempt_0/2/error.json + 2: traceback : Traceback (most recent call last): + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 2: return f(*args, **kwargs) + 2: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 2: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 2: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 2: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 2: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 2: success = self._load_zero_checkpoint( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 2: self.optimizer.load_state_dict( + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 2: self._load_legacy_checkpoint(state_dict_list, + 2: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 2: current_rank_sd = state_dict_list[dp_rank] + 2: IndexError: list index out of range + 2: + 2: ============================================================ + 8: 
ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/0/error.json) + 8: Traceback (most recent call last): + 8: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 8: return _run_code(code, main_globals, None, + 8: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 8: exec(code, run_globals) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 1: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/1/error.json) + 1: Traceback (most recent call last): + 1: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 1: return _run_code(code, main_globals, None, + 1: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 1: exec(code, run_globals) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 8: main() + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 1: main() + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no 
error file defined for parent, to copy child error file (/tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/0/error.json) + 3: Traceback (most recent call last): + 3: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 3: return _run_code(code, main_globals, None, + 3: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 3: exec(code, run_globals) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 8: run(args) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 8: elastic_launch( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 1: run(args) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 0: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/5/error.json) + 1: elastic_launch( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 0: Traceback (most recent call last): + 0: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 8: return launch_agent(self._config, self._entrypoint, list(args)) + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 0: return _run_code(code, main_globals, None, + 0: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 0: exec(code, run_globals) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 3: main() + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:no error file defined for parent, to copy child error file (/tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/0/error.json) +15: Traceback (most recent call last): +15: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main + 1: return launch_agent(self._config, self._entrypoint, list(args)) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +15: return _run_code(code, main_globals, None, +15: File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code + 8: raise ChildFailedError( + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main +15: exec(code, run_globals) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in + 8: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 8: ============================================================ + 8: Megatron-DeepSpeed/pretrain_gpt.py 
FAILED + 8: ------------------------------------------------------------ + 8: Failures: + 8: [1]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 65 (local_rank: 1) + 8: exitcode : 1 (pid: 54236) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/1/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [2]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 66 (local_rank: 2) + 8: exitcode : 1 (pid: 54237) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/2/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: raise ChildFailedError( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 1: ============================================================ + 1: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 1: ------------------------------------------------------------ + 1: Failures: + 1: [0]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 8 (local_rank: 0) + 1: exitcode : 1 (pid: 50848) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/0/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [3]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 3: run(args) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 8: rank : 67 (local_rank: 3) + 8: exitcode : 1 (pid: 54238) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/3/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 1: [2]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 10 (local_rank: 2) + 1: exitcode : 1 (pid: 50850) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/2/error.json + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 1: traceback : Traceback (most recent call last): + 
1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [4]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 68 (local_rank: 4) + 8: exitcode : 1 (pid: 54239) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/4/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: 
loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 1: [3]: + 1: time : 2023-03-16_19:23:08 + 1: host : nid005285 + 1: rank : 11 (local_rank: 3) + 1: exitcode : 1 (pid: 50851) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/3/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: elastic_launch( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [5]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 69 (local_rank: 5) + 8: exitcode : 1 (pid: 54240) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/5/error.json + 8: traceback : Traceback (most recent call last): + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 
8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 1: [4]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 12 (local_rank: 4) + 1: exitcode : 1 (pid: 50852) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/4/error.json + 1: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: 
self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) +15: main() +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [6]: + 8: time : 2023-03-16_19:22:59 + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: host : nid005292 + 
8: rank : 70 (local_rank: 6) + 8: exitcode : 1 (pid: 54241) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/6/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 1: [5]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 13 (local_rank: 5) + 1: exitcode : 1 (pid: 50856) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/5/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: main() + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: [7]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 71 (local_rank: 7) + 8: exitcode : 1 (pid: 54242) + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/7/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 3: return launch_agent(self._config, self._entrypoint, list(args)) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in 
_load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: ------------------------------------------------------------ + 8: Root Cause (first observed failure): + 8: [0]: + 8: time : 2023-03-16_19:22:59 + 8: host : nid005292 + 8: rank : 64 (local_rank: 0) + 8: exitcode : 1 (pid: 54235) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main + 8: error_file: /tmp/torchelastic_jjl8ixnj/none_ou539mqx/attempt_0/0/error.json + 8: traceback : Traceback (most recent call last): + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 8: return f(*args, **kwargs) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 8: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 1: [6]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 14 (local_rank: 6) + 1: exitcode : 1 (pid: 50857) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/6/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 8: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 8: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: success = self._load_zero_checkpoint( + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 8: self.optimizer.load_state_dict( + 8: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 8: self._load_legacy_checkpoint(state_dict_list, + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 8: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 8: current_rank_sd = state_dict_list[dp_rank] + 8: IndexError: list index out of range + 8: + 8: ============================================================ + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 
1: self._load_legacy_checkpoint(state_dict_list, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 1: [7]: + 1: time : 2023-03-16_19:23:06 + 1: host : nid005285 + 1: rank : 15 (local_rank: 7) + 1: exitcode : 1 (pid: 50861) +15: run(args) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/7/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: raise ChildFailedError( + 3: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 3: ============================================================ + 3: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 3: ------------------------------------------------------------ + 3: Failures: + 3: [1]: + 3: time : 
2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 25 (local_rank: 1) + 3: exitcode : 1 (pid: 67279) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/1/error.json + 3: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [2]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 26 (local_rank: 2) + 3: exitcode : 1 (pid: 67280) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/2/error.json + 3: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton 
dimension 0 + 1: + 1: ------------------------------------------------------------ + 1: Root Cause (first observed failure): + 1: [1]: + 1: time : 2023-03-16_19:23:05 + 1: host : nid005285 + 1: rank : 9 (local_rank: 1) + 1: exitcode : 1 (pid: 50849) + 1: error_file: /tmp/torchelastic_ul8ol8sm/none_qbe5rxe1/attempt_0/1/error.json + 1: traceback : Traceback (most recent call last): + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 1: return f(*args, **kwargs) + 0: run(args) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run +15: elastic_launch( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 1: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 1: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 1: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 1: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 1: success = self._load_zero_checkpoint( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 1: self.optimizer.load_state_dict( + 1: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 1: self._load_legacy_checkpoint(state_dict_list, + 1: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 1: current.data.copy_(src_tensor.data) + 1: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 1: + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [3]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 1: ============================================================ + 3: rank : 27 (local_rank: 3) + 3: exitcode : 1 (pid: 67281) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/3/error.json + 3: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [4]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 28 (local_rank: 4) + 3: exitcode : 1 (pid: 67282) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/4/error.json + 3: traceback : Traceback (most 
recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 0: elastic_launch( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( +15: return launch_agent(self._config, self._entrypoint, list(args)) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", 
line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [5]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 29 (local_rank: 5) + 3: exitcode : 1 (pid: 67283) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/5/error.json + 3: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [6]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 30 (local_rank: 6) + 3: exitcode : 1 (pid: 67284) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/6/error.json + 3: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: return launch_agent(self._config, self._entrypoint, list(args)) 
+ 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent +15: raise ChildFailedError( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: [7]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 31 (local_rank: 7) + 3: exitcode : 1 (pid: 67285) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/7/error.json + 3: traceback : Traceback (most recent call last): + 3: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) +15: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: +15: ============================================================ +15: Megatron-DeepSpeed/pretrain_gpt.py FAILED +15: ------------------------------------------------------------ +15: Failures: +15: [1]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 121 (local_rank: 1) +15: exitcode : 1 (pid: 119557) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/1/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", 
line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: 
IndexError: list index out of range +15: +15: [2]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 122 (local_rank: 2) +15: exitcode : 1 (pid: 119558) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/2/error.json +15: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: ------------------------------------------------------------ + 3: Root Cause (first observed failure): + 3: [0]: + 3: time : 2023-03-16_19:22:58 + 3: host : nid005287 + 3: rank : 24 (local_rank: 0) + 3: exitcode : 1 (pid: 67278) + 0: raise ChildFailedError( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: error_file: /tmp/torchelastic_z_26m98s/none_whrc0g_h/attempt_0/0/error.json + 3: traceback : Traceback (most recent call last): + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 3: return f(*args, **kwargs) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 3: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 3: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 3: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 3: success = 
self._load_zero_checkpoint( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 3: self.optimizer.load_state_dict( + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 3: self._load_legacy_checkpoint(state_dict_list, + 0: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: + 0: ============================================================ + 0: Megatron-DeepSpeed/pretrain_gpt.py FAILED + 0: ------------------------------------------------------------ + 0: Failures: + 0: [0]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 0 (local_rank: 0) + 0: exitcode : 1 (pid: 71771) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/0/error.json + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: [3]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 + 3: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint + 3: current_rank_sd = state_dict_list[dp_rank] + 3: IndexError: list index out of range + 3: + 3: ============================================================ + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( +15: rank : 123 (local_rank: 3) +15: exitcode : 1 (pid: 119559) +15: error_file: 
/tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/3/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: [1]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 1 (local_rank: 1) + 0: exitcode : 1 (pid: 71772) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/1/error.json +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: [4]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 124 (local_rank: 4) +15: exitcode : 1 (pid: 119560) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/4/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: [2]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 2 (local_rank: 2) + 0: exitcode : 1 (pid: 71773) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/2/error.json + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, 
+ 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: [5]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 125 (local_rank: 5) +15: exitcode : 1 (pid: 119561) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/5/error.json +15: traceback : Traceback (most recent call last): + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: [3]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 3 (local_rank: 3) + 0: exitcode : 1 (pid: 71774) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/3/error.json + 0: traceback : Traceback (most recent call last): +15: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: [6]: +15: time : 2023-03-16_19:22:59 + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: host : nid005299 +15: rank : 126 (local_rank: 6) +15: exitcode : 1 (pid: 119562) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/6/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in 
load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( + 0: [4]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 4 (local_rank: 4) + 0: exitcode : 1 (pid: 71775) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/4/error.json + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: 
pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: [7]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 127 (local_rank: 7) +15: exitcode : 1 (pid: 119563) +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/7/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = 
self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: 
+ 0: [6]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 6 (local_rank: 6) + 0: exitcode : 1 (pid: 71777) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/6/error.json + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: ------------------------------------------------------------ +15: Root Cause (first observed failure): +15: [0]: +15: time : 2023-03-16_19:22:59 +15: host : nid005299 +15: rank : 120 (local_rank: 0) +15: exitcode : 1 (pid: 119556) + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: error_file: /tmp/torchelastic_l3dkjf8v/none_8asduxst/attempt_0/0/error.json +15: traceback : Traceback (most recent call last): +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper +15: return f(*args, **kwargs) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main +15: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain +15: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: [7]: + 0: time : 2023-03-16_19:23:06 + 0: host : nid005284 + 0: rank : 7 (local_rank: 7) + 0: exitcode : 1 (pid: 71778) +15: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint +15: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint +15: success = self._load_zero_checkpoint( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint +15: self.optimizer.load_state_dict( +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict +15: self._load_legacy_checkpoint(state_dict_list, + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/7/error.json + 0: traceback : Traceback (most recent call last): + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer +15: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 406, in _load_legacy_checkpoint +15: current_rank_sd = state_dict_list[dp_rank] +15: IndexError: list index out of range +15: +15: ============================================================ + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File 
"/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: ------------------------------------------------------------ + 0: Root Cause (first observed failure): + 0: [5]: + 0: time : 2023-03-16_19:23:05 + 0: host : nid005284 + 0: rank : 5 (local_rank: 5) + 0: exitcode : 1 (pid: 71776) + 0: error_file: /tmp/torchelastic_3fbtg_r2/none_4xkd1mnp/attempt_0/5/error.json + 0: traceback : Traceback (most recent call last): + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper + 0: return f(*args, **kwargs) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/pretrain_gpt.py", line 231, in main + 0: pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain + 0: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/training.py", line 450, in setup_model_and_optimizer + 0: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in 
load_checkpoint + 0: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_optimizer_states=load_optimizer_states) + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2601, in load_checkpoint + 0: success = self._load_zero_checkpoint( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2773, in _load_zero_checkpoint + 0: self.optimizer.load_state_dict( + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 396, in load_state_dict + 0: self._load_legacy_checkpoint(state_dict_list, + 0: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 420, in _load_legacy_checkpoint + 0: current.data.copy_(src_tensor.data) + 0: RuntimeError: The size of tensor a (11287040) must match the size of tensor b (90296320) at non-singleton dimension 0 + 0: + 0: ============================================================ +srun: error: nid005289: task 5: Exited with exit code 1 +srun: launch/slurm: _step_signal: Terminating StepId=3324669.0 +srun: error: nid005291: task 7: Exited with exit code 1 +srun: error: nid005298: task 14: Exited with exit code 1 +srun: error: nid005284: task 0: Exited with exit code 1 +srun: error: nid005297: task 13: Exited with exit code 1 +srun: error: nid005296: task 12: Exited with exit code 1 +srun: error: nid005294: task 10: Exited with exit code 1 +srun: error: nid005299: task 15: Exited with exit code 1 +srun: error: nid005293: task 9: Exited with exit code 1 +srun: error: nid005288: task 4: Exited with exit code 1 +srun: error: nid005292: task 8: Exited with exit code 1 +srun: error: nid005286: task 2: Exited with exit code 1 +srun: error: nid005287: 
task 3: Exited with exit code 1 +srun: error: nid005290: task 6: Exited with exit code 1 +srun: error: nid005285: task 1: Exited with exit code 1 +srun: error: nid005295: task 11: Exited with exit code 1 diff --git a/2b8100m100m/3324669.out b/2b8100m100m/3324669.out new file mode 100644 index 0000000000000000000000000000000000000000..34817acdd050ad1e83a4b5e25869196c0518bd7d --- /dev/null +++ b/2b8100m100m/3324669.out @@ -0,0 +1,20126 @@ +Model parameters: d_model 2560 ffw_size 10240 kv_size 128 n_heads 20 n_layers 34 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 34 --hidden-size 2560 --num-attention-heads 20 --kv-channels 128 --ffn-hidden-size 10240 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 128 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-2b8100m100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 10000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_2b8100m100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_2b8100m100m --load checkpoints_2b8100m100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3324669.json --zero-stage 0 +START 3324669: Thu 16 Mar 2023 07:18:10 PM EET + 0: + 0: + 0: ======================= ROCm System Management Interface ======================= + 0: ================================= Concise Info ================================= + 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% 
GPU% + 0: 0 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 2 38.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 4 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 6 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: ================================================================================ + 0: ============================= End of ROCm SMI Log ============================== + 4: + 4: + 4: ======================= ROCm System Management Interface ======================= + 4: ================================= Concise Info ================================= + 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 4: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 2 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 4 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 6 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: ================================================================================ + 4: ============================= End of ROCm SMI Log ============================== +15: +15: +15: ======================= ROCm System Management Interface ======================= +15: ================================= Concise Info ================================= +15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +15: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 2 39.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 4 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 
6 38.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: ================================================================================ +15: ============================= End of ROCm SMI Log ============================== + 5: + 5: + 5: ======================= ROCm System Management Interface ======================= + 5: ================================= Concise Info ================================= + 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 5: 0 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 2 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 4 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 6 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: ================================================================================ + 5: ============================= End of ROCm SMI Log ============================== + 6: + 6: + 6: ======================= ROCm System Management Interface ======================= + 6: ================================= Concise Info ================================= + 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 6: 0 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 2 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 4 46.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 6 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: ================================================================================ + 6: ============================= End of ROCm SMI Log ============================== + 3: + 3: + 3: ======================= ROCm System 
Management Interface ======================= + 3: ================================= Concise Info ================================= + 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 3: 0 43.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 2 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 4 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: ================================================================================ + 3: ============================= End of ROCm SMI Log ============================== + 2: + 2: + 2: ======================= ROCm System Management Interface ======================= + 2: ================================= Concise Info ================================= + 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 2: 0 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 2 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 4 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 6 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: ================================================================================ + 2: ============================= End of ROCm SMI Log ============================== +13: +13: +13: ======================= ROCm System Management Interface ======================= +13: ================================= Concise Info ================================= +13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +13: 0 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 2 45.0c 92.0W 800Mhz 
1600Mhz 0% auto 560.0W 0% 0% +13: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 4 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 6 47.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: ================================================================================ +13: ============================= End of ROCm SMI Log ============================== + 7: + 7: + 7: ======================= ROCm System Management Interface ======================= + 7: ================================= Concise Info ================================= + 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 7: 0 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 2 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 4 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 6 40.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: ================================================================================ + 7: ============================= End of ROCm SMI Log ============================== +14: +14: +14: ======================= ROCm System Management Interface ======================= +14: ================================= Concise Info ================================= +14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +14: 0 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 2 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 4 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 6 42.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 7 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 
================================================================================ +14: ============================= End of ROCm SMI Log ============================== +12: +12: +12: ======================= ROCm System Management Interface ======================= +12: ================================= Concise Info ================================= +12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +12: 0 48.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 4 48.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 6 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: ================================================================================ +12: ============================= End of ROCm SMI Log ============================== + 9: + 9: + 9: ======================= ROCm System Management Interface ======================= + 9: ================================= Concise Info ================================= + 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 9: 0 48.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 2 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 6 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: ================================================================================ + 9: ============================= End of ROCm SMI Log ============================== +10: +10: +10: ======================= ROCm System Management Interface ======================= +10: ================================= Concise Info 
================================= +10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +10: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 2 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 4 42.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 6 38.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: ================================================================================ +10: ============================= End of ROCm SMI Log ============================== +11: +11: +11: ======================= ROCm System Management Interface ======================= +11: ================================= Concise Info ================================= +11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +11: 0 49.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 2 39.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 4 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 6 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: ================================================================================ +11: ============================= End of ROCm SMI Log ============================== + 1: + 1: + 1: ======================= ROCm System Management Interface ======================= + 1: ================================= Concise Info ================================= + 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 1: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 2 46.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 4 41.0c 99.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 6 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: ================================================================================ + 1: ============================= End of ROCm SMI Log ============================== + 8: + 8: + 8: ======================= ROCm System Management Interface ======================= + 8: ================================= Concise Info ================================= + 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 8: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 1 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 4 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 6 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: ================================================================================ + 8: ============================= End of ROCm SMI Log ============================== +12: Launching on nid005296 (12/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: Launching on nid005284 (0/16), master nid005284 port 9999, GPUs 8, CUDA: True + 9: Launching on nid005293 (9/16), master nid005284 port 9999, GPUs 8, CUDA: True +13: Launching on nid005297 (13/16), master nid005284 port 9999, GPUs 8, CUDA: True + 2: Launching on nid005286 (2/16), master nid005284 port 9999, GPUs 8, CUDA: True + 1: Launching on nid005285 (1/16), master nid005284 port 9999, GPUs 8, CUDA: True + 8: Launching on nid005292 (8/16), master nid005284 port 9999, GPUs 8, CUDA: True + 6: Launching on nid005290 (6/16), master nid005284 port 9999, GPUs 8, CUDA: True +11: Launching on nid005295 (11/16), master nid005284 port 9999, GPUs 8, CUDA: True + 7: Launching on nid005291 (7/16), master 
nid005284 port 9999, GPUs 8, CUDA: True + 4: Launching on nid005288 (4/16), master nid005284 port 9999, GPUs 8, CUDA: True + 5: Launching on nid005289 (5/16), master nid005284 port 9999, GPUs 8, CUDA: True +10: Launching on nid005294 (10/16), master nid005284 port 9999, GPUs 8, CUDA: True +14: Launching on nid005298 (14/16), master nid005284 port 9999, GPUs 8, CUDA: True +15: Launching on nid005299 (15/16), master nid005284 port 9999, GPUs 8, CUDA: True + 3: Launching on nid005287 (3/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: using world size: 128, data-parallel-size: 128, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 + 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. + 0: using torch.bfloat16 for parameters ... + 0: ------------------------ arguments ------------------------ + 0: abort_on_unmet_fused_kernel_constraints ......... False + 0: accumulate_allreduce_grads_in_fp32 .............. True + 0: adam_beta1 ...................................... 0.9 + 0: adam_beta2 ...................................... 0.999 + 0: adam_eps ........................................ 1e-08 + 0: adlr_autoresume ................................. False + 0: adlr_autoresume_interval ........................ 1000 + 0: apply_query_key_layer_scaling ................... True + 0: apply_residual_connection_post_layernorm ........ False + 0: attention_dropout ............................... 0.1 + 0: attention_softmax_in_fp32 ....................... False + 0: bert_binary_head ................................ True + 0: bert_load ....................................... None + 0: bf16 ............................................ True + 0: bias_dropout_fusion ............................. True + 0: bias_gelu_fusion ................................ True + 0: biencoder_projection_dim ........................ 0 + 0: biencoder_shared_query_context_model ............ False + 0: block_data_path ................................. 
None + 0: checkpoint_activations .......................... False + 0: checkpoint_in_cpu ............................... False + 0: checkpoint_num_layers ........................... 1 + 0: clip_grad ....................................... 1.0 + 0: codecarbon_dir .................................. None + 0: consumed_train_samples .......................... 0 + 0: consumed_train_tokens ........................... 0 + 0: consumed_valid_samples .......................... 0 + 0: contigious_checkpointing ........................ False + 0: cpu_optimizer ................................... False + 0: cpu_torch_adam .................................. False + 0: curriculum_learning ............................. False + 0: data_impl ....................................... mmap + 0: data_parallel_size .............................. 128 + 0: data_path ....................................... None + 0: dataloader_type ................................. single + 0: DDP_impl ........................................ local + 0: decoder_seq_length .............................. None + 0: deepscale ....................................... False + 0: deepscale_config ................................ None + 0: deepspeed ....................................... True + 0: deepspeed_activation_checkpointing .............. False + 0: deepspeed_config ................................ ds_configs/3324669.json + 0: deepspeed_mpi ................................... False + 0: distribute_checkpointed_activations ............. False + 0: distributed_backend ............................. nccl + 0: embed_layernorm ................................. False + 0: embedding_path .................................. None + 0: encoder_seq_length .............................. 2048 + 0: eod_mask_loss ................................... False + 0: eval_interval ................................... 1 + 0: eval_iters ...................................... 100 + 0: eval_only ....................................... 
True + 0: evidence_data_path .............................. None + 0: exit_duration_in_mins ........................... None + 0: exit_interval ................................... None + 0: ffn_hidden_size ................................. 10240 + 0: finetune ........................................ False + 0: fp16 ............................................ False + 0: fp16_lm_cross_entropy ........................... False + 0: fp32_residual_connection ........................ False + 0: gigaflos_no_embeds .............................. 0 + 0: global_batch_size ............................... 128 + 0: glu_activation .................................. None + 0: hidden_dropout .................................. 0.1 + 0: hidden_size ..................................... 2560 + 0: hysteresis ...................................... 2 + 0: ict_head_size ................................... None + 0: ict_load ........................................ None + 0: img_dim ......................................... 224 + 0: indexer_batch_size .............................. 128 + 0: indexer_log_interval ............................ 1000 + 0: inference ....................................... False + 0: init_method_std ................................. 0.02 + 0: init_method_xavier_uniform ...................... False + 0: initial_loss_scale .............................. 4294967296 + 0: kill_switch_path ................................ kill-switch-2b8100m100mval + 0: kv_channels ..................................... 128 + 0: layer_norm_fusion ............................... True + 0: layernorm_epsilon ............................... 1e-05 + 0: lazy_mpu_init ................................... None + 0: load ............................................ checkpoints_2b8100m100m + 0: local_rank ...................................... None + 0: log_batch_size_to_tensorboard ................... True + 0: log_interval .................................... 
10 + 0: log_learning_rate_to_tensorboard ................ True + 0: log_level ....................................... None + 0: log_level_replica ............................... None + 0: log_loss_scale_to_tensorboard ................... True + 0: log_num_zeros_in_grad ........................... False + 0: log_params_norm ................................. False + 0: log_path ........................................ None + 0: log_timers_to_tensorboard ....................... True + 0: log_validation_ppl_to_tensorboard ............... True + 0: loss_on_targets_only ............................ False + 0: loss_scale ...................................... None + 0: loss_scale_window ............................... 1000 + 0: lr .............................................. 0.0002 + 0: lr_decay_iters .................................. None + 0: lr_decay_samples ................................ 1 + 0: lr_decay_style .................................. cosine + 0: lr_decay_tokens ................................. None + 0: lr_warmup_fraction .............................. None + 0: lr_warmup_iters ................................. 0 + 0: lr_warmup_samples ............................... 0 + 0: make_vocab_size_divisible_by .................... 128 + 0: mask_prob ....................................... 0.15 + 0: masked_softmax_fusion ........................... True + 0: max_position_embeddings ......................... 2048 + 0: mean_noise_span_length .......................... None + 0: memory_centric_tiled_linear ..................... False + 0: merge_file ...................................... gpt2/merges.txt + 0: micro_batch_size ................................ 1 + 0: min_loss_scale .................................. 1.0 + 0: min_lr .......................................... 2e-05 + 0: mmap_warmup ..................................... False + 0: no_load_optim ................................... True + 0: no_load_rng ..................................... 
None + 0: no_save_optim ................................... None + 0: no_save_rng ..................................... None + 0: noise_density ................................... None + 0: num_attention_heads ............................. 20 + 0: num_channels .................................... 3 + 0: num_classes ..................................... 1000 + 0: num_layers ...................................... 34 + 0: num_layers_per_virtual_pipeline_stage ........... None + 0: num_workers ..................................... 2 + 0: onnx_safe ....................................... None + 0: openai_gelu ..................................... False + 0: optimizer ....................................... adam + 0: optimizer_fusion ................................ True + 0: override_lr_scheduler ........................... True + 0: pad_vocab_size_to ............................... None + 0: params_dtype .................................... torch.bfloat16 + 0: partition_activations ........................... False + 0: patch_dim ....................................... 16 + 0: pipeline_model_parallel_size .................... 1 + 0: position_embedding_type ......................... PositionEmbeddingType.absolute + 0: pp_partition_method ............................. None + 0: profile_backward ................................ False + 0: query_in_block_prob ............................. 0.1 + 0: rampup_batch_size ............................... None + 0: rank ............................................ 0 + 0: remote_device ................................... none + 0: reset_attention_mask ............................ False + 0: reset_position_ids .............................. False + 0: reset_progress .................................. True + 0: retriever_report_topk_accuracies ................ [] + 0: retriever_score_scaling ......................... False + 0: retriever_seq_length ............................ 256 + 0: reweight_loss_based_on_position_frequency ....... 
False + 0: sample_rate ..................................... 1.0 + 0: save ............................................ checkpoints_2b8100m100m + 0: save_interval ................................... 10000 + 0: scatter_gather_tensors_in_pipeline .............. True + 0: scattered_embeddings ............................ False + 0: seed ............................................ 1234 + 0: seq_length ...................................... 2048 + 0: sgd_momentum .................................... 0.9 + 0: short_seq_prob .................................. 0.1 + 0: skip_train_iteration_range ...................... None + 0: split ........................................... None + 0: split_transformers .............................. False + 0: sync_tp_duplicated_parameters ................... False + 0: synchronize_each_layer .......................... False + 0: tensor_model_parallel_size ...................... 1 + 0: tensorboard_dir ................................. tensorboard_2b8100m100mval + 0: tensorboard_log_interval ........................ 1 + 0: tensorboard_queue_size .......................... 5 + 0: test_weighted_split_paths ....................... None + 0: test_weighted_split_paths_path .................. None + 0: tile_factor ..................................... 1 + 0: titles_data_path ................................ None + 0: tokenizer_name_or_path .......................... None + 0: tokenizer_type .................................. GPT2BPETokenizer + 0: train_iters ..................................... None + 0: train_samples ................................... 1 + 0: train_tokens .................................... None + 0: train_weighted_split_names ...................... ['train'] + 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] + 0: train_weighted_split_paths_path ................. None + 0: train_weighted_split_splits ..................... 
[['0:1']] + 0: train_weighted_split_weights .................... [['1.0']] + 0: universal_checkpoint ............................ False + 0: use_bnb_optimizer ............................... False + 0: use_checkpoint_lr_scheduler ..................... False + 0: use_contiguous_buffers_in_ddp ................... True + 0: use_cpu_initialization .......................... None + 0: use_one_sent_docs ............................... False + 0: use_pin_memory .................................. False + 0: valid_num_workers ............................... 2 + 0: valid_weighted_split_names ...................... ['validation'] + 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] + 0: valid_weighted_split_paths_path ................. None + 0: valid_weighted_split_splits ..................... [['0:1']] + 0: valid_weighted_split_weights .................... [['1.0']] + 0: virtual_pipeline_model_parallel_size ............ None + 0: vocab_extra_ids ................................. 0 + 0: vocab_file ...................................... gpt2/vocab.json + 0: weight_decay .................................... 0.1 + 0: world_size ...................................... 128 + 0: zero_allgather_bucket_size ...................... 0.0 + 0: zero_contigious_gradients ....................... False + 0: zero_reduce_bucket_size ......................... 0.0 + 0: zero_reduce_scatter ............................. False + 0: zero_stage ...................................... 0 + 0: -------------------- end of arguments --------------------- + 0: setting number of micro-batches to constant 1 + 0: > building GPT2BPETokenizer tokenizer ... + 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) + 0: DeepSpeed general environment info: + 0: torch install path ............... 
['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] + 0: torch version .................... 1.13.0+rocm5.2 + 0: torch cuda version ............... None + 0: torch hip version ................ 5.2.21151-afdc89f8 + 0: nvcc version ..................... None + 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] + 0: deepspeed info ................... 0.7.5, unknown, unknown + 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 + 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** + 0: > initializing torch distributed ... + 0: [2023-03-16 19:21:53,125] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +15: > setting tensorboard ... + 0: > initializing tensor model parallel with size 1 + 0: > initializing pipeline model parallel with size 1 + 0: > setting random seeds to 1234 ... + 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 + 0: > compiling dataset index builder ... + 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: make: Nothing to be done for 'default'. + 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: >>> done with dataset index builder. Compilation time: 0.090 seconds + 0: > compiling and loading fused kernels ... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 87 + 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 63 + 0: ninja: no work to do. 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 67 + 0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o 
-shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so + 0: >>> done with compiling and loading fused kernels. Compilation time: 21.099 seconds + 0: time to initialize megatron (seconds): 75.920 + 0: [after megatron is initialized] datetime: 2023-03-16 19:22:19 + 0: building GPT model ... + 0: [2023-03-16 19:22:20,059] [INFO] [utils.py:827:see_memory_usage] Before Building Model + 0: [2023-03-16 19:22:20,060] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB + 0: [2023-03-16 19:22:20,060] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.6 GB, percent = 6.1% + 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None + 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi + 0: pe=0, data=23, model=0): 23, 
ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 + 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, 
ProcessCoord(pipe=0, data=69, model=0): + 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process + 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, 
data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo + 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127} + 0: [2023-03-16 19:22:24,099] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer + 0: stage=0 layers=41 + 0: 0: _to_float16 + 0: 1: EmbeddingPipe + 0: 2: + 0: 3: ParallelTransformerLayerPipe + 0: 4: ParallelTransformerLayerPipe + 0: 5: ParallelTransformerLayerPipe + 0: 6: ParallelTransformerLayerPipe + 0: 7: ParallelTransformerLayerPipe + 0: 8: ParallelTransformerLayerPipe + 0: 9: ParallelTransformerLayerPipe + 0: 10: ParallelTransformerLayerPipe + 0: 11: ParallelTransformerLayerPipe + 0: 12: ParallelTransformerLayerPipe + 0: 13: ParallelTransformerLayerPipe + 0: 14: ParallelTransformerLayerPipe + 0: 15: ParallelTransformerLayerPipe + 0: 16: ParallelTransformerLayerPipe + 0: 17: ParallelTransformerLayerPipe + 0: 18: ParallelTransformerLayerPipe + 0: 19: ParallelTransformerLayerPipe + 0: 20: ParallelTransformerLayerPipe + 0: 21: ParallelTransformerLayerPipe + 0: 22: ParallelTransformerLayerPipe + 0: 23: ParallelTransformerLayerPipe + 0: 24: ParallelTransformerLayerPipe + 0: 25: ParallelTransformerLayerPipe + 0: 26: ParallelTransformerLayerPipe + 0: 27: ParallelTransformerLayerPipe + 0: 28: ParallelTransformerLayerPipe + 0: 29: ParallelTransformerLayerPipe + 0: 30: ParallelTransformerLayerPipe + 0: 31: ParallelTransformerLayerPipe + 0: 32: 
ParallelTransformerLayerPipe + 0: 33: ParallelTransformerLayerPipe + 0: 34: ParallelTransformerLayerPipe + 0: 35: ParallelTransformerLayerPipe + 0: 36: ParallelTransformerLayerPipe + 0: 37: undo + 0: 38: MixedFusedLayerNorm + 0: 39: EmbeddingPipe + 0: 40: float16_to_fp32 + 0: loss: CrossEntropy + 0: [2023-03-16 19:22:24,395] [INFO] [utils.py:827:see_memory_usage] After Building Model + 0: [2023-03-16 19:22:24,396] [INFO] [utils.py:828:see_memory_usage] MA 5.26 GB Max_MA 5.26 GB CA 5.31 GB Max_CA 5 GB + 0: [2023-03-16 19:22:24,396] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.65 GB, percent = 6.1% + 0: setting training iterations to 0 + 0: > learning rate decay style: cosine + 0: DeepSpeed is enabled. + 0: [2023-03-16 19:22:24,399] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown + 0: [2023-03-16 19:22:40,642] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False + 0: [2023-03-16 19:22:40,643] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer + 0: [2023-03-16 19:22:40,643] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer + 0: [2023-03-16 19:22:40,663] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam + 0: [2023-03-16 19:22:40,664] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer + 0: [2023-03-16 19:22:40,778] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer + 0: [2023-03-16 19:22:40,779] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.27 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 19:22:40,779] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.33 GB, percent = 6.2% + 1: ninja: no work to do. + 1: Time to load utils op: 0.2895984649658203 seconds + 1: Time to load utils op: 0.0005977153778076172 seconds + 3: ninja: no work to do. 
+ 3: Time to load utils op: 0.18455767631530762 seconds + 1: Time to load utils op: 0.20229339599609375 seconds + 1: Time to load utils op: 0.20286989212036133 seconds + 1: Time to load utils op: 0.20180201530456543 seconds + 1: Time to load utils op: 0.20250487327575684 seconds + 1: Time to load utils op: 0.2029120922088623 seconds + 1: Time to load utils op: 0.20235061645507812 seconds + 1: Time to load utils op: 0.20382165908813477 seconds + 3: Time to load utils op: 0.20232272148132324 seconds + 3: Time to load utils op: 0.20224308967590332 seconds + 3: Time to load utils op: 0.20187091827392578 seconds + 3: Time to load utils op: 0.20215940475463867 seconds + 3: Time to load utils op: 0.2021796703338623 seconds + 3: Time to load utils op: 0.20190763473510742 seconds + 0: Time to load utils op: 0.21220636367797852 seconds + 0: Time to load utils op: 0.2114706039428711 seconds + 0: Time to load utils op: 0.2120821475982666 seconds + 0: Time to load utils op: 0.21174359321594238 seconds + 0: Time to load utils op: 0.2121880054473877 seconds + 0: Time to load utils op: 0.21214890480041504 seconds + 0: Time to load utils op: 0.2121577262878418 seconds + 2: Time to load utils op: 0.21123027801513672 seconds + 2: Time to load utils op: 0.2112562656402588 seconds + 2: Time to load utils op: 0.2112596035003662 seconds + 2: Time to load utils op: 0.21126532554626465 secondsTime to load utils op: 0.2112712860107422 secondsTime to load utils op: 0.21128201484680176 seconds + 2: + 2: Time to load utils op: 0.2112722396850586 seconds + 2: + 2: Time to load utils op: 0.21127986907958984 seconds + 4: Time to load utils op: 0.21190452575683594 seconds + 4: Time to load utils op: 0.21193289756774902 seconds + 4: Time to load utils op: 0.21197295188903809 seconds + 4: Time to load utils op: 0.21198105812072754 seconds + 4: Time to load utils op: 0.2120075225830078 seconds + 4: Time to load utils op: 0.21201443672180176 seconds + 4: Time to load utils op: 0.2120203971862793 
seconds + 4: Time to load utils op: 0.21202945709228516 seconds + 5: Time to load utils op: 0.21099400520324707 seconds + 5: Time to load utils op: 0.21102046966552734 seconds + 5: Time to load utils op: 0.21105146408081055 seconds + 5: Time to load utils op: 0.21105122566223145 secondsTime to load utils op: 0.2110598087310791 seconds + 5: + 5: Time to load utils op: 0.21105718612670898 secondsTime to load utils op: 0.2110598087310791 secondsTime to load utils op: 0.21106457710266113 seconds + 5: + 5: + 6: Time to load utils op: 0.21231818199157715 seconds + 6: Time to load utils op: 0.21233081817626953 seconds + 6: Time to load utils op: 0.2123417854309082 seconds + 6: Time to load utils op: 0.21234846115112305 seconds + 6: Time to load utils op: 0.2121267318725586 seconds + 6: Time to load utils op: 0.20895028114318848 seconds + 6: Time to load utils op: 0.21190261840820312 secondsTime to load utils op: 0.21195411682128906 seconds + 6: + 1: Time to load utils op: 0.00045228004455566406 seconds + 1: Time to load utils op: 0.0003814697265625 secondsTime to load utils op: 0.00036644935607910156 seconds + 1: + 1: Time to load utils op: 0.0004222393035888672 secondsTime to load utils op: 0.0005350112915039062 seconds + 1: + 1: Time to load utils op: 0.00038242340087890625 seconds + 1: Time to load utils op: 0.0003542900085449219 seconds + 7: Time to load utils op: 0.21179461479187012 seconds + 7: Time to load utils op: 0.21181321144104004 secondsTime to load utils op: 0.21181797981262207 seconds + 7: + 7: Time to load utils op: 0.21183991432189941 secondsTime to load utils op: 0.21185302734375 seconds + 7: + 7: Time to load utils op: 0.21184897422790527 seconds + 7: Time to load utils op: 0.21186184883117676 seconds + 7: Time to load utils op: 0.21186518669128418 seconds +13: Time to load utils op: 0.2091062068939209 secondsTime to load utils op: 0.20922183990478516 seconds +13: +13: Time to load utils op: 0.20918488502502441 seconds +13: Time to load utils op: 
0.20994257926940918 seconds +13: Time to load utils op: 0.20917010307312012 seconds +13: Time to load utils op: 0.20995640754699707 seconds +13: Time to load utils op: 0.20903444290161133 seconds + 8: Time to load utils op: 0.2115483283996582 seconds + 8: Time to load utils op: 0.21155333518981934 seconds + 8: Time to load utils op: 0.21158218383789062 seconds + 8: Time to load utils op: 0.21160578727722168 secondsTime to load utils op: 0.21161389350891113 seconds + 8: Time to load utils op: 0.2116241455078125 seconds + 8: + 8: Time to load utils op: 0.2116231918334961 secondsTime to load utils op: 0.21163058280944824 seconds + 8: + 9: Time to load utils op: 0.21187710762023926 secondsTime to load utils op: 0.21187567710876465 seconds + 9: + 9: Time to load utils op: 0.2118992805480957 seconds + 9: Time to load utils op: 0.21185731887817383 seconds + 9: Time to load utils op: 0.2119302749633789 seconds + 9: Time to load utils op: 0.2119309902191162 secondsTime to load utils op: 0.21193170547485352 seconds + 9: Time to load utils op: 0.21193599700927734 seconds + 9: + 3: Time to load utils op: 0.5043106079101562 seconds + 0: Time to load utils op: 0.40537381172180176 seconds +13: Time to load utils op: 0.5043056011199951 seconds +11: Time to load utils op: 0.2375473976135254 secondsTime to load utils op: 0.23755574226379395 seconds +11: +11: Time to load utils op: 0.23754072189331055 seconds +11: Time to load utils op: 0.237565279006958 seconds +11: Time to load utils op: 0.23758959770202637 seconds +11: Time to load utils op: 0.23759055137634277 secondsTime to load utils op: 0.23759746551513672 seconds +11: +11: Time to load utils op: 0.2375965118408203 seconds +10: Time to load utils op: 0.24654555320739746 seconds +10: Time to load utils op: 0.24654912948608398 seconds +10: Time to load utils op: 0.24659228324890137 seconds +10: Time to load utils op: 0.24660062789916992 secondsTime to load utils op: 0.2466142177581787 secondsTime to load utils op: 
0.24659991264343262 seconds +10: +10: +10: Time to load utils op: 0.2466142177581787 seconds +10: Time to load utils op: 0.24663734436035156 seconds + 3: Time to load utils op: 0.00045013427734375 seconds + 3: Time to load utils op: 0.00039315223693847656 seconds + 3: Time to load utils op: 0.0004761219024658203 seconds + 3: Time to load utils op: 0.0003464221954345703 seconds + 3: Time to load utils op: 0.00039649009704589844 seconds + 3: Time to load utils op: 0.0004341602325439453 seconds + 3: Time to load utils op: 0.0004017353057861328 seconds + 3: Time to load utils op: 0.0004773139953613281 seconds + 9: Time to load utils op: 0.000843048095703125 seconds + 9: Time to load utils op: 0.0008711814880371094 seconds +15: Time to load utils op: 0.25902581214904785 seconds +15: Time to load utils op: 0.2590315341949463 seconds +12: Time to load utils op: 0.2619760036468506 seconds +15: Time to load utils op: 0.25905418395996094 seconds +15: Time to load utils op: 0.25905799865722656 seconds +12: Time to load utils op: 0.2619924545288086 seconds +12: Time to load utils op: 0.2619915008544922 seconds +15: Time to load utils op: 0.25907468795776367 seconds +15: Time to load utils op: 0.25908780097961426 secondsTime to load utils op: 0.2590906620025635 seconds +12: Time to load utils op: 0.2620053291320801 seconds +15: +15: Time to load utils op: 0.2590970993041992 seconds +12: Time to load utils op: 0.2620246410369873 seconds +12: Time to load utils op: 0.26202917098999023 secondsTime to load utils op: 0.2620377540588379 seconds +12: +12: Time to load utils op: 0.26203203201293945 seconds + 9: Time to load utils op: 0.0011954307556152344 secondsTime to load utils op: 0.0013022422790527344 seconds + 9: + 9: Time to load utils op: 0.0012738704681396484 secondsTime to load utils op: 0.0012531280517578125 seconds + 9: + 4: Time to load utils op: 0.0007467269897460938 seconds + 9: Time to load utils op: 0.00121307373046875 seconds + 9: Time to load utils op: 
0.0012645721435546875 seconds + 4: Time to load utils op: 0.0008397102355957031 seconds + 4: Time to load utils op: 0.0009791851043701172 seconds + 4: Time to load utils op: 0.0009284019470214844 seconds + 4: Time to load utils op: 0.0011048316955566406 secondsTime to load utils op: 0.001111745834350586 seconds + 4: + 4: Time to load utils op: 0.001180410385131836 seconds + 4: Time to load utils op: 0.0011906623840332031 seconds +14: Time to load utils op: 0.2650303840637207 seconds +14: Time to load utils op: 0.2650482654571533 seconds +14: Time to load utils op: 0.26505541801452637 seconds +14: Time to load utils op: 0.2650597095489502 seconds +14: Time to load utils op: 0.2650744915008545 secondsTime to load utils op: 0.26506996154785156 seconds +14: +14: Time to load utils op: 0.26508569717407227 seconds +14: Time to load utils op: 0.2650926113128662 seconds + 0: Time to load utils op: 0.0007035732269287109 seconds + 0: Time to load utils op: 0.0006997585296630859 seconds + 0: Time to load utils op: 0.0007166862487792969 seconds + 0: Time to load utils op: 0.0006875991821289062 secondsTime to load utils op: 0.0007102489471435547 seconds + 0: + 0: Time to load utils op: 0.0006182193756103516 seconds + 0: Time to load utils op: 0.0007758140563964844 seconds + 2: Time to load utils op: 0.0007531642913818359 seconds + 2: Time to load utils op: 0.0008227825164794922 seconds +13: Time to load utils op: 0.0004010200500488281 secondsTime to load utils op: 0.000400543212890625 seconds +13: Time to load utils op: 0.0005183219909667969 seconds +13: + 2: Time to load utils op: 0.0010342597961425781 seconds +13: Time to load utils op: 0.0004961490631103516 seconds + 6: Time to load utils op: 0.00049591064453125 seconds + 6: Time to load utils op: 0.0003924369812011719 seconds +13: Time to load utils op: 0.0004086494445800781 secondsTime to load utils op: 0.00044989585876464844 seconds +13: +13: Time to load utils op: 0.00042557716369628906 secondsTime to load utils op: 
0.0004096031188964844 seconds +13: + 8: Time to load utils op: 0.0006034374237060547 seconds + 2: Time to load utils op: 0.0011947154998779297 seconds + 2: Time to load utils op: 0.001177072525024414 seconds + 8: Time to load utils op: 0.0006654262542724609 seconds + 6: Time to load utils op: 0.000438690185546875 secondsTime to load utils op: 0.00042819976806640625 seconds + 6: Time to load utils op: 0.00042891502380371094 seconds + 6: + 2: Time to load utils op: 0.0011599063873291016 seconds + 2: Time to load utils op: 0.0011513233184814453 seconds + 7: Time to load utils op: 0.0010445117950439453 seconds + 6: Time to load utils op: 0.000377655029296875 seconds + 2: Time to load utils op: 0.0012176036834716797 seconds + 6: Time to load utils op: 0.00039315223693847656 seconds + 6: Time to load utils op: 0.0005648136138916016 seconds + 8: Time to load utils op: 0.0009000301361083984 seconds + 8: Time to load utils op: 0.0010721683502197266 seconds + 7: Time to load utils op: 0.0013990402221679688 seconds + 7: Time to load utils op: 0.0013308525085449219 seconds + 7: Time to load utils op: 0.0013315677642822266 seconds + 7: Time to load utils op: 0.0013115406036376953 secondsTime to load utils op: 0.0014023780822753906 seconds + 7: + 7: Time to load utils op: 0.0013246536254882812 seconds + 8: Time to load utils op: 0.0012803077697753906 seconds + 7: Time to load utils op: 0.0013861656188964844 seconds + 8: Time to load utils op: 0.001287221908569336 seconds + 8: Time to load utils op: 0.0011682510375976562 seconds + 8: Time to load utils op: 0.0012698173522949219 seconds + 5: Time to load utils op: 0.0010159015655517578 seconds + 5: Time to load utils op: 0.0010647773742675781 seconds + 5: Time to load utils op: 0.0012340545654296875 seconds + 5: Time to load utils op: 0.0012772083282470703 secondsTime to load utils op: 0.0013391971588134766 seconds + 5: Time to load utils op: 0.0013687610626220703 seconds + 5: + 5: Time to load utils op: 0.0012738704681396484 
seconds + 5: Time to load utils op: 0.001346588134765625 seconds +11: Time to load utils op: 0.0009672641754150391 seconds +11: Time to load utils op: 0.00101470947265625 seconds +11: Time to load utils op: 0.0012171268463134766 secondsTime to load utils op: 0.001215219497680664 seconds +11: +11: Time to load utils op: 0.001249551773071289 secondsTime to load utils op: 0.0012371540069580078 seconds +11: +11: Time to load utils op: 0.0013065338134765625 seconds +11: Time to load utils op: 0.001268625259399414 seconds +10: Time to load utils op: 0.0009005069732666016 seconds +10: Time to load utils op: 0.001153707504272461 seconds +10: Time to load utils op: 0.0011670589447021484 seconds +10: Time to load utils op: 0.001129150390625 seconds +10: Time to load utils op: 0.0011382102966308594 seconds +10: Time to load utils op: 0.0011718273162841797 secondsTime to load utils op: 0.001125335693359375 seconds +10: +10: Time to load utils op: 0.0011868476867675781 seconds +14: Time to load utils op: 0.0008156299591064453 seconds +12: Time to load utils op: 0.0009503364562988281 seconds +14: Time to load utils op: 0.0010139942169189453 seconds +12: Time to load utils op: 0.0009784698486328125 seconds +15: Time to load utils op: 0.0010936260223388672 seconds +12: Time to load utils op: 0.0011069774627685547 seconds +14: Time to load utils op: 0.001299142837524414 seconds +14: Time to load utils op: 0.0012335777282714844 secondsTime to load utils op: 0.0012302398681640625 seconds +14: +12: Time to load utils op: 0.0013203620910644531 secondsTime to load utils op: 0.0013363361358642578 seconds +12: +14: Time to load utils op: 0.0012447834014892578 secondsTime to load utils op: 0.001241922378540039 seconds +12: Time to load utils op: 0.0012366771697998047 seconds +15: Time to load utils op: 0.0012950897216796875 seconds +14: +15: Time to load utils op: 0.0013637542724609375 seconds +15: Time to load utils op: 0.0012979507446289062 seconds +12: Time to load utils op: 
0.0012409687042236328 seconds +14: Time to load utils op: 0.0012640953063964844 seconds +12: Time to load utils op: 0.0013566017150878906 seconds +15: Time to load utils op: 0.0013318061828613281 seconds +15: Time to load utils op: 0.001287698745727539 secondsTime to load utils op: 0.0012879371643066406 seconds +15: +15: Time to load utils op: 0.0013453960418701172 seconds + 0: [2023-03-16 19:22:41,316] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 + 0: [2023-03-16 19:22:41,317] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.25 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 19:22:41,317] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,432] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 + 0: [2023-03-16 19:22:41,433] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 19:22:41,433] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,535] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 + 0: [2023-03-16 19:22:41,535] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 19:22:41,536] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,639] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 + 0: [2023-03-16 19:22:41,639] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:41,639] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,739] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 + 0: [2023-03-16 19:22:41,739] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:41,739] 
[INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,845] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 + 0: [2023-03-16 19:22:41,845] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:41,846] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:41,945] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer + 0: [2023-03-16 19:22:41,946] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:41,946] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:42,051] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer + 0: [2023-03-16 19:22:42,052] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:42,052] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:42,153] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer + 0: [2023-03-16 19:22:42,153] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 19:22:42,154] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.49 GB, percent = 6.3% + 0: [2023-03-16 19:22:42,154] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam + 0: [2023-03-16 19:22:42,154] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler + 0: [2023-03-16 19:22:42,154] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = + 0: [2023-03-16 19:22:42,154] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1007:print] 
DeepSpeedEngine configuration: + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] activation_checkpointing_config { + 0: "partition_activations": false, + 0: "contiguous_memory_optimization": false, + 0: "cpu_checkpointing": false, + 0: "number_checkpoints": null, + 0: "synchronize_checkpoint_boundary": false, + 0: "profile": false + 0: } + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] amp_enabled .................. False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] amp_params ................... False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] autotuning_config ............ { + 0: "enabled": false, + 0: "start_step": null, + 0: "end_step": null, + 0: "metric_path": null, + 0: "arg_mappings": null, + 0: "metric": "throughput", + 0: "model_info": null, + 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", + 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", + 0: "overwrite": true, + 0: "fast": true, + 0: "start_profile_step": 3, + 0: "end_profile_step": 5, + 0: "tuner_type": "gridsearch", + 0: "tuner_early_stopping": 5, + 0: "tuner_num_trials": 50, + 0: "model_info_path": null, + 0: "mp_size": 1, + 0: "max_train_batch_size": null, + 0: "min_train_batch_size": 1, + 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, + 0: "min_train_micro_batch_size_per_gpu": 1, + 0: "num_tuning_micro_batch_sizes": 3 + 0: } + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] comms_config ................. + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] communication_data_type ...... None + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa + 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] curriculum_enabled ........... False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] curriculum_params ............ False + 0: [2023-03-16 19:22:42,155] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] disable_allgather ............ False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] dump_state ................... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] elasticity_enabled ........... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] flops_profiler_config ........ { + 0: "enabled": false, + 0: "profile_step": 1, + 0: "module_depth": -1, + 0: "top_modules": 1, + 0: "detailed": true, + 0: "output_file": null + 0: } + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] fp16_auto_cast ............... None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] fp16_enabled ................. False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] global_rank .................. 0 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 
1 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] load_universal_checkpoint .... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] loss_scale ................... 1.0 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] memory_breakdown ............. False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] monitor_config ............... + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] nebula_config ................ { + 0: "enabled": false, + 0: "persistent_storage_path": null, + 0: "persistent_time_interval": 100, + 0: "num_of_version_in_retention": 2, + 0: "enable_nebula_load": true, + 0: "load_path": null + 0: } + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] optimizer_name ............... None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] optimizer_params ............. None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] pld_enabled .................. False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] pld_params ................... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] prescale_gradients ........... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] scheduler_name ............... None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] scheduler_params ............. 
None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] sparse_attention ............. None + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] steps_per_print .............. 2000 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] train_batch_size ............. 128 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 1 + 0: [2023-03-16 19:22:42,156] [INFO] [config.py:1011:print] use_node_local_storage ....... False + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] world_size ................... 128 + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] zero_enabled ................. False + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 + 0: [2023-03-16 19:22:42,157] [INFO] [config.py:996:print_user_config] json = { + 0: "train_micro_batch_size_per_gpu": 1, + 0: "train_batch_size": 128, + 0: "gradient_clipping": 1.0, + 0: "zero_optimization": { + 0: "stage": 0 + 0: }, + 0: "bf16": { + 0: "enabled": true + 0: }, + 0: "steps_per_print": 2.000000e+03, + 0: "wall_clock_breakdown": false + 0: } + 0: Time to load utils op: 0.00039768218994140625 seconds + 0: [2023-03-16 19:22:42,157] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=1 + 0: [2023-03-16 19:22:42,209] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=41 [0, 41) STAGE_PARAMS=2809026560 (2809.027M) TOTAL_PARAMS=2809026560 (2809.027M) UNIQUE_PARAMS=2809026560 (2809.027M) + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 7: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+14: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 0: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +13: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 3: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +15: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +14: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 1: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 8: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 2: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 9: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 7: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 0: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 3: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt... +10: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. 
+10: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/mp_rank_00_model_states.pt. + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:42,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+14: [2023-03-16 19:22:42,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:42,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:42,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:42,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:42,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 5: [2023-03-16 19:22:42,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 6: [2023-03-16 19:22:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:42,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +11: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+11: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+10: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:42,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 19:22:42,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:42,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 0: [2023-03-16 19:22:42,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:42,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:42,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +13: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 8: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:42,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 4: [2023-03-16 19:22:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 2: [2023-03-16 19:22:42,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 3: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+10: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +10: [2023-03-16 19:22:42,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 7: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 1: [2023-03-16 19:22:42,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... + 9: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +15: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +12: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt... +14: [2023-03-16 19:22:42,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:42,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:42,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:43,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:43,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:43,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:43,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 19:22:43,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+14: [2023-03-16 19:22:43,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:43,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:43,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:43,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:43,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 19:22:43,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:43,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:43,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:43,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +14: [2023-03-16 19:22:43,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:43,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +11: [2023-03-16 19:22:43,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 19:22:43,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:43,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +13: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +10: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +15: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+12: [2023-03-16 19:22:43,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. +12: [2023-03-16 19:22:43,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:43,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:43,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 19:22:43,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:43,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:43,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_01-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+10: [2023-03-16 19:22:43,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+12: [2023-03-16 19:22:43,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:43,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:43,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:43,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:43,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:43,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 19:22:43,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+15: [2023-03-16 19:22:43,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +15: [2023-03-16 19:22:43,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 19:22:43,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+13: [2023-03-16 19:22:43,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +10: [2023-03-16 19:22:43,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:43,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:43,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+14: [2023-03-16 19:22:43,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +14: [2023-03-16 19:22:43,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+12: [2023-03-16 19:22:43,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+15: [2023-03-16 19:22:43,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +12: [2023-03-16 19:22:43,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... 
+12: [2023-03-16 19:22:43,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +13: [2023-03-16 19:22:43,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 19:22:43,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:43,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:43,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 19:22:43,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 8: [2023-03-16 19:22:43,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:43,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 19:22:43,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +15: [2023-03-16 19:22:43,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+15: [2023-03-16 19:22:43,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:43,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:43,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 1: [2023-03-16 19:22:43,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:43,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 0: [2023-03-16 19:22:43,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:43,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:43,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 19:22:43,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +13: [2023-03-16 19:22:43,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:43,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:43,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +10: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+14: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +14: [2023-03-16 19:22:43,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 6: [2023-03-16 19:22:43,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:43,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +11: [2023-03-16 19:22:43,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 9: [2023-03-16 19:22:43,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:43,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:43,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt... +11: [2023-03-16 19:22:43,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 19:22:43,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 4: [2023-03-16 19:22:43,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 19:22:43,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:43,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+14: [2023-03-16 19:22:43,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 19:22:43,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:43,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:43,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 19:22:43,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:43,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. +12: [2023-03-16 19:22:43,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:43,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_03-model_00-model_states.pt. + 7: [2023-03-16 19:22:43,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:43,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:43,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:43,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:43,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:43,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 2: [2023-03-16 19:22:43,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:43,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:43,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:43,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:43,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:43,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:43,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 19:22:43,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:43,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:44,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +15: [2023-03-16 19:22:44,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:44,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:44,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 19:22:44,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+15: [2023-03-16 19:22:44,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 19:22:44,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +14: [2023-03-16 19:22:44,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:44,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:44,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:44,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +11: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+15: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +15: [2023-03-16 19:22:44,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +10: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +10: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:44,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +12: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +13: [2023-03-16 19:22:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... +14: [2023-03-16 19:22:44,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:44,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:44,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:44,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +13: [2023-03-16 19:22:44,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +12: [2023-03-16 19:22:44,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 19:22:44,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. +11: [2023-03-16 19:22:44,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 19:22:44,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_04-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:44,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 19:22:44,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:44,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+14: [2023-03-16 19:22:44,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +14: [2023-03-16 19:22:44,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+11: [2023-03-16 19:22:44,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:44,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 19:22:44,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +15: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +15: [2023-03-16 19:22:44,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:44,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +10: [2023-03-16 19:22:44,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +10: [2023-03-16 19:22:44,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+11: [2023-03-16 19:22:44,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 19:22:44,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:44,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +14: [2023-03-16 19:22:44,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 19:22:44,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +11: [2023-03-16 19:22:44,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +11: [2023-03-16 19:22:44,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 19:22:44,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 19:22:44,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +12: [2023-03-16 19:22:44,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:44,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:44,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:44,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:44,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:44,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:44,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 19:22:44,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:44,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:44,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +12: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:44,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:44,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... +13: [2023-03-16 19:22:44,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:44,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 19:22:44,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 19:22:44,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. +13: [2023-03-16 19:22:44,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_05-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 19:22:44,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:44,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+15: [2023-03-16 19:22:44,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 19:22:44,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:44,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:44,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+14: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 19:22:44,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +10: [2023-03-16 19:22:44,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 19:22:44,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+12: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +15: [2023-03-16 19:22:44,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 4: [2023-03-16 19:22:44,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:44,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +15: [2023-03-16 19:22:44,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 19:22:44,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:44,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:44,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+11: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:44,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:44,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:44,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:44,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:44,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +11: [2023-03-16 19:22:44,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:44,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:44,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:44,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:44,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:44,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 3: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 9: [2023-03-16 19:22:44,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 19:22:44,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:44,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +11: [2023-03-16 19:22:44,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:44,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:44,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:44,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+10: [2023-03-16 19:22:44,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:44,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:44,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:44,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 6: [2023-03-16 19:22:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 0: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +12: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:44,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:44,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:44,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +14: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:44,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+14: [2023-03-16 19:22:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:45,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:45,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+12: [2023-03-16 19:22:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 19:22:45,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +10: [2023-03-16 19:22:45,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +14: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:45,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:45,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... 
+13: [2023-03-16 19:22:45,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:45,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... +13: [2023-03-16 19:22:45,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+12: [2023-03-16 19:22:45,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +12: [2023-03-16 19:22:45,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:45,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:45,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:45,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 19:22:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_06-model_00-model_states.pt. +13: [2023-03-16 19:22:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+11: [2023-03-16 19:22:45,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+11: [2023-03-16 19:22:45,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:45,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+14: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 19:22:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +14: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +10: [2023-03-16 19:22:45,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +15: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:45,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 19:22:45,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +12: [2023-03-16 19:22:45,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 19:22:45,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +13: [2023-03-16 19:22:45,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... +11: [2023-03-16 19:22:45,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:45,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +11: [2023-03-16 19:22:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:45,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:45,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +14: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+15: [2023-03-16 19:22:45,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +10: [2023-03-16 19:22:45,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +15: [2023-03-16 19:22:45,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:45,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:45,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+15: [2023-03-16 19:22:45,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+13: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:45,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +13: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. +12: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:45,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:45,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:45,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+12: [2023-03-16 19:22:45,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:45,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_07-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:45,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:45,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+11: [2023-03-16 19:22:45,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 19:22:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:45,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:45,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 19:22:45,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 2: [2023-03-16 19:22:45,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+15: [2023-03-16 19:22:45,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:45,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:45,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:45,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 19:22:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +11: [2023-03-16 19:22:45,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 19:22:45,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +11: [2023-03-16 19:22:45,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +13: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:45,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:45,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:45,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 1: [2023-03-16 19:22:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +14: [2023-03-16 19:22:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+14: [2023-03-16 19:22:45,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +15: [2023-03-16 19:22:45,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:45,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +12: [2023-03-16 19:22:45,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+15: [2023-03-16 19:22:45,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +15: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 5: [2023-03-16 19:22:45,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... +10: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:45,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:45,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 19:22:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:45,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:45,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:45,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +10: [2023-03-16 19:22:45,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:45,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:45,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:45,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:45,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 19:22:45,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:45,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:45,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +14: [2023-03-16 19:22:45,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 19:22:45,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 8: [2023-03-16 19:22:45,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:45,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 6: [2023-03-16 19:22:45,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 9: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +12: [2023-03-16 19:22:45,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:45,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 7: [2023-03-16 19:22:45,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. +13: [2023-03-16 19:22:45,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:45,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 4: [2023-03-16 19:22:45,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:45,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:45,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:45,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:45,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:45,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:45,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:45,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:45,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:45,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 19:22:45,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:45,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:45,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 19:22:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:45,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:45,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:45,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 3: [2023-03-16 19:22:45,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:45,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_08-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:46,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:46,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+10: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:46,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+13: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 19:22:46,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:46,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +12: [2023-03-16 19:22:46,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:46,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+11: [2023-03-16 19:22:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 19:22:46,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 19:22:46,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +14: [2023-03-16 19:22:46,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:46,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:46,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +11: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:46,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+10: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +11: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:46,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +15: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +10: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+11: [2023-03-16 19:22:46,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +13: [2023-03-16 19:22:46,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:46,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +10: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +13: [2023-03-16 19:22:46,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:46,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+11: [2023-03-16 19:22:46,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... +15: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 19:22:46,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 19:22:46,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +12: [2023-03-16 19:22:46,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 19:22:46,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. +14: [2023-03-16 19:22:46,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 19:22:46,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:46,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:46,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 19:22:46,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 19:22:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:46,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_09-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:46,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+14: [2023-03-16 19:22:46,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:46,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:46,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:46,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +15: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+11: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +13: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +11: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +14: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +10: [2023-03-16 19:22:46,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... +12: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt... + 2: [2023-03-16 19:22:46,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:46,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 2: [2023-03-16 19:22:46,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +14: [2023-03-16 19:22:46,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 19:22:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:46,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:46,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:46,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:46,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:46,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 3: [2023-03-16 19:22:46,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:46,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:46,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 8: [2023-03-16 19:22:46,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 5: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 19:22:46,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +13: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 7: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+10: [2023-03-16 19:22:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +10: [2023-03-16 19:22:46,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+13: [2023-03-16 19:22:46,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +11: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 6: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 9: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:46,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 1: [2023-03-16 19:22:46,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 19:22:46,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +12: [2023-03-16 19:22:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 4: [2023-03-16 19:22:46,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:46,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:46,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. + 0: [2023-03-16 19:22:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_10-model_00-model_states.pt. +15: [2023-03-16 19:22:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:46,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:46,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 19:22:46,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:46,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:46,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 19:22:46,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:46,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:46,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:46,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:47,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 19:22:47,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:47,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+11: [2023-03-16 19:22:47,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 19:22:47,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+13: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+15: [2023-03-16 19:22:47,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +15: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+13: [2023-03-16 19:22:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +13: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +12: [2023-03-16 19:22:47,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+14: [2023-03-16 19:22:47,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +14: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +14: [2023-03-16 19:22:47,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:47,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +11: [2023-03-16 19:22:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt... +10: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+12: [2023-03-16 19:22:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:47,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +15: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +10: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 19:22:47,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+12: [2023-03-16 19:22:47,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +13: [2023-03-16 19:22:47,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:47,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 19:22:47,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 19:22:47,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +11: [2023-03-16 19:22:47,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:47,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. 
+12: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_11-model_00-model_states.pt. +12: [2023-03-16 19:22:47,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:47,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:47,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:47,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:47,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 19:22:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +14: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+10: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+11: [2023-03-16 19:22:47,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+11: [2023-03-16 19:22:47,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +11: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:47,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +12: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:47,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 19:22:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 7: [2023-03-16 19:22:47,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +13: [2023-03-16 19:22:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 19:22:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +15: [2023-03-16 19:22:47,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:47,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +10: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 8: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt... +10: [2023-03-16 19:22:47,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 6: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +13: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 1: [2023-03-16 19:22:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:47,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:47,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:47,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+13: [2023-03-16 19:22:47,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:47,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:47,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:47,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:47,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:47,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 19:22:47,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:47,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +14: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +11: [2023-03-16 19:22:47,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:47,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:47,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:47,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 2: [2023-03-16 19:22:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 3: [2023-03-16 19:22:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 19:22:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:47,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +12: [2023-03-16 19:22:47,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:47,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:47,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:47,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 0: [2023-03-16 19:22:47,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 5: [2023-03-16 19:22:47,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. +15: [2023-03-16 19:22:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:47,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 9: [2023-03-16 19:22:47,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_12-model_00-model_states.pt. + 4: [2023-03-16 19:22:47,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:47,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:47,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:47,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:47,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:47,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:47,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:47,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:48,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:48,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 19:22:48,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:48,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 19:22:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +14: [2023-03-16 19:22:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +12: [2023-03-16 19:22:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+10: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+14: [2023-03-16 19:22:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:48,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:48,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +14: [2023-03-16 19:22:48,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:48,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 19:22:48,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +13: [2023-03-16 19:22:48,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +12: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:48,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:48,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +13: [2023-03-16 19:22:48,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 19:22:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:48,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:48,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:48,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 19:22:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +15: [2023-03-16 19:22:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +11: [2023-03-16 19:22:48,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt... +10: [2023-03-16 19:22:48,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:48,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +10: [2023-03-16 19:22:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:48,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 19:22:48,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:48,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:48,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 19:22:48,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +15: [2023-03-16 19:22:48,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. +11: [2023-03-16 19:22:48,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_13-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:48,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:48,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:48,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:48,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:48,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+13: [2023-03-16 19:22:48,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 1: [2023-03-16 19:22:48,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:48,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:48,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 19:22:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 6: [2023-03-16 19:22:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 19:22:48,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+14: [2023-03-16 19:22:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +14: [2023-03-16 19:22:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +12: [2023-03-16 19:22:48,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:48,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:48,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +15: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +13: [2023-03-16 19:22:48,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 7: [2023-03-16 19:22:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 19:22:48,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:48,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:48,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+11: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 8: [2023-03-16 19:22:48,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +10: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... + 8: [2023-03-16 19:22:48,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +10: [2023-03-16 19:22:48,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +11: [2023-03-16 19:22:48,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt... +13: [2023-03-16 19:22:48,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:48,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:48,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:48,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 19:22:48,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:48,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+12: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 19:22:48,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +12: [2023-03-16 19:22:48,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 3: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 5: [2023-03-16 19:22:48,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +15: [2023-03-16 19:22:48,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:48,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +14: [2023-03-16 19:22:48,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 9: [2023-03-16 19:22:48,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:48,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 19:22:48,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 2: [2023-03-16 19:22:48,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 19:22:48,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:48,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 19:22:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+15: [2023-03-16 19:22:48,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:48,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. +11: [2023-03-16 19:22:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 19:22:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 4: [2023-03-16 19:22:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_14-model_00-model_states.pt. + 0: [2023-03-16 19:22:48,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+11: [2023-03-16 19:22:48,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:48,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:48,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 19:22:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:49,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 19:22:49,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 19:22:49,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:49,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:49,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +10: [2023-03-16 19:22:49,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+14: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+14: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:49,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 19:22:49,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 19:22:49,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +10: [2023-03-16 19:22:49,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 19:22:49,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +15: [2023-03-16 19:22:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:49,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:49,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:49,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:49,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +12: [2023-03-16 19:22:49,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +12: [2023-03-16 19:22:49,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+13: [2023-03-16 19:22:49,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+13: [2023-03-16 19:22:49,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +13: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +11: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... +14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 19:22:49,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:49,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:49,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:49,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +13: [2023-03-16 19:22:49,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +14: [2023-03-16 19:22:49,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:49,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 19:22:49,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +15: [2023-03-16 19:22:49,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:49,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 19:22:49,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:49,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:49,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_15-model_00-model_states.pt. +11: [2023-03-16 19:22:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:49,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:49,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:49,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:49,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+14: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+14: [2023-03-16 19:22:49,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:49,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:49,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 19:22:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 7: [2023-03-16 19:22:49,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:49,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 19:22:49,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +15: [2023-03-16 19:22:49,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 8: [2023-03-16 19:22:49,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+11: [2023-03-16 19:22:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +13: [2023-03-16 19:22:49,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 2: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +14: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +11: [2023-03-16 19:22:49,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 4: [2023-03-16 19:22:49,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +12: [2023-03-16 19:22:49,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... +10: [2023-03-16 19:22:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 4: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +12: [2023-03-16 19:22:49,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 8: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:49,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 19:22:49,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 5: [2023-03-16 19:22:49,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +14: [2023-03-16 19:22:49,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:49,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 19:22:49,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:49,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:49,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +13: [2023-03-16 19:22:49,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:49,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 19:22:49,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 19:22:49,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 19:22:49,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +10: [2023-03-16 19:22:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 3: [2023-03-16 19:22:49,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +15: [2023-03-16 19:22:49,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:49,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:49,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 9: [2023-03-16 19:22:49,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:49,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. +11: [2023-03-16 19:22:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:49,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+10: [2023-03-16 19:22:49,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:49,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:49,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:49,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 6: [2023-03-16 19:22:49,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:49,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:49,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:49,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_16-model_00-model_states.pt. + 0: [2023-03-16 19:22:49,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:49,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:49,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:49,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:50,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+15: [2023-03-16 19:22:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:50,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+14: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:50,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:50,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:50,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+15: [2023-03-16 19:22:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:50,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 19:22:50,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +15: [2023-03-16 19:22:50,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +10: [2023-03-16 19:22:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +13: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +12: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:50,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:50,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +11: [2023-03-16 19:22:50,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:50,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:50,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +15: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+10: [2023-03-16 19:22:50,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +10: [2023-03-16 19:22:50,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:50,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:50,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:50,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... +14: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:50,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:50,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 19:22:50,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +12: [2023-03-16 19:22:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 19:22:50,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:50,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +14: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +11: [2023-03-16 19:22:50,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:50,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:50,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. +13: [2023-03-16 19:22:50,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 19:22:50,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:50,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_17-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:50,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 19:22:50,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:50,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 19:22:50,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+14: [2023-03-16 19:22:50,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 19:22:50,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +14: [2023-03-16 19:22:50,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +11: [2023-03-16 19:22:50,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 19:22:50,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +10: [2023-03-16 19:22:50,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:50,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +13: [2023-03-16 19:22:50,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +12: [2023-03-16 19:22:50,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:50,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:50,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +10: [2023-03-16 19:22:50,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:50,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 19:22:50,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 2: [2023-03-16 19:22:50,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... +15: [2023-03-16 19:22:50,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:50,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+13: [2023-03-16 19:22:50,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:50,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+13: [2023-03-16 19:22:50,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 4: [2023-03-16 19:22:50,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +12: [2023-03-16 19:22:50,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +13: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+11: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:50,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +11: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:50,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +14: [2023-03-16 19:22:50,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+15: [2023-03-16 19:22:50,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 1: [2023-03-16 19:22:50,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+15: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:50,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 19:22:50,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:50,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:50,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:50,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:50,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 7: [2023-03-16 19:22:50,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 19:22:50,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. +15: [2023-03-16 19:22:50,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 3: [2023-03-16 19:22:50,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:50,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:50,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 9: [2023-03-16 19:22:50,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:50,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 0: [2023-03-16 19:22:50,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_18-model_00-model_states.pt. + 6: [2023-03-16 19:22:50,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:50,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:50,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:50,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 19:22:50,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:50,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+12: [2023-03-16 19:22:50,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 19:22:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:50,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:50,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:50,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 19:22:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:50,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:50,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 19:22:50,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 8: [2023-03-16 19:22:50,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:50,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:50,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:50,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:50,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 5: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:50,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:50,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:50,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 19:22:50,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:50,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:50,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:51,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:51,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:51,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:51,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +10: [2023-03-16 19:22:51,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +10: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:51,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +14: [2023-03-16 19:22:51,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:51,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +14: [2023-03-16 19:22:51,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+11: [2023-03-16 19:22:51,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +12: [2023-03-16 19:22:51,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:51,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +15: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +12: [2023-03-16 19:22:51,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +15: [2023-03-16 19:22:51,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +11: [2023-03-16 19:22:51,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:51,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +13: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt... +11: [2023-03-16 19:22:51,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:51,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:51,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:51,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. +13: [2023-03-16 19:22:51,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:51,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:51,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:51,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_19-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:51,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:51,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 19:22:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:51,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 19:22:51,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 19:22:51,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+11: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +12: [2023-03-16 19:22:51,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +13: [2023-03-16 19:22:51,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+11: [2023-03-16 19:22:51,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +11: [2023-03-16 19:22:51,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +13: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +15: [2023-03-16 19:22:51,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +14: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +10: [2023-03-16 19:22:51,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... +12: [2023-03-16 19:22:51,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:51,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:51,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:51,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +11: [2023-03-16 19:22:51,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 19:22:51,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +15: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +14: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 19:22:51,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:51,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. +10: [2023-03-16 19:22:51,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:51,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 19:22:51,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:51,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_20-model_00-model_states.pt. + 0: [2023-03-16 19:22:51,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+12: [2023-03-16 19:22:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 19:22:51,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 19:22:51,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:51,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+13: [2023-03-16 19:22:51,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:51,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+13: [2023-03-16 19:22:51,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:51,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +12: [2023-03-16 19:22:51,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+12: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:51,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:51,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:51,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +12: [2023-03-16 19:22:51,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:51,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 5: [2023-03-16 19:22:51,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 8: [2023-03-16 19:22:51,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 8: [2023-03-16 19:22:51,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:51,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:51,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:51,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 19:22:51,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:51,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:51,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:51,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:51,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:51,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:51,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:51,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:51,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:51,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:51,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+13: [2023-03-16 19:22:51,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +11: [2023-03-16 19:22:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:51,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +11: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:51,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 3: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:51,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:51,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:51,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:51,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:51,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:51,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+15: [2023-03-16 19:22:51,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +14: [2023-03-16 19:22:51,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+15: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +15: [2023-03-16 19:22:51,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 4: [2023-03-16 19:22:51,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:51,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:51,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:51,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:51,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:52,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +13: [2023-03-16 19:22:52,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:52,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +13: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:52,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... +10: [2023-03-16 19:22:52,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 19:22:52,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:52,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+10: [2023-03-16 19:22:52,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:52,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:52,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:52,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +15: [2023-03-16 19:22:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +14: [2023-03-16 19:22:52,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 19:22:52,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. +10: [2023-03-16 19:22:52,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:52,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 19:22:52,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_21-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:52,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 19:22:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +12: [2023-03-16 19:22:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 19:22:52,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:52,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 19:22:52,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +11: [2023-03-16 19:22:52,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:52,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +11: [2023-03-16 19:22:52,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:52,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +12: [2023-03-16 19:22:52,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 19:22:52,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+13: [2023-03-16 19:22:52,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:52,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +10: [2023-03-16 19:22:52,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:52,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +10: [2023-03-16 19:22:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 19:22:52,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+13: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+15: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:52,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+15: [2023-03-16 19:22:52,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +15: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +13: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:52,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... +14: [2023-03-16 19:22:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:52,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:52,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +13: [2023-03-16 19:22:52,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:52,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:52,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:52,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+15: [2023-03-16 19:22:52,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:52,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +14: [2023-03-16 19:22:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 19:22:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. +15: [2023-03-16 19:22:52,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. 
+14: [2023-03-16 19:22:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_22-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:52,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:52,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 19:22:52,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:52,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:52,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 19:22:52,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +11: [2023-03-16 19:22:52,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:52,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+10: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 19:22:52,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:52,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:52,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +12: [2023-03-16 19:22:52,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 19:22:52,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:52,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +13: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:52,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:52,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +13: [2023-03-16 19:22:52,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 4: [2023-03-16 19:22:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:52,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:52,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +14: [2023-03-16 19:22:52,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:52,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +10: [2023-03-16 19:22:52,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +10: [2023-03-16 19:22:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:52,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 8: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... +15: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 19:22:52,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:52,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:52,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 5: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 3: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 6: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:52,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:52,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 19:22:52,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:52,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 19:22:52,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 1: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +11: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:52,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:52,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:52,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 2: [2023-03-16 19:22:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:52,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 7: [2023-03-16 19:22:52,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 19:22:52,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:52,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 19:22:52,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +14: [2023-03-16 19:22:52,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:52,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 19:22:52,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +15: [2023-03-16 19:22:52,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. +12: [2023-03-16 19:22:52,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:52,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 19:22:52,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 9: [2023-03-16 19:22:52,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:52,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:52,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 19:22:52,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. + 0: [2023-03-16 19:22:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 19:22:52,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:52,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 19:22:53,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 19:22:53,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:53,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:53,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:53,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+14: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +10: [2023-03-16 19:22:53,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +13: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +12: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +15: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+14: [2023-03-16 19:22:53,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +14: [2023-03-16 19:22:53,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... +11: [2023-03-16 19:22:53,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:53,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:53,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +11: [2023-03-16 19:22:53,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:53,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +10: [2023-03-16 19:22:53,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +15: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +13: [2023-03-16 19:22:53,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +12: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. +14: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:53,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+15: [2023-03-16 19:22:53,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 19:22:53,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_24-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:53,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:53,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:53,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:53,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:53,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:53,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 19:22:53,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 9: [2023-03-16 19:22:53,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:53,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +10: [2023-03-16 19:22:53,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:53,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:53,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:53,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 2: [2023-03-16 19:22:53,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:53,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 7: [2023-03-16 19:22:53,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:53,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:53,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 2: [2023-03-16 19:22:53,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +14: [2023-03-16 19:22:53,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 5: [2023-03-16 19:22:53,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:53,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:53,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +11: [2023-03-16 19:22:53,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+15: [2023-03-16 19:22:53,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+13: [2023-03-16 19:22:53,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:53,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:53,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:53,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+13: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:53,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +12: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 8: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 5: [2023-03-16 19:22:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:53,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +13: [2023-03-16 19:22:53,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:53,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:53,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... +15: [2023-03-16 19:22:53,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 19:22:53,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 19:22:53,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 6: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:53,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 4: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 1: [2023-03-16 19:22:53,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 19:22:53,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:53,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:53,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 19:22:53,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 3: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:53,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:53,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +10: [2023-03-16 19:22:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+14: [2023-03-16 19:22:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:54,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:54,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 19:22:54,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:54,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:54,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +11: [2023-03-16 19:22:54,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +14: [2023-03-16 19:22:54,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:54,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:54,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +12: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +15: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 19:22:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_25-model_00-model_states.pt. +13: [2023-03-16 19:22:54,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:54,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:54,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:54,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:54,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:54,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+14: [2023-03-16 19:22:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:54,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:54,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+14: [2023-03-16 19:22:54,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+14: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:54,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 19:22:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +14: [2023-03-16 19:22:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:54,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +14: [2023-03-16 19:22:54,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:54,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:54,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +11: [2023-03-16 19:22:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +11: [2023-03-16 19:22:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 19:22:54,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +12: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +13: [2023-03-16 19:22:54,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:54,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +13: [2023-03-16 19:22:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 19:22:54,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +10: [2023-03-16 19:22:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +10: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... +15: [2023-03-16 19:22:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 19:22:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+15: [2023-03-16 19:22:54,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +15: [2023-03-16 19:22:54,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:54,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 19:22:54,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:54,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:54,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_26-model_00-model_states.pt. +12: [2023-03-16 19:22:54,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 19:22:54,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:54,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 19:22:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+14: [2023-03-16 19:22:54,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:54,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+11: [2023-03-16 19:22:54,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:54,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:54,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +14: [2023-03-16 19:22:54,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 19:22:54,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:54,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:54,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:54,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+15: [2023-03-16 19:22:54,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +14: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:54,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:54,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 3: [2023-03-16 19:22:54,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:54,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+11: [2023-03-16 19:22:54,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +11: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +13: [2023-03-16 19:22:54,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +11: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:54,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 9: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +10: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 19:22:54,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 6: [2023-03-16 19:22:54,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:54,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:54,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+15: [2023-03-16 19:22:54,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:54,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:54,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +15: [2023-03-16 19:22:54,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:54,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 5: [2023-03-16 19:22:54,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 2: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 4: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 19:22:54,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +10: [2023-03-16 19:22:54,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +13: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +15: [2023-03-16 19:22:54,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 19:22:54,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 19:22:54,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:54,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 7: [2023-03-16 19:22:54,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:54,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:54,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 8: [2023-03-16 19:22:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:54,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 19:22:54,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:54,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:54,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:54,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt... +12: [2023-03-16 19:22:54,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:54,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. +12: [2023-03-16 19:22:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 19:22:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:54,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:54,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:54,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:54,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_27-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:55,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:55,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:55,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 19:22:55,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 19:22:55,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+14: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:55,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:55,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:55,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +13: [2023-03-16 19:22:55,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:55,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+11: [2023-03-16 19:22:55,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +10: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+10: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:55,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:55,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +11: [2023-03-16 19:22:55,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:55,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:55,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +14: [2023-03-16 19:22:55,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:55,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+12: [2023-03-16 19:22:55,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 19:22:55,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:55,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+12: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:55,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:55,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 19:22:55,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:55,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +15: [2023-03-16 19:22:55,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:55,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 19:22:55,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 19:22:55,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 19:22:55,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:55,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +15: [2023-03-16 19:22:55,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +11: [2023-03-16 19:22:55,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:55,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +13: [2023-03-16 19:22:55,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:55,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 19:22:55,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+12: [2023-03-16 19:22:55,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +10: [2023-03-16 19:22:55,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 19:22:55,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:55,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +14: [2023-03-16 19:22:55,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:55,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:55,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+12: [2023-03-16 19:22:55,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... +12: [2023-03-16 19:22:55,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:55,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:55,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:55,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. +12: [2023-03-16 19:22:55,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:55,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_28-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:55,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:55,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 19:22:55,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:55,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+15: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:55,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+15: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +15: [2023-03-16 19:22:55,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+12: [2023-03-16 19:22:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +12: [2023-03-16 19:22:55,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 19:22:55,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:55,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 19:22:55,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +11: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +14: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +10: [2023-03-16 19:22:55,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:55,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:55,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... +13: [2023-03-16 19:22:55,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 19:22:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:55,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +14: [2023-03-16 19:22:55,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:55,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 19:22:55,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 3: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:55,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:55,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:55,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:55,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 19:22:55,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:55,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:55,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:55,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 2: [2023-03-16 19:22:55,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:55,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 19:22:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +11: [2023-03-16 19:22:55,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+10: [2023-03-16 19:22:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:55,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +13: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+13: [2023-03-16 19:22:55,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:55,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +15: [2023-03-16 19:22:55,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 19:22:55,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:55,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +12: [2023-03-16 19:22:55,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:55,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 8: [2023-03-16 19:22:55,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:55,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 9: [2023-03-16 19:22:55,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 1: [2023-03-16 19:22:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 4: [2023-03-16 19:22:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. +10: [2023-03-16 19:22:55,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:55,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 6: [2023-03-16 19:22:55,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 7: [2023-03-16 19:22:55,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:55,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:55,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 5: [2023-03-16 19:22:55,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:55,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:55,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 7: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:55,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:55,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:55,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:55,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_29-model_00-model_states.pt. + 0: [2023-03-16 19:22:55,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:55,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:56,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 19:22:56,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:56,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+12: [2023-03-16 19:22:56,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 19:22:56,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 19:22:56,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:56,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:56,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +13: [2023-03-16 19:22:56,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:56,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +13: [2023-03-16 19:22:56,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+13: [2023-03-16 19:22:56,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:56,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:56,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 19:22:56,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 19:22:56,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+12: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:56,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:56,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:56,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+14: [2023-03-16 19:22:56,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:56,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +14: [2023-03-16 19:22:56,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +11: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +11: [2023-03-16 19:22:56,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:56,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:56,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+11: [2023-03-16 19:22:56,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +12: [2023-03-16 19:22:56,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+11: [2023-03-16 19:22:56,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +12: [2023-03-16 19:22:56,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:56,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:56,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:56,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:56,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +15: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +15: [2023-03-16 19:22:56,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 19:22:56,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:56,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:56,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt... +10: [2023-03-16 19:22:56,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +10: [2023-03-16 19:22:56,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:56,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:56,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. +14: [2023-03-16 19:22:56,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:56,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:56,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_30-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:56,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:56,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:56,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+11: [2023-03-16 19:22:56,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:56,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:56,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+12: [2023-03-16 19:22:56,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+12: [2023-03-16 19:22:56,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 3: [2023-03-16 19:22:56,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 3: [2023-03-16 19:22:56,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+11: [2023-03-16 19:22:56,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+14: [2023-03-16 19:22:56,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +11: [2023-03-16 19:22:56,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +11: [2023-03-16 19:22:56,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:56,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:56,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 9: [2023-03-16 19:22:56,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:56,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+12: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+13: [2023-03-16 19:22:56,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +12: [2023-03-16 19:22:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+12: [2023-03-16 19:22:56,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:56,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +15: [2023-03-16 19:22:56,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 6: [2023-03-16 19:22:56,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:56,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+14: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +15: [2023-03-16 19:22:56,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+15: [2023-03-16 19:22:56,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +12: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 19:22:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +10: [2023-03-16 19:22:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +14: [2023-03-16 19:22:56,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +14: [2023-03-16 19:22:56,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:56,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+13: [2023-03-16 19:22:56,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... +13: [2023-03-16 19:22:56,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:56,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+13: [2023-03-16 19:22:56,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:56,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:56,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +13: [2023-03-16 19:22:56,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:56,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:56,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:56,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:56,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:56,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 7: [2023-03-16 19:22:56,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:56,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:56,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:56,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 8: [2023-03-16 19:22:56,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:56,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 5: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. +10: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:56,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 1: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 2: [2023-03-16 19:22:56,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:56,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:56,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:56,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:56,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:56,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:56,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:56,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. + 0: [2023-03-16 19:22:56,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_31-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:56,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 19:22:56,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:56,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:56,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:56,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:56,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:56,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+14: [2023-03-16 19:22:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:56,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+14: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:56,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:56,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:56,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:57,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 19:22:57,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:57,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:57,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:57,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:57,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:57,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +12: [2023-03-16 19:22:57,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+14: [2023-03-16 19:22:57,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +12: [2023-03-16 19:22:57,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 19:22:57,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+10: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +14: [2023-03-16 19:22:57,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+13: [2023-03-16 19:22:57,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +15: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +13: [2023-03-16 19:22:57,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 19:22:57,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +15: [2023-03-16 19:22:57,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:57,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:57,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +14: [2023-03-16 19:22:57,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+14: [2023-03-16 19:22:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +10: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... +11: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+13: [2023-03-16 19:22:57,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+13: [2023-03-16 19:22:57,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +10: [2023-03-16 19:22:57,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+10: [2023-03-16 19:22:57,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +13: [2023-03-16 19:22:57,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. +11: [2023-03-16 19:22:57,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:57,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:57,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:57,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:57,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:57,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_32-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:57,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:57,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+10: [2023-03-16 19:22:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+14: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+12: [2023-03-16 19:22:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+14: [2023-03-16 19:22:57,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +14: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+14: [2023-03-16 19:22:57,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+15: [2023-03-16 19:22:57,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +15: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:57,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+11: [2023-03-16 19:22:57,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +11: [2023-03-16 19:22:57,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:57,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+13: [2023-03-16 19:22:57,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +10: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+13: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:57,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+10: [2023-03-16 19:22:57,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:57,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:57,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+14: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +14: [2023-03-16 19:22:57,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +12: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+15: [2023-03-16 19:22:57,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +10: [2023-03-16 19:22:57,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +15: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +12: [2023-03-16 19:22:57,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:57,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+14: [2023-03-16 19:22:57,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+12: [2023-03-16 19:22:57,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:57,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +11: [2023-03-16 19:22:57,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... +13: [2023-03-16 19:22:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. +13: [2023-03-16 19:22:57,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:57,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:57,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:57,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 5: [2023-03-16 19:22:57,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:57,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_33-model_00-model_states.pt. + 0: [2023-03-16 19:22:57,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:57,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:57,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:57,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+14: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:57,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:57,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:57,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:57,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+15: [2023-03-16 19:22:57,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 7: [2023-03-16 19:22:57,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:57,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:57,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:57,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 3: [2023-03-16 19:22:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:57,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+10: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:57,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+11: [2023-03-16 19:22:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 2: [2023-03-16 19:22:57,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:57,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+10: [2023-03-16 19:22:57,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +10: [2023-03-16 19:22:57,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +12: [2023-03-16 19:22:57,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+12: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 6: [2023-03-16 19:22:57,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 2: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+11: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:57,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +11: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 9: [2023-03-16 19:22:57,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+13: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 1: [2023-03-16 19:22:57,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:57,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:57,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:57,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:58,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:58,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:58,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +13: [2023-03-16 19:22:58,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+14: [2023-03-16 19:22:58,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+14: [2023-03-16 19:22:58,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:58,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:58,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+15: [2023-03-16 19:22:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:58,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+15: [2023-03-16 19:22:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:58,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +14: [2023-03-16 19:22:58,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:58,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +14: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt... +15: [2023-03-16 19:22:58,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+15: [2023-03-16 19:22:58,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +15: [2023-03-16 19:22:58,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+11: [2023-03-16 19:22:58,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+12: [2023-03-16 19:22:58,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +10: [2023-03-16 19:22:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+12: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +12: [2023-03-16 19:22:58,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +13: [2023-03-16 19:22:58,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. +11: [2023-03-16 19:22:58,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+11: [2023-03-16 19:22:58,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+11: [2023-03-16 19:22:58,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+13: [2023-03-16 19:22:58,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+12: [2023-03-16 19:22:58,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_34-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:58,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:58,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:58,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 19:22:58,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 19:22:58,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +14: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+14: [2023-03-16 19:22:58,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:58,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:58,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:58,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+13: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +15: [2023-03-16 19:22:58,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+13: [2023-03-16 19:22:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +13: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+13: [2023-03-16 19:22:58,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+14: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:58,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:58,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +11: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +10: [2023-03-16 19:22:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... +12: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +10: [2023-03-16 19:22:58,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:58,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+14: [2023-03-16 19:22:58,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:58,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+15: [2023-03-16 19:22:58,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+13: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+12: [2023-03-16 19:22:58,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +12: [2023-03-16 19:22:58,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+14: [2023-03-16 19:22:58,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +14: [2023-03-16 19:22:58,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:58,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +15: [2023-03-16 19:22:58,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+15: [2023-03-16 19:22:58,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 5: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:58,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +13: [2023-03-16 19:22:58,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. +11: [2023-03-16 19:22:58,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+13: [2023-03-16 19:22:58,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:58,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:58,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 4: [2023-03-16 19:22:58,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:58,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:58,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:58,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_35-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:58,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:58,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:58,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:58,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 7: [2023-03-16 19:22:58,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 6: [2023-03-16 19:22:58,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+12: [2023-03-16 19:22:58,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:58,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+12: [2023-03-16 19:22:58,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:58,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+11: [2023-03-16 19:22:58,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 0: [2023-03-16 19:22:58,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 3: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+10: [2023-03-16 19:22:58,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:58,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 1: [2023-03-16 19:22:58,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:58,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 3: [2023-03-16 19:22:58,961] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 29 + 3: [2023-03-16 19:22:58,961] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 30 + 3: [2023-03-16 19:22:58,961] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 24 + 3: [2023-03-16 19:22:58,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 31 + 3: [2023-03-16 19:22:58,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 25 + 3: [2023-03-16 19:22:58,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 26 + 3: [2023-03-16 19:22:58,962] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 27 + 8: [2023-03-16 19:22:58,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 3: [2023-03-16 19:22:58,963] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 28 + 8: [2023-03-16 19:22:58,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+13: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +13: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:58,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +10: [2023-03-16 19:22:58,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:58,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+11: [2023-03-16 19:22:58,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+10: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+10: [2023-03-16 19:22:58,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 86 +10: [2023-03-16 19:22:58,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 84 +10: [2023-03-16 19:22:58,976] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 87 +10: [2023-03-16 19:22:58,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +10: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 19:22:58,978] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 80 +10: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+15: [2023-03-16 19:22:58,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,983] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 85 +15: [2023-03-16 19:22:58,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+10: [2023-03-16 19:22:58,985] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 81 + 6: [2023-03-16 19:22:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +10: [2023-03-16 19:22:58,986] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 82 + 7: [2023-03-16 19:22:58,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 6: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +10: [2023-03-16 19:22:58,989] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 83 + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +11: [2023-03-16 19:22:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:58,990] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 50 + 6: [2023-03-16 19:22:58,990] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 55 + 1: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+11: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +11: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,992] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 91 +11: [2023-03-16 19:22:58,992] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 89 +11: [2023-03-16 19:22:58,992] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 90 +12: [2023-03-16 19:22:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+11: [2023-03-16 19:22:58,992] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 93 +12: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +15: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 1: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 1: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... + 7: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... + 1: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... + 7: [2023-03-16 19:22:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:58,998] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 94 +12: [2023-03-16 19:22:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +11: [2023-03-16 19:22:58,999] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 88 + 7: [2023-03-16 19:22:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +11: [2023-03-16 19:22:59,000] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 92 + 7: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 1: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
+ 1: [2023-03-16 19:22:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... + 9: [2023-03-16 19:22:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+12: [2023-03-16 19:22:59,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 101 +12: [2023-03-16 19:22:59,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 102 +11: [2023-03-16 19:22:59,001] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 95 + 7: [2023-03-16 19:22:59,003] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 57 + 7: [2023-03-16 19:22:59,003] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 63 + 7: [2023-03-16 19:22:59,003] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 60 + 1: [2023-03-16 19:22:59,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... + 1: [2023-03-16 19:22:59,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... + 2: [2023-03-16 19:22:59,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 7: [2023-03-16 19:22:59,005] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 62 + 1: [2023-03-16 19:22:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... + 7: [2023-03-16 19:22:59,006] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 58 + 2: [2023-03-16 19:22:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:59,008] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 61 + 2: [2023-03-16 19:22:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 2: [2023-03-16 19:22:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 7: [2023-03-16 19:22:59,008] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 59 + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 7: [2023-03-16 19:22:59,009] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 56 + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+14: [2023-03-16 19:22:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:59,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 2: [2023-03-16 19:22:59,018] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 18 + 2: [2023-03-16 19:22:59,018] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 21 + 2: [2023-03-16 19:22:59,018] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 16 + 2: [2023-03-16 19:22:59,018] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 19 + 2: [2023-03-16 19:22:59,019] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 17 + 2: [2023-03-16 19:22:59,019] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 23 + 6: [2023-03-16 19:22:59,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +12: [2023-03-16 19:22:59,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,021] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 22 +12: [2023-03-16 19:22:59,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 2: [2023-03-16 19:22:59,025] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 20 + 4: [2023-03-16 19:22:59,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:59,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+12: [2023-03-16 19:22:59,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+13: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:59,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+12: [2023-03-16 19:22:59,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,033] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 52 +12: [2023-03-16 19:22:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+12: [2023-03-16 19:22:59,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +12: [2023-03-16 19:22:59,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 6: [2023-03-16 19:22:59,037] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 51 + 6: [2023-03-16 19:22:59,037] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 49 +12: [2023-03-16 19:22:59,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 6: [2023-03-16 19:22:59,038] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 53 + 6: [2023-03-16 19:22:59,038] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 54 +12: [2023-03-16 19:22:59,039] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 103 +14: [2023-03-16 19:22:59,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+14: [2023-03-16 19:22:59,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +12: [2023-03-16 19:22:59,039] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 97 + 6: [2023-03-16 19:22:59,040] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 48 +13: [2023-03-16 19:22:59,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,041] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 98 +12: [2023-03-16 19:22:59,044] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 100 +13: [2023-03-16 19:22:59,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+13: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,045] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 119 +14: [2023-03-16 19:22:59,045] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 116 +13: [2023-03-16 19:22:59,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +12: [2023-03-16 19:22:59,047] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 96 +15: [2023-03-16 19:22:59,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+12: [2023-03-16 19:22:59,047] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 99 + 8: [2023-03-16 19:22:59,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 8: [2023-03-16 19:22:59,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+15: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 8: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +13: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:59,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt... +14: [2023-03-16 19:22:59,069] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 118 + 9: [2023-03-16 19:22:59,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +13: [2023-03-16 19:22:59,070] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 111 +13: [2023-03-16 19:22:59,070] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 110 +13: [2023-03-16 19:22:59,070] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 107 +13: [2023-03-16 19:22:59,070] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 109 +15: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 9: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,072] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 105 +15: [2023-03-16 19:22:59,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,072] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 108 +14: [2023-03-16 19:22:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+14: [2023-03-16 19:22:59,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +13: [2023-03-16 19:22:59,075] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 106 +13: [2023-03-16 19:22:59,075] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 104 + 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +14: [2023-03-16 19:22:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 69 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 64 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 68 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 66 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 70 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 65 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 71 + 8: [2023-03-16 19:22:59,079] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] 
successfully read 16 ZeRO state_dicts for rank 67 + 0: [2023-03-16 19:22:59,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +14: [2023-03-16 19:22:59,080] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 117 + 9: [2023-03-16 19:22:59,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,080] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 114 +15: [2023-03-16 19:22:59,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+15: [2023-03-16 19:22:59,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 0: [2023-03-16 19:22:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +14: [2023-03-16 19:22:59,083] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 112 +14: [2023-03-16 19:22:59,083] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 115 + 0: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +15: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... +14: [2023-03-16 19:22:59,087] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 113 + 4: [2023-03-16 19:22:59,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... +15: [2023-03-16 19:22:59,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. +15: [2023-03-16 19:22:59,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... +15: [2023-03-16 19:22:59,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 0: [2023-03-16 19:22:59,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... + 9: [2023-03-16 19:22:59,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 0: > overriding learning rate value to 0.0002 + 0: > overriding minimum learning rate value to 2e-05 + 0: > overriding warmup iterations value to 0 + 0: > overriding total number of iterations value to 1 + 0: > overriding decay style value to cosine + 0: [2023-03-16 19:22:59,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... + 0: [2023-03-16 19:22:59,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... + 9: [2023-03-16 19:22:59,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 0: [2023-03-16 19:22:59,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... + 9: [2023-03-16 19:22:59,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:59,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 9: [2023-03-16 19:22:59,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 120 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 123 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 121 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 127 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 126 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 122 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 125 +15: [2023-03-16 19:22:59,105] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 124 + 9: [2023-03-16 19:22:59,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 74 + 9: [2023-03-16 19:22:59,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 72 + 9: [2023-03-16 19:22:59,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 76 + 9: [2023-03-16 19:22:59,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 73 + 9: [2023-03-16 19:22:59,109] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 75 + 9: [2023-03-16 19:22:59,110] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 77 + 
9: [2023-03-16 19:22:59,112] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 78 + 9: [2023-03-16 19:22:59,113] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 79 + 4: [2023-03-16 19:22:59,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,122] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 37 + 4: [2023-03-16 19:22:59,122] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 36 + 4: [2023-03-16 19:22:59,122] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 38 + 4: [2023-03-16 19:22:59,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:59,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 4: [2023-03-16 19:22:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 4: [2023-03-16 19:22:59,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_36-model_00-model_states.pt. 
+ 4: [2023-03-16 19:22:59,131] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 39 + 4: [2023-03-16 19:22:59,132] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 34 + 4: [2023-03-16 19:22:59,135] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 35 + 4: [2023-03-16 19:22:59,136] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 32 + 4: [2023-03-16 19:22:59,136] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 33 + 5: [2023-03-16 19:22:59,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:59,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... 
+ 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt... + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. 
+ 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/layer_38-model_00-model_states.pt. + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 47 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 42 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 44 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 40 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 41 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 45 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 46 + 5: [2023-03-16 19:22:59,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 43 + 0: [2023-03-16 19:23:05,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 19:23:05,812] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 5 + 1: [2023-03-16 19:23:05,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:05,831] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 9 + 0: [2023-03-16 19:23:06,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,056] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 0 + 0: [2023-03-16 19:23:06,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,099] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 7 + 1: [2023-03-16 19:23:06,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:06,137] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 14 + 1: [2023-03-16 19:23:06,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 19:23:06,257] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 12 + 1: [2023-03-16 19:23:06,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:06,394] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 13 + 1: [2023-03-16 19:23:06,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:06,450] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 15 + 1: [2023-03-16 19:23:06,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:06,532] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 10 + 1: [2023-03-16 19:23:06,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. + 1: [2023-03-16 19:23:06,532] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 8 + 0: [2023-03-16 19:23:06,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 19:23:06,569] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 2 + 0: [2023-03-16 19:23:06,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,613] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 1 + 0: [2023-03-16 19:23:06,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,757] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 3 + 0: [2023-03-16 19:23:06,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,901] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 4 + 0: [2023-03-16 19:23:06,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. + 0: [2023-03-16 19:23:06,911] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 6 + 1: [2023-03-16 19:23:08,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step190/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 19:23:08,864] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 16 ZeRO state_dicts for rank 11
diff --git a/2b8100m100m/3325438.err b/2b8100m100m/3325438.err
new file mode 100644
index 0000000000000000000000000000000000000000..e551825956b5671b7cfbe3bf2e12d604545edce2
--- /dev/null
+++ b/2b8100m100m/3325438.err
@@ -0,0 +1,2212 @@
+ 4: 2023-03-16 20:43:34.768420: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+ 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+ 4: 2023-03-16 20:43:34.768434: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+ 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+ 4: 2023-03-16 20:43:34.768440: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+ 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+ 4: 2023-03-16 20:43:34.768435: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
+ 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+ 4: 2023-03-16 20:43:34.768456: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 20:43:34.768455: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 20:43:34.768457: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 20:43:34.768533: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769775: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 20:43:34.769778: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769819: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769834: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769851: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 20:43:34.769865: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 20:43:34.769874: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.770915: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.770940: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.770914: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 20:43:34.771004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.771000: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.771020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.771030: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 20:43:34.771054: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 20:43:34.771668: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771698: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771723: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771738: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771765: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 20:43:34.771795: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771810: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 20:43:34.771796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772653: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772673: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 20:43:34.772659: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772708: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772726: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772745: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 20:43:34.772722: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 20:43:34.772755: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773497: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773482: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773553: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 20:43:34.773561: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773572: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773574: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 20:43:34.773545: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.773939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 20:43:34.774005: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.774006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.774048: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.774042: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.774057: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 20:43:34.774049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 20:43:34.774086: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774671: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774686: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774689: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 20:43:34.774736: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774748: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774761: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774764: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 20:43:34.774770: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 20:43:34.775157: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775164: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775191: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775222: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775238: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 20:43:34.775243: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 20:43:34.775220: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775649: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775649: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 20:43:34.775658: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775677: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775703: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775721: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 20:43:34.775729: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 20:43:34.775700: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776294: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776375: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 9: 2023-03-16 20:43:34.776336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776385: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776394: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 20:43:34.776405: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777129: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 20:43:34.777144: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777163: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777149: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777328: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777339: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 20:43:34.777349: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: 2023-03-16 20:43:34.777193: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777347: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777209: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 3: 2023-03-16 20:43:34.777363: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 20:43:34.777272: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777367: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 20:43:34.777388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 20:43:34.777791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777808: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777817: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777871: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777896: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 20:43:34.777893: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777897: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 20:43:34.777924: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778022: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 20:43:34.778055: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778111: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778085: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 20:43:34.778130: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 20:43:34.778146: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: 2023-03-16 20:43:34.778199: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778207: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778218: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778226: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 20:43:34.778228: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778229: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778234: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 20:43:34.778246: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 20:43:47.906840: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906899: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906944: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.906965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.913914: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 20:43:47.913943: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 20:43:47.908026: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 20:43:47.906844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 20:43:47.913967: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 20:43:47.913978: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:43:47.913993: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 20:43:47.914001: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 20:43:47.908191: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 20:43:47.908064: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 20:43:47.914052: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 20:43:47.908110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:43:47.907165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 2023-03-16 20:43:47.914068: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 20:43:47.909031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 20:43:47.908683: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 2023-03-16 20:43:47.908200: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.908922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 20:43:47.908080: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 20:43:47.908928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 
'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 2023-03-16 20:43:47.906875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.914270: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 20:43:47.914284: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.908942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.914159: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 20:43:47.908995: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.909068: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] 
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.908118: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.906898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.908382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908128: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.914355: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 20:43:47.908714: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 20:43:47.908978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:43:47.908963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.908953: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.908931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.908942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:43:47.907197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:43:47.914237: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 20:43:47.909078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:43:47.914316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908251: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.908130: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 20:43:47.908645: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.914294: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 20:43:47.906913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.908372: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:43:47.908760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 20:43:47.909009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:43:47.908970: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.908976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.908960: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.908954: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.914179: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 20:43:47.907224: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.914433: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:43:47.914257: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 20:43:47.909099: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.908138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 20:43:47.908692: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.906927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:43:47.914358: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 20:43:47.908429: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908156: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:43:47.914348: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 20:43:47.909041: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:43:47.908981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.908990: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.908976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:43:47.914542: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 20:43:47.908963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:43:47.907216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.909110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 20:43:47.908785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908286: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:43:47.914348: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.908095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 20:43:47.908721: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.914316: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 20:43:47.906930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.908463: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908168: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.914374: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 20:43:47.914377: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 20:43:47.914393: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 20:43:47.914394: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:43:47.909037: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.914326: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 20:43:47.908999: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.909006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.908980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.908972: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.914198: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 20:43:47.907236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.914471: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:43:47.914281: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 20:43:47.909115: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 20:43:47.908793: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.908182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 20:43:47.914570: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 20:43:47.914592: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.914335: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 20:43:47.906942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:43:47.914379: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 20:43:47.908468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:43:47.914364: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 2023-03-16 20:43:47.909072: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:43:47.909010: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.909030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.909000: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:43:47.908736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 20:43:47.908969: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:43:47.907239: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909050: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.914486: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.909135: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 20:43:47.908773: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.908299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:43:47.914374: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 20:43:47.914377: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 20:43:47.914372: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.914495: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.914346: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 4: 2023-03-16 20:43:47.906952: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.908446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.914429: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 20:43:47.914431: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:43:47.909086: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.914342: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.908999: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.909002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.909009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 20:43:47.914523: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 20:43:47.908703: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 20:43:47.908982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.914213: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 20:43:47.914226: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 20:43:47.914239: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 20:43:47.914249: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 20:43:47.907244: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909072: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:43:47.909145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 2023-03-16 20:43:47.908800: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:43:47.914344: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 20:43:47.914364: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 20:43:47.914370: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 20:43:47.914381: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.914536: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 20:43:47.914557: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:43:47.914250: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:43:47.914405: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 20:43:47.908481: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.908183: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:43:47.909098: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 2023-03-16 20:43:47.908991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 20:43:47.909030: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 20:43:47.909020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 20:43:47.914570: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not 
have a GPU set up on your machine. + 3: 2023-03-16 20:43:47.908725: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 2023-03-16 20:43:47.909028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 2023-03-16 20:43:47.907247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 20:43:47.909074: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.914532: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 20:43:47.914535: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 20:43:47.914547: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:43:47.914327: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 20:43:47.914326: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 20:43:47.914336: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 20:43:47.914349: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 20:43:47.914584: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 20:43:47.908805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:43:47.914398: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 20:43:47.914405: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 20:43:47.914427: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 20:43:47.914439: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:43:47.914583: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 20:43:47.914585: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:43:47.914367: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 20:43:47.914382: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 20:43:47.914393: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 20:43:47.914402: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:43:47.914412: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 20:43:47.908521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 20:43:47.914353: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:43:47.914399: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 20:43:47.914402: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 20:43:47.914419: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 20:43:47.914424: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.909101: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 2023-03-16 20:43:47.914627: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 20:43:47.909000: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 20:43:47.914631: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 20:43:47.908746: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 2023-03-16 20:43:47.914584: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 20:43:47.909068: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:43:47.914573: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 20:43:47.914619: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 20:43:47.914431: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:43:47.914656: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:43:47.914629: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 20:43:47.914652: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914619: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914626: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:43:47.914425: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 20:43:47.914442: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 20:43:47.914448: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 20:43:47.914453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914739: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914674: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914674: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914683: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914717: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 3: 2023-03-16 20:43:47.908736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 2023-03-16 20:43:47.914650: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914655: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914770: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914855: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 20:43:47.914749: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:43:47.914658: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 20:43:47.914658: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 20:43:47.914694: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914670: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914792: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914817: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914825: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 20:43:47.914869: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914766: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 7: 2023-03-16 20:43:47.914760: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914934: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914936: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914787: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914840: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 20:43:47.914857: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 20:43:47.914952: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914947: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 20:43:47.914972: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 20:44:22.287770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287794: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287874: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287805: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287887: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287894: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287826: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.287824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 20:44:22.288182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 20:44:22.287929: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.288198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.288214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 20:44:22.288327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.288227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.288226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.288232: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.288349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 20:44:22.288241: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.288354: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 20:44:22.288245: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288429: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.288366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.288389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288444: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 20:44:22.288384: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 20:44:22.288385: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288468: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 20:44:22.288394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288482: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288489: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288503: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.288524: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288742: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288786: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.288872: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 20:44:22.288782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288791: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.288799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.288891: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 20:44:22.288965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 20:44:22.288812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.288916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.288974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.288924: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: 2023-03-16 20:44:22.288933: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 20:44:22.288992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.288996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289069: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: 2023-03-16 20:44:22.288937: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 20:44:22.288989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.288989: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 20:44:22.289137: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289046: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.288994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289053: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.289012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 20:44:22.289155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.289019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 20:44:22.289167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289031: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289135: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289064: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.289033: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: 2023-03-16 20:44:22.289174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 20:44:22.289042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289073: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.289185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.289078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.289199: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.289167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.289206: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.289202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 20:44:22.290361: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290360: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 20:44:22.290359: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 20:44:22.290361: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290362: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290361: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290373: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290373: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290375: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290379: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 20:44:22.290367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 20:44:22.290379: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 20:44:22.290428: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 20:44:22.290367: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290377: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 20:44:22.290378: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+15: 2023-03-16 20:44:22.290433: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 20:44:22.290380: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 20:44:22.290382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 20:44:22.290383: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290383: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 20:44:22.290386: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290441: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 20:44:22.290446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 20:44:22.290394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 20:44:22.290408: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 20:44:22.290916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290958: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290917: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290961: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290964: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290962: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.290931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 20:44:22.290931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: 2023-03-16 20:44:22.290965: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290976: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 20:44:22.290975: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 20:44:22.290930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 20:44:22.291339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 20:44:22.290980: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 20:44:22.290982: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 20:44:22.290984: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 20:44:22.290943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 20:44:22.290943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 20:44:22.290984: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 20:44:22.290986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 20:44:22.290987: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 20:44:22.290944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 20:44:22.290947: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 20:44:22.290949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 20:44:22.290951: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.291340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.291340: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.291533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 20:44:22.291344: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.291346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.291535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 20:44:22.291346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.291535: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 20:44:22.291347: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.291354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291358: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.288942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: 2023-03-16 20:44:22.291358: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 20:44:22.291717: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 20:44:22.291361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291363: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 20:44:22.291364: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.288945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 20:44:22.291538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291726: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 20:44:22.291540: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291600: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291725: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 20:44:22.291541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291599: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 20:44:22.291549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291553: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 20:44:22.291728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 20:44:22.291895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 20:44:22.291554: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291556: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291602: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291733: W 
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291559: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291561: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 20:44:22.291562: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291601: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291917: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 20:44:22.291732: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 20:44:22.291900: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291606: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:44:22.292002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 
20:44:22.291919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 20:44:22.291729: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 20:44:22.291900: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291609: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291738: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.291613: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291621: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 20:44:22.291919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: 2023-03-16 20:44:22.291742: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 20:44:22.291744: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 20:44:22.291745: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 20:44:22.291749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 20:44:22.291749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 20:44:22.291748: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 20:44:22.292002: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 20:44:22.291617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291622: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 20:44:22.291624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 20:44:22.292006: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291912: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291631: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 20:44:22.291634: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291903: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:44:22.292009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291903: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:44:22.292010: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 20:44:22.291929: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291930: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 20:44:22.292010: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291918: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 20:44:22.291917: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291942: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291943: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 20:44:22.291922: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 20:44:22.291923: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 20:44:22.291927: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291945: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 20:44:22.291945: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292011: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 20:44:22.291928: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 20:44:22.291929: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:44:22.292019: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292019: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292024: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+11: 2023-03-16 20:44:22.292012: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 20:44:22.292027: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292028: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292029: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292033: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 20:44:22.292036: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.294906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.294953: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.294959: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.294977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.294973: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.294998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.295005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.295027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297393: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297395: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297395: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297403: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297412: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 20:44:22.297404: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 20:44:22.297411: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297418: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297418: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297418: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297422: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 20:44:22.297423: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 7: 2023-03-16 20:44:22.303249: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303290: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303431: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 20:44:22.303333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303347: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.303351: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303491: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303510: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303529: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.303568: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305873: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305878: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305885: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305890: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 20:44:22.305896: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 20:44:22.305899: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 20:44:22.305900: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 20:44:22.305912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 2023-03-16 20:44:22.305901: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 20:44:22.305904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 20:44:22.305912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 20:44:22.305917: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 20:44:22.305918: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305923: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305923: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305926: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305930: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 20:44:22.305931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 20:44:22.329138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329176: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329220: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329223: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.329300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291618: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291623: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291632: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291644: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291647: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291651: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291653: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291654: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291656: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 20:44:22.291718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 20:44:22.291737: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331532: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331539: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared 
object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331537: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331547: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 20:44:22.331545: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331541: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331547: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 20:44:22.331555: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331560: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331561: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331563: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331565: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 20:44:22.331565: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_upper_triang_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_upper_triang_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_masked_softmax_cuda... 
+ 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module fused_mix_prec_layer_norm_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module fused_mix_prec_layer_norm_cuda... + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. +14: Successfully preprocessed all matching files. +14: Successfully preprocessed all matching files. +14: Successfully preprocessed all matching files. +13: Successfully preprocessed all matching files. 
+ 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( +15: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 3: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 4: Building extension module utils... + 4: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 4: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: + 2: + 2: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: + 1: + 1: + 1: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: + 6: + 6: + 6: + 6: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: + 9: + 9: + 9: + 9: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 8: + 8: + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: + 8: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: + 5: + 5: + 5: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: +11: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: +12: +12: +12: +12: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: +15: +15: +15: +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 0: Building extension module utils... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module utils... + 7: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... 
+ 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 4: Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils...Loading extension module utils... + 4: + 4: + 4: + 4: + 4: Loading extension module utils...Loading extension module utils... + 4: + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 1: Loading extension module utils... + 2: Loading extension module utils... + 1: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 3: Loading extension module utils... + 1: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 7: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... 
+ 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... +14: Loading extension module utils... + 8: Loading extension module utils... +14: Loading extension module utils... + 8: Loading extension module utils... +14: Loading extension module utils... + 8: Loading extension module utils... +13: Loading extension module utils... + 8: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... + 8: Loading extension module utils... + 5: Loading extension module utils... + 8: Loading extension module utils... +14: Loading extension module utils... +13: Loading extension module utils... + 5: Loading extension module utils... +13: Loading extension module utils... + 8: Loading extension module utils... +14: Loading extension module utils... + 5: Loading extension module utils... +11: Loading extension module utils... +13: Loading extension module utils... + 5: Loading extension module utils... +14: Loading extension module utils... + 8: Loading extension module utils... +13: Loading extension module utils... +11: Loading extension module utils... +13: Loading extension module utils... + 5: Loading extension module utils... +14: Loading extension module utils... +12: Loading extension module utils... +11: Loading extension module utils... + 5: Loading extension module utils... +13: Loading extension module utils... +12: Loading extension module utils... +11: Loading extension module utils... + 5: Loading extension module utils... +11: Loading extension module utils... + 5: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... 
+ 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +12: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: +12: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 0: + 0: Loading extension module utils...Loading extension module utils... + 0: +12: Loading extension module utils... + 7: Loading extension module utils... +12: Loading extension module utils... + 0: Loading extension module utils... + 7: Loading extension module utils... +12: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +10: Loading extension module utils... +15: Loading extension module utils... +10: Loading extension module utils... +15: Loading extension module utils... +10: Loading extension module utils... +15: Loading extension module utils... +10: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... 
+10: Loading extension module utils... +15: Loading extension module utils... +10: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 4: + 4: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 4: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... 
+ 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 0: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... +10: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 1: + 3: Loading extension module utils... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 1: + 1: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 2: + 2: Loading extension module utils... 
+ 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +14: +14: Loading extension module utils...Loading extension module utils... +14: +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... 
+14: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 6: + 6: Loading extension module utils...Loading extension module utils... + 6: + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 6: + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... 
+ 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... 
+13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +13: +13: Loading extension module utils...Loading extension module utils... +13: +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +13: +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... 
+12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: +12: Loading extension module utils...Loading extension module utils...Loading extension module utils... +12: +12: +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: Loading extension module utils...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... 
+15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 5: + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 5: + 5: Loading extension module utils... 
+15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... +11: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +11: +11: Loading extension module utils...Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: + 5: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +11: +11: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... 
+ 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: Loading extension module utils...Loading extension module utils... + 8: + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: Loading extension module utils...Loading extension module utils... + 8: + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 0: Loading extension module utils... + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/2b8100m100m/3325438.out b/2b8100m100m/3325438.out new file mode 100644 index 0000000000000000000000000000000000000000..40ed435183d0bb1c2ef606b6a52d7443c9acd634 --- /dev/null +++ b/2b8100m100m/3325438.out @@ -0,0 +1,1809 @@ +Model parameters: d_model 2560 ffw_size 10240 kv_size 128 n_heads 20 n_layers 34 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 34 --hidden-size 2560 --num-attention-heads 20 --kv-channels 128 --ffn-hidden-size 10240 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 2 --global-batch-size 512 --train-samples 48_828 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-2b8100m100m --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 48_828 --lr-warmup-samples 488 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 10000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_2b8100m100m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_2b8100m100m --load checkpoints_2b8100m100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3325438.json --zero-stage 0 +START 3325438: Thu 16 Mar 2023 08:42:48 PM EET + 0: + 
0: + 0: ======================= ROCm System Management Interface ======================= + 0: ================================= Concise Info ================================= + 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 0: 0 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 2 37.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 4 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 6 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: ================================================================================ + 0: ============================= End of ROCm SMI Log ============================== +12: +12: +12: ======================= ROCm System Management Interface ======================= +12: ================================= Concise Info ================================= +12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +12: 0 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 2 39.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 4 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: ================================================================================ +12: ============================= End of ROCm SMI Log ============================== +15: +15: +15: ======================= ROCm System Management Interface ======================= +15: ================================= Concise Info ================================= +15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +15: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 
0.0W 0% 0% +15: 2 37.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 4 44.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 6 36.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: ================================================================================ +15: ============================= End of ROCm SMI Log ============================== + 2: + 2: + 2: ======================= ROCm System Management Interface ======================= + 2: ================================= Concise Info ================================= + 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 2: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 2 44.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 4 41.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 6 38.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: ================================================================================ + 2: ============================= End of ROCm SMI Log ============================== + 6: + 6: + 6: ======================= ROCm System Management Interface ======================= + 6: ================================= Concise Info ================================= + 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 6: 0 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 2 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 4 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 
0% + 6: ================================================================================ + 6: ============================= End of ROCm SMI Log ============================== +10: +10: +10: ======================= ROCm System Management Interface ======================= +10: ================================= Concise Info ================================= +10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +10: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 2 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 4 42.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 6 38.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: ================================================================================ +10: ============================= End of ROCm SMI Log ============================== + 7: + 7: + 7: ======================= ROCm System Management Interface ======================= + 7: ================================= Concise Info ================================= + 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 7: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 2 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 6 38.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: ================================================================================ + 7: ============================= End of ROCm SMI Log ============================== + 5: + 5: + 5: ======================= ROCm System Management Interface ======================= + 5: ================================= Concise Info 
================================= + 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 5: 0 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 2 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 4 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 6 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: ================================================================================ + 5: ============================= End of ROCm SMI Log ============================== +13: +13: +13: ======================= ROCm System Management Interface ======================= +13: ================================= Concise Info ================================= +13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +13: 0 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 4 43.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 6 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: ================================================================================ +13: ============================= End of ROCm SMI Log ============================== + 4: + 4: + 4: ======================= ROCm System Management Interface ======================= + 4: ================================= Concise Info ================================= + 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 4: 0 46.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 4 44.0c 83.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 6 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: ================================================================================ + 4: ============================= End of ROCm SMI Log ============================== + 9: + 9: + 9: ======================= ROCm System Management Interface ======================= + 9: ================================= Concise Info ================================= + 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 9: 0 48.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 2 43.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 4 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 6 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: ================================================================================ + 9: ============================= End of ROCm SMI Log ============================== + 3: + 3: + 3: ======================= ROCm System Management Interface ======================= + 3: ================================= Concise Info ================================= + 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 3: 0 43.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 2 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 6 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: ================================================================================ + 3: ============================= End of ROCm 
SMI Log ============================== +11: +11: +11: ======================= ROCm System Management Interface ======================= +11: ================================= Concise Info ================================= +11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +11: 0 49.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 2 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: ================================================================================ +11: ============================= End of ROCm SMI Log ============================== + 1: + 1: + 1: ======================= ROCm System Management Interface ======================= + 1: ================================= Concise Info ================================= + 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 1: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 2 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 4 38.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 6 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: ================================================================================ + 1: ============================= End of ROCm SMI Log ============================== +14: +14: +14: ======================= ROCm System Management Interface ======================= +14: ================================= Concise Info ================================= +14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +14: 0 46.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 
0% 0% +14: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 2 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 4 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 6 42.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: ================================================================================ +14: ============================= End of ROCm SMI Log ============================== + 8: + 8: + 8: ======================= ROCm System Management Interface ======================= + 8: ================================= Concise Info ================================= + 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 8: 0 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 1 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 2 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 4 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 6 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: ================================================================================ + 8: ============================= End of ROCm SMI Log ============================== +15: Launching on nid005299 (15/16), master nid005284 port 9999, GPUs 8, CUDA: True + 4: Launching on nid005288 (4/16), master nid005284 port 9999, GPUs 8, CUDA: True + 3: Launching on nid005287 (3/16), master nid005284 port 9999, GPUs 8, CUDA: True +12: Launching on nid005296 (12/16), master nid005284 port 9999, GPUs 8, CUDA: True + 9: Launching on nid005293 (9/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: Launching on nid005284 (0/16), master nid005284 port 9999, GPUs 8, CUDA: True + 8: Launching on nid005292 (8/16), master nid005284 port 9999, GPUs 8, CUDA: True +14: Launching on nid005298 
(14/16), master nid005284 port 9999, GPUs 8, CUDA: True +13: Launching on nid005297 (13/16), master nid005284 port 9999, GPUs 8, CUDA: True + 7: Launching on nid005291 (7/16), master nid005284 port 9999, GPUs 8, CUDA: True +11: Launching on nid005295 (11/16), master nid005284 port 9999, GPUs 8, CUDA: True + 5: Launching on nid005289 (5/16), master nid005284 port 9999, GPUs 8, CUDA: True + 2: Launching on nid005286 (2/16), master nid005284 port 9999, GPUs 8, CUDA: True + 1: Launching on nid005285 (1/16), master nid005284 port 9999, GPUs 8, CUDA: True +10: Launching on nid005294 (10/16), master nid005284 port 9999, GPUs 8, CUDA: True + 6: Launching on nid005290 (6/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: using world size: 128, data-parallel-size: 128, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 + 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. + 0: using torch.bfloat16 for parameters ... + 0: ------------------------ arguments ------------------------ + 0: abort_on_unmet_fused_kernel_constraints ......... False + 0: accumulate_allreduce_grads_in_fp32 .............. True + 0: adam_beta1 ...................................... 0.9 + 0: adam_beta2 ...................................... 0.999 + 0: adam_eps ........................................ 1e-08 + 0: adlr_autoresume ................................. False + 0: adlr_autoresume_interval ........................ 1000 + 0: apply_query_key_layer_scaling ................... True + 0: apply_residual_connection_post_layernorm ........ False + 0: attention_dropout ............................... 0.1 + 0: attention_softmax_in_fp32 ....................... False + 0: bert_binary_head ................................ True + 0: bert_load ....................................... None + 0: bf16 ............................................ True + 0: bias_dropout_fusion ............................. True + 0: bias_gelu_fusion ................................ 
True + 0: biencoder_projection_dim ........................ 0 + 0: biencoder_shared_query_context_model ............ False + 0: block_data_path ................................. None + 0: checkpoint_activations .......................... True + 0: checkpoint_in_cpu ............................... False + 0: checkpoint_num_layers ........................... 1 + 0: clip_grad ....................................... 1.0 + 0: codecarbon_dir .................................. None + 0: consumed_train_samples .......................... 0 + 0: consumed_train_tokens ........................... 0 + 0: consumed_valid_samples .......................... 0 + 0: contigious_checkpointing ........................ False + 0: cpu_optimizer ................................... False + 0: cpu_torch_adam .................................. False + 0: curriculum_learning ............................. False + 0: data_impl ....................................... mmap + 0: data_parallel_size .............................. 128 + 0: data_path ....................................... None + 0: dataloader_type ................................. single + 0: DDP_impl ........................................ local + 0: decoder_seq_length .............................. None + 0: deepscale ....................................... False + 0: deepscale_config ................................ None + 0: deepspeed ....................................... True + 0: deepspeed_activation_checkpointing .............. False + 0: deepspeed_config ................................ ds_configs/3325438.json + 0: deepspeed_mpi ................................... False + 0: distribute_checkpointed_activations ............. False + 0: distributed_backend ............................. nccl + 0: embed_layernorm ................................. False + 0: embedding_path .................................. None + 0: encoder_seq_length .............................. 2048 + 0: eod_mask_loss ................................... 
False + 0: eval_interval ................................... 1000 + 0: eval_iters ...................................... 1 + 0: eval_only ....................................... None + 0: evidence_data_path .............................. None + 0: exit_duration_in_mins ........................... None + 0: exit_interval ................................... None + 0: ffn_hidden_size ................................. 10240 + 0: finetune ........................................ False + 0: fp16 ............................................ False + 0: fp16_lm_cross_entropy ........................... False + 0: fp32_residual_connection ........................ False + 0: gigaflos_no_embeds .............................. 0 + 0: global_batch_size ............................... 512 + 0: glu_activation .................................. None + 0: hidden_dropout .................................. 0.1 + 0: hidden_size ..................................... 2560 + 0: hysteresis ...................................... 2 + 0: ict_head_size ................................... None + 0: ict_load ........................................ None + 0: img_dim ......................................... 224 + 0: indexer_batch_size .............................. 128 + 0: indexer_log_interval ............................ 1000 + 0: inference ....................................... False + 0: init_method_std ................................. 0.02 + 0: init_method_xavier_uniform ...................... False + 0: initial_loss_scale .............................. 4294967296 + 0: kill_switch_path ................................ kill-switch-2b8100m100m + 0: kv_channels ..................................... 128 + 0: layer_norm_fusion ............................... True + 0: layernorm_epsilon ............................... 1e-05 + 0: lazy_mpu_init ................................... None + 0: load ............................................ 
checkpoints_2b8100m100m + 0: local_rank ...................................... None + 0: log_batch_size_to_tensorboard ................... True + 0: log_interval .................................... 10 + 0: log_learning_rate_to_tensorboard ................ True + 0: log_level ....................................... None + 0: log_level_replica ............................... None + 0: log_loss_scale_to_tensorboard ................... True + 0: log_num_zeros_in_grad ........................... False + 0: log_params_norm ................................. False + 0: log_path ........................................ None + 0: log_timers_to_tensorboard ....................... True + 0: log_validation_ppl_to_tensorboard ............... True + 0: loss_on_targets_only ............................ False + 0: loss_scale ...................................... 12.0 + 0: loss_scale_window ............................... 1000 + 0: lr .............................................. 0.0002 + 0: lr_decay_iters .................................. None + 0: lr_decay_samples ................................ 48828 + 0: lr_decay_style .................................. cosine + 0: lr_decay_tokens ................................. None + 0: lr_warmup_fraction .............................. None + 0: lr_warmup_iters ................................. 0 + 0: lr_warmup_samples ............................... 488 + 0: make_vocab_size_divisible_by .................... 128 + 0: mask_prob ....................................... 0.15 + 0: masked_softmax_fusion ........................... True + 0: max_position_embeddings ......................... 2048 + 0: mean_noise_span_length .......................... None + 0: memory_centric_tiled_linear ..................... False + 0: merge_file ...................................... gpt2/merges.txt + 0: micro_batch_size ................................ 2 + 0: min_loss_scale .................................. 
1.0 + 0: min_lr .......................................... 2e-05 + 0: mmap_warmup ..................................... False + 0: no_load_optim ................................... None + 0: no_load_rng ..................................... None + 0: no_save_optim ................................... None + 0: no_save_rng ..................................... None + 0: noise_density ................................... None + 0: num_attention_heads ............................. 20 + 0: num_channels .................................... 3 + 0: num_classes ..................................... 1000 + 0: num_layers ...................................... 34 + 0: num_layers_per_virtual_pipeline_stage ........... None + 0: num_workers ..................................... 2 + 0: onnx_safe ....................................... None + 0: openai_gelu ..................................... False + 0: optimizer ....................................... adam + 0: optimizer_fusion ................................ True + 0: override_lr_scheduler ........................... False + 0: pad_vocab_size_to ............................... None + 0: params_dtype .................................... torch.bfloat16 + 0: partition_activations ........................... False + 0: patch_dim ....................................... 16 + 0: pipeline_model_parallel_size .................... 1 + 0: position_embedding_type ......................... PositionEmbeddingType.absolute + 0: pp_partition_method ............................. None + 0: profile_backward ................................ False + 0: query_in_block_prob ............................. 0.1 + 0: rampup_batch_size ............................... None + 0: rank ............................................ 0 + 0: remote_device ................................... none + 0: reset_attention_mask ............................ False + 0: reset_position_ids .............................. 
False + 0: reset_progress .................................. None + 0: retriever_report_topk_accuracies ................ [] + 0: retriever_score_scaling ......................... False + 0: retriever_seq_length ............................ 256 + 0: reweight_loss_based_on_position_frequency ....... False + 0: sample_rate ..................................... 1.0 + 0: save ............................................ checkpoints_2b8100m100m + 0: save_interval ................................... 10000 + 0: scatter_gather_tensors_in_pipeline .............. True + 0: scattered_embeddings ............................ False + 0: seed ............................................ 1234 + 0: seq_length ...................................... 2048 + 0: sgd_momentum .................................... 0.9 + 0: short_seq_prob .................................. 0.1 + 0: skip_train_iteration_range ...................... None + 0: split ........................................... None + 0: split_transformers .............................. False + 0: sync_tp_duplicated_parameters ................... False + 0: synchronize_each_layer .......................... False + 0: tensor_model_parallel_size ...................... 1 + 0: tensorboard_dir ................................. tensorboard_2b8100m100m + 0: tensorboard_log_interval ........................ 1 + 0: tensorboard_queue_size .......................... 5 + 0: test_weighted_split_paths ....................... None + 0: test_weighted_split_paths_path .................. None + 0: tile_factor ..................................... 1 + 0: titles_data_path ................................ None + 0: tokenizer_name_or_path .......................... None + 0: tokenizer_type .................................. GPT2BPETokenizer + 0: train_iters ..................................... None + 0: train_samples ................................... 48828 + 0: train_tokens .................................... 
None + 0: train_weighted_split_names ...................... ['train'] + 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] + 0: train_weighted_split_paths_path ................. None + 0: train_weighted_split_splits ..................... [['0:1']] + 0: train_weighted_split_weights .................... [['1.0']] + 0: universal_checkpoint ............................ False + 0: use_bnb_optimizer ............................... False + 0: use_checkpoint_lr_scheduler ..................... False + 0: use_contiguous_buffers_in_ddp ................... True + 0: use_cpu_initialization .......................... None + 0: use_one_sent_docs ............................... False + 0: use_pin_memory .................................. False + 0: valid_num_workers ............................... 2 + 0: valid_weighted_split_names ...................... ['validation'] + 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] + 0: valid_weighted_split_paths_path ................. None + 0: valid_weighted_split_splits ..................... [['0:1']] + 0: valid_weighted_split_weights .................... [['1.0']] + 0: virtual_pipeline_model_parallel_size ............ None + 0: vocab_extra_ids ................................. 0 + 0: vocab_file ...................................... gpt2/vocab.json + 0: weight_decay .................................... 0.1 + 0: world_size ...................................... 128 + 0: zero_allgather_bucket_size ...................... 0.0 + 0: zero_contigious_gradients ....................... False + 0: zero_reduce_bucket_size ......................... 0.0 + 0: zero_reduce_scatter ............................. False + 0: zero_stage ...................................... 
0 + 0: -------------------- end of arguments --------------------- + 0: setting number of micro-batches to constant 2 + 0: > building GPT2BPETokenizer tokenizer ... + 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) + 0: DeepSpeed general environment info: + 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] + 0: torch version .................... 1.13.0+rocm5.2 + 0: torch cuda version ............... None + 0: torch hip version ................ 5.2.21151-afdc89f8 + 0: nvcc version ..................... None + 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] + 0: deepspeed info ................... 0.7.5, unknown, unknown + 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 + 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** + 0: > initializing torch distributed ... + 0: [2023-03-16 20:45:49,442] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +15: > setting tensorboard ... + 0: > initializing tensor model parallel with size 1 + 0: > initializing pipeline model parallel with size 1 + 0: > setting random seeds to 1234 ... + 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 + 0: > compiling dataset index builder ... + 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: make: Nothing to be done for 'default'. + 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: >>> done with dataset index builder. Compilation time: 0.117 seconds + 0: > compiling and loading fused kernels ... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 87 + 0: ninja: no work to do. + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 63 + 0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 67 + 0: ninja: no work to do. + 0: >>> done with compiling and loading fused kernels. Compilation time: 26.629 seconds + 0: time to initialize megatron (seconds): -2.321 + 0: [after megatron is initialized] datetime: 2023-03-16 20:46:21 + 0: building GPT model ... 
+ 0: [2023-03-16 20:46:21,817] [INFO] [utils.py:827:see_memory_usage] Before Building Model + 0: [2023-03-16 20:46:21,818] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB + 0: [2023-03-16 20:46:21,818] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.65 GB, percent = 6.1% + 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None + 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi + 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, 
data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 + 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): + 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, 
model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process + 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo + 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, 
model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127} + 0: [2023-03-16 20:46:25,841] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer + 0: stage=0 layers=41 + 0: 0: _to_float16 + 0: 1: EmbeddingPipe + 0: 2: + 0: 3: ParallelTransformerLayerPipe + 0: 4: ParallelTransformerLayerPipe + 0: 5: ParallelTransformerLayerPipe + 0: 6: ParallelTransformerLayerPipe + 0: 7: ParallelTransformerLayerPipe + 0: 8: ParallelTransformerLayerPipe + 0: 9: ParallelTransformerLayerPipe + 0: 10: ParallelTransformerLayerPipe + 0: 11: ParallelTransformerLayerPipe + 0: 12: ParallelTransformerLayerPipe + 0: 13: ParallelTransformerLayerPipe + 0: 14: ParallelTransformerLayerPipe + 0: 15: ParallelTransformerLayerPipe + 0: 16: ParallelTransformerLayerPipe + 0: 17: ParallelTransformerLayerPipe + 0: 18: ParallelTransformerLayerPipe + 0: 19: ParallelTransformerLayerPipe + 0: 20: ParallelTransformerLayerPipe + 0: 21: ParallelTransformerLayerPipe + 0: 22: ParallelTransformerLayerPipe + 0: 23: ParallelTransformerLayerPipe + 0: 24: ParallelTransformerLayerPipe + 0: 25: ParallelTransformerLayerPipe + 0: 26: ParallelTransformerLayerPipe + 0: 27: ParallelTransformerLayerPipe + 0: 28: ParallelTransformerLayerPipe + 0: 29: ParallelTransformerLayerPipe + 0: 30: ParallelTransformerLayerPipe + 0: 31: ParallelTransformerLayerPipe + 0: 32: ParallelTransformerLayerPipe + 0: 33: ParallelTransformerLayerPipe + 0: 34: ParallelTransformerLayerPipe + 0: 35: ParallelTransformerLayerPipe + 0: 36: ParallelTransformerLayerPipe + 0: 37: undo + 0: 38: MixedFusedLayerNorm + 0: 39: EmbeddingPipe + 0: 40: float16_to_fp32 + 0: loss: CrossEntropy + 0: [2023-03-16 20:46:26,323] [INFO] [utils.py:827:see_memory_usage] After Building Model + 0: [2023-03-16 20:46:26,324] [INFO] [utils.py:828:see_memory_usage] MA 5.26 GB Max_MA 5.26 GB CA 5.31 GB Max_CA 5 GB + 0: [2023-03-16 20:46:26,324] [INFO] [utils.py:836:see_memory_usage] CPU 
Virtual Memory: used = 30.69 GB, percent = 6.1% + 0: setting training iterations to 95 + 0: > learning rate decay style: cosine + 0: DeepSpeed is enabled. + 0: [2023-03-16 20:46:26,327] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown + 0: [2023-03-16 20:46:42,653] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False + 0: [2023-03-16 20:46:42,654] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer + 0: [2023-03-16 20:46:42,654] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer + 0: [2023-03-16 20:46:42,672] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam + 0: [2023-03-16 20:46:42,672] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer + 0: [2023-03-16 20:46:42,789] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer + 0: [2023-03-16 20:46:42,789] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.27 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 20:46:42,790] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.38 GB, percent = 6.2% + 4: ninja: no work to do. + 4: Time to load utils op: 0.32492518424987793 seconds + 0: ninja: no work to do. 
+ 0: Time to load utils op: 0.14334774017333984 seconds + 7: Time to load utils op: 0.11618781089782715 seconds + 4: Time to load utils op: 0.0006234645843505859 seconds + 0: Time to load utils op: 0.2021348476409912 seconds + 0: Time to load utils op: 0.202467679977417 seconds + 0: Time to load utils op: 0.20261502265930176 seconds + 0: Time to load utils op: 0.2021031379699707 seconds + 0: Time to load utils op: 0.2024240493774414 seconds + 4: Time to load utils op: 0.2025279998779297 secondsTime to load utils op: 0.20299744606018066 seconds + 4: + 4: Time to load utils op: 0.20380640029907227 seconds + 4: Time to load utils op: 0.20347380638122559 seconds + 4: Time to load utils op: 0.20277023315429688 secondsTime to load utils op: 0.20276498794555664 seconds + 4: + 4: Time to load utils op: 0.20343422889709473 seconds + 0: Time to load utils op: 0.0006198883056640625 seconds + 7: Time to load utils op: 0.20398783683776855 seconds + 2: Time to load utils op: 0.21174979209899902 secondsTime to load utils op: 0.21175384521484375 seconds + 2: + 2: Time to load utils op: 0.2117764949798584 secondsTime to load utils op: 0.21172356605529785 seconds + 2: + 2: Time to load utils op: 0.21178793907165527 seconds + 2: Time to load utils op: 0.21178102493286133 secondsTime to load utils op: 0.2117938995361328 seconds + 2: + 2: Time to load utils op: 0.21179628372192383 seconds + 1: Time to load utils op: 0.21236729621887207 seconds + 1: Time to load utils op: 0.2123873233795166 seconds + 1: Time to load utils op: 0.21239876747131348 seconds + 1: Time to load utils op: 0.21241211891174316 seconds + 1: Time to load utils op: 0.21243643760681152 secondsTime to load utils op: 0.21244382858276367 seconds + 1: + 1: Time to load utils op: 0.21244239807128906 secondsTime to load utils op: 0.21244287490844727 seconds + 1: + 3: Time to load utils op: 0.21323895454406738 seconds + 3: Time to load utils op: 0.21324896812438965 seconds + 3: Time to load utils op: 0.21326279640197754 
seconds + 3: Time to load utils op: 0.21327829360961914 secondsTime to load utils op: 0.21327471733093262 secondsTime to load utils op: 0.21328067779541016 seconds + 3: + 3: + 3: Time to load utils op: 0.2132871150970459 seconds + 3: Time to load utils op: 0.21329641342163086 seconds + 6: Time to load utils op: 0.21138668060302734 seconds + 6: Time to load utils op: 0.21139121055603027 secondsTime to load utils op: 0.21139955520629883 seconds + 6: + 6: Time to load utils op: 0.21140384674072266 seconds + 6: Time to load utils op: 0.21140289306640625 seconds + 6: Time to load utils op: 0.21141552925109863 secondsTime to load utils op: 0.21141505241394043 seconds + 6: Time to load utils op: 0.21142029762268066 seconds + 6: + 0: Time to load utils op: 0.0004048347473144531 seconds + 0: Time to load utils op: 0.0003974437713623047 seconds + 0: Time to load utils op: 0.00039124488830566406 seconds + 0: Time to load utils op: 0.00039505958557128906 seconds + 0: Time to load utils op: 0.4026188850402832 seconds + 0: Time to load utils op: 0.00038886070251464844 seconds + 7: Time to load utils op: 0.2027275562286377 seconds + 7: Time to load utils op: 0.20272397994995117 seconds + 7: Time to load utils op: 0.20279574394226074 seconds + 7: Time to load utils op: 0.20316410064697266 seconds +14: Time to load utils op: 0.21152329444885254 seconds +14: Time to load utils op: 0.21137213706970215 seconds +14: Time to load utils op: 0.21049118041992188 seconds +14: Time to load utils op: 0.21067380905151367 seconds +14: Time to load utils op: 0.21169710159301758 seconds +14: Time to load utils op: 0.21106886863708496 seconds + 9: Time to load utils op: 0.21330785751342773 seconds +14: Time to load utils op: 0.2108628749847412 seconds +14: Time to load utils op: 0.210723876953125 seconds + 9: Time to load utils op: 0.2133324146270752 secondsTime to load utils op: 0.21327972412109375 seconds + 9: + 9: Time to load utils op: 0.21334481239318848 seconds + 9: Time to load utils op: 
0.21335101127624512 seconds + 9: Time to load utils op: 0.21336650848388672 secondsTime to load utils op: 0.2133636474609375 seconds + 9: + 9: Time to load utils op: 0.2133786678314209 seconds + 8: Time to load utils op: 0.21384310722351074 seconds + 8: Time to load utils op: 0.21387052536010742 seconds + 8: Time to load utils op: 0.21388530731201172 seconds + 8: Time to load utils op: 0.2138991355895996 seconds + 8: Time to load utils op: 0.21390795707702637 seconds + 8: Time to load utils op: 0.21391820907592773 secondsTime to load utils op: 0.21392345428466797 seconds + 8: + 8: Time to load utils op: 0.2139267921447754 seconds + 4: Time to load utils op: 0.0004353523254394531 seconds + 4: Time to load utils op: 0.00045228004455566406 seconds +13: Time to load utils op: 0.2128913402557373 seconds +13: Time to load utils op: 0.21294522285461426 seconds +13: Time to load utils op: 0.21297526359558105 seconds +13: Time to load utils op: 0.21328473091125488 seconds +13: Time to load utils op: 0.21278023719787598 secondsTime to load utils op: 0.2129676342010498 seconds +13: +13: Time to load utils op: 0.21302008628845215 secondsTime to load utils op: 0.21296167373657227 seconds +13: + 4: Time to load utils op: 0.0003905296325683594 seconds + 4: Time to load utils op: 0.00040912628173828125 seconds + 5: Time to load utils op: 0.21276211738586426 secondsTime to load utils op: 0.21276569366455078 seconds + 5: + 5: Time to load utils op: 0.21277332305908203 secondsTime to load utils op: 0.21277642250061035 seconds + 5: Time to load utils op: 0.2127833366394043 seconds + 5: Time to load utils op: 0.2127058506011963 seconds + 5: + 5: Time to load utils op: 0.21279239654541016 secondsTime to load utils op: 0.21279072761535645 seconds + 5: + 4: Time to load utils op: 0.00038886070251464844 seconds + 7: Time to load utils op: 0.0005142688751220703 seconds + 4: Time to load utils op: 0.0003879070281982422 seconds +12: Time to load utils op: 0.2116398811340332 seconds +12: Time 
to load utils op: 0.2116403579711914 secondsTime to load utils op: 0.21164822578430176 secondsTime to load utils op: 0.21164202690124512 secondsTime to load utils op: 0.2116386890411377 seconds +12: +12: +12: + 4: Time to load utils op: 0.00037479400634765625 seconds +12: Time to load utils op: 0.2116527557373047 seconds +12: Time to load utils op: 0.21165943145751953 secondsTime to load utils op: 0.21165895462036133 seconds +12: +11: Time to load utils op: 0.21184921264648438 seconds +11: Time to load utils op: 0.2118690013885498 seconds +11: Time to load utils op: 0.2118833065032959 seconds +11: Time to load utils op: 0.21189165115356445 secondsTime to load utils op: 0.21190643310546875 seconds +11: +11: Time to load utils op: 0.2118990421295166 seconds +11: Time to load utils op: 0.21192359924316406 seconds +11: Time to load utils op: 0.21191096305847168 seconds + 0: Time to load utils op: 0.5278036594390869 seconds + 7: Time to load utils op: 0.20288324356079102 secondsTime to load utils op: 0.20263242721557617 seconds + 7: +15: Time to load utils op: 0.21075224876403809 seconds +15: Time to load utils op: 0.21076202392578125 seconds +15: Time to load utils op: 0.21081185340881348 seconds +15: Time to load utils op: 0.21082234382629395 secondsTime to load utils op: 0.21081924438476562 seconds +15: +15: Time to load utils op: 0.2108290195465088 seconds +15: Time to load utils op: 0.2108299732208252 secondsTime to load utils op: 0.2108478546142578 seconds +15: +10: Time to load utils op: 0.21085882186889648 secondsTime to load utils op: 0.20583105087280273 seconds +10: +10: Time to load utils op: 0.21086692810058594 seconds +10: Time to load utils op: 0.21079611778259277 secondsTime to load utils op: 0.20970582962036133 seconds +10: +10: Time to load utils op: 0.20979928970336914 secondsTime to load utils op: 0.20322918891906738 seconds +10: +10: Time to load utils op: 0.20978283882141113 seconds + 7: Time to load utils op: 0.0003960132598876953 seconds + 7: Time 
to load utils op: 0.00037741661071777344 seconds + 7: Time to load utils op: 0.00031948089599609375 seconds + 7: Time to load utils op: 0.0003838539123535156 seconds + 7: Time to load utils op: 0.0003566741943359375 seconds + 0: Time to load utils op: 0.0005059242248535156 seconds + 7: Time to load utils op: 0.00035834312438964844 seconds + 7: Time to load utils op: 0.00033855438232421875 seconds + 1: Time to load utils op: 0.0008802413940429688 seconds + 3: Time to load utils op: 0.0005891323089599609 seconds + 1: Time to load utils op: 0.0011744499206542969 seconds + 3: Time to load utils op: 0.0005736351013183594 seconds + 3: Time to load utils op: 0.0005435943603515625 seconds + 1: Time to load utils op: 0.0014395713806152344 seconds + 1: Time to load utils op: 0.0014140605926513672 seconds + 1: Time to load utils op: 0.0013766288757324219 seconds + 1: Time to load utils op: 0.0013072490692138672 secondsTime to load utils op: 0.0013053417205810547 seconds + 1: + 1: Time to load utils op: 0.0013103485107421875 seconds + 3: Time to load utils op: 0.0008921623229980469 seconds + 3: Time to load utils op: 0.000537872314453125 seconds + 3: Time to load utils op: 0.0005266666412353516 seconds + 3: Time to load utils op: 0.0005497932434082031 seconds + 3: Time to load utils op: 0.0006043910980224609 seconds + 2: Time to load utils op: 0.0008461475372314453 seconds + 2: Time to load utils op: 0.0008101463317871094 seconds + 2: Time to load utils op: 0.0009543895721435547 seconds + 2: Time to load utils op: 0.0010747909545898438 seconds + 2: Time to load utils op: 0.0009393692016601562 seconds + 2: Time to load utils op: 0.0010254383087158203 seconds + 2: Time to load utils op: 0.001035451889038086 seconds + 2: Time to load utils op: 0.0009834766387939453 seconds +14: Time to load utils op: 0.0007739067077636719 seconds +14: Time to load utils op: 0.0008392333984375 seconds +14: Time to load utils op: 0.0011188983917236328 seconds +14: Time to load utils op: 
0.0010921955108642578 seconds +14: Time to load utils op: 0.0012619495391845703 seconds +14: Time to load utils op: 0.0012023448944091797 seconds +14: Time to load utils op: 0.0010619163513183594 seconds +14: Time to load utils op: 0.0012052059173583984 seconds + 6: Time to load utils op: 0.0005843639373779297 seconds + 6: Time to load utils op: 0.000904083251953125 seconds + 6: Time to load utils op: 0.0006434917449951172 seconds + 6: Time to load utils op: 0.0007917881011962891 secondsTime to load utils op: 0.0006265640258789062 seconds + 6: + 6: Time to load utils op: 0.0007884502410888672 seconds + 6: Time to load utils op: 0.0006585121154785156 seconds + 6: Time to load utils op: 0.0008871555328369141 seconds + 9: Time to load utils op: 0.0009398460388183594 seconds + 9: Time to load utils op: 0.0010559558868408203 seconds + 9: Time to load utils op: 0.0011873245239257812 seconds + 9: Time to load utils op: 0.0013089179992675781 seconds + 9: Time to load utils op: 0.0012996196746826172 seconds + 9: Time to load utils op: 0.0012555122375488281 secondsTime to load utils op: 0.0013434886932373047 seconds + 9: + 9: Time to load utils op: 0.0013709068298339844 seconds +13: Time to load utils op: 0.0009348392486572266 seconds +13: Time to load utils op: 0.0010304450988769531 seconds +13: Time to load utils op: 0.0013213157653808594 secondsTime to load utils op: 0.001306295394897461 seconds +13: +13: Time to load utils op: 0.0012679100036621094 seconds +13: Time to load utils op: 0.0012729167938232422 secondsTime to load utils op: 0.0013535022735595703 seconds +13: +13: Time to load utils op: 0.0013706684112548828 seconds +12: Time to load utils op: 0.0008115768432617188 seconds +12: Time to load utils op: 0.0008165836334228516 seconds +12: Time to load utils op: 0.0007538795471191406 seconds +12: Time to load utils op: 0.0009417533874511719 seconds +12: Time to load utils op: 0.0009324550628662109 seconds +12: Time to load utils op: 0.000972747802734375 seconds +12: 
Time to load utils op: 0.0009989738464355469 seconds +12: Time to load utils op: 0.001093149185180664 seconds +10: Time to load utils op: 0.00045299530029296875 seconds +10: Time to load utils op: 0.00038814544677734375 seconds +10: Time to load utils op: 0.00039124488830566406 seconds +10: Time to load utils op: 0.00040459632873535156 seconds +10: Time to load utils op: 0.00038504600524902344 seconds +10: Time to load utils op: 0.0003762245178222656 seconds +10: Time to load utils op: 0.00038933753967285156 seconds +10: Time to load utils op: 0.0003972053527832031 seconds + 5: Time to load utils op: 0.0009341239929199219 seconds +11: Time to load utils op: 0.0008206367492675781 seconds +11: Time to load utils op: 0.0008847713470458984 seconds +15: Time to load utils op: 0.0008077621459960938 seconds + 5: Time to load utils op: 0.0007634162902832031 seconds + 5: Time to load utils op: 0.0007870197296142578 seconds + 5: Time to load utils op: 0.0008070468902587891 seconds + 5: Time to load utils op: 0.0011479854583740234 seconds +15: Time to load utils op: 0.0009810924530029297 seconds + 5: Time to load utils op: 0.0009832382202148438 seconds + 5: Time to load utils op: 0.0009570121765136719 seconds +11: Time to load utils op: 0.001251220703125 seconds + 5: Time to load utils op: 0.0009722709655761719 seconds +11: Time to load utils op: 0.0011005401611328125 seconds +11: Time to load utils op: 0.001163482666015625 seconds +15: Time to load utils op: 0.0011360645294189453 seconds +11: Time to load utils op: 0.0011737346649169922 seconds +11: Time to load utils op: 0.0011334419250488281 seconds +11: Time to load utils op: 0.0011518001556396484 seconds +15: Time to load utils op: 0.0012760162353515625 seconds +15: Time to load utils op: 0.0013856887817382812 secondsTime to load utils op: 0.0013647079467773438 seconds +15: +15: Time to load utils op: 0.0013229846954345703 seconds +15: Time to load utils op: 0.0013720989227294922 seconds + 8: Time to load utils op: 
0.0008502006530761719 seconds + 8: Time to load utils op: 0.0008935928344726562 seconds + 8: Time to load utils op: 0.0008690357208251953 seconds + 8: Time to load utils op: 0.0008420944213867188 seconds + 8: Time to load utils op: 0.0008628368377685547 seconds + 8: Time to load utils op: 0.0009164810180664062 secondsTime to load utils op: 0.0009257793426513672 seconds + 8: + 8: Time to load utils op: 0.0009913444519042969 seconds + 0: [2023-03-16 20:46:43,317] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 + 0: [2023-03-16 20:46:43,318] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.25 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 20:46:43,318] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.55 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,433] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 + 0: [2023-03-16 20:46:43,434] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 20:46:43,434] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,536] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 + 0: [2023-03-16 20:46:43,537] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 20:46:43,537] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,641] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 + 0: [2023-03-16 20:46:43,641] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:43,642] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,742] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 + 0: [2023-03-16 20:46:43,742] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 
21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:43,743] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,848] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 + 0: [2023-03-16 20:46:43,849] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:43,849] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:43,949] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer + 0: [2023-03-16 20:46:43,950] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:43,950] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:44,057] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer + 0: [2023-03-16 20:46:44,057] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:44,057] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:44,159] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer + 0: [2023-03-16 20:46:44,159] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 20:46:44,160] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.54 GB, percent = 6.3% + 0: [2023-03-16 20:46:44,160] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam + 0: [2023-03-16 20:46:44,160] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler + 0: [2023-03-16 20:46:44,160] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = + 0: [2023-03-16 20:46:44,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] + 0: [2023-03-16 
20:46:44,161] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] activation_checkpointing_config { + 0: "partition_activations": false, + 0: "contiguous_memory_optimization": false, + 0: "cpu_checkpointing": false, + 0: "number_checkpoints": null, + 0: "synchronize_checkpoint_boundary": false, + 0: "profile": false + 0: } + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] amp_enabled .................. False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] amp_params ................... False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] autotuning_config ............ { + 0: "enabled": false, + 0: "start_step": null, + 0: "end_step": null, + 0: "metric_path": null, + 0: "arg_mappings": null, + 0: "metric": "throughput", + 0: "model_info": null, + 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", + 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", + 0: "overwrite": true, + 0: "fast": true, + 0: "start_profile_step": 3, + 0: "end_profile_step": 5, + 0: "tuner_type": "gridsearch", + 0: "tuner_early_stopping": 5, + 0: "tuner_num_trials": 50, + 0: "model_info_path": null, + 0: "mp_size": 1, + 0: "max_train_batch_size": null, + 0: "min_train_batch_size": 1, + 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, + 0: "min_train_micro_batch_size_per_gpu": 1, + 0: "num_tuning_micro_batch_sizes": 3 + 0: } + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] comms_config ................. + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] communication_data_type ...... None + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa + 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] curriculum_enabled ........... False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] curriculum_params ............ False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False + 0: [2023-03-16 20:46:44,161] [INFO] [config.py:1011:print] disable_allgather ............ False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] dump_state ................... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] elasticity_enabled ........... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] flops_profiler_config ........ { + 0: "enabled": false, + 0: "profile_step": 1, + 0: "module_depth": -1, + 0: "top_modules": 1, + 0: "detailed": true, + 0: "output_file": null + 0: } + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] fp16_auto_cast ............... None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] fp16_enabled ................. False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] global_rank .................. 0 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 
2 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] load_universal_checkpoint .... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] loss_scale ................... 1.0 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] memory_breakdown ............. False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] monitor_config ............... + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] nebula_config ................ { + 0: "enabled": false, + 0: "persistent_storage_path": null, + 0: "persistent_time_interval": 100, + 0: "num_of_version_in_retention": 2, + 0: "enable_nebula_load": true, + 0: "load_path": null + 0: } + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] optimizer_name ............... None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] optimizer_params ............. None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] pld_enabled .................. False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] pld_params ................... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] prescale_gradients ........... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] scheduler_name ............... None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] scheduler_params ............. 
None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] sparse_attention ............. None + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] steps_per_print .............. 2000 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] train_batch_size ............. 512 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 2 + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] use_node_local_storage ....... False + 0: [2023-03-16 20:46:44,162] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:1011:print] world_size ................... 128 + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:1011:print] zero_enabled ................. False + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 + 0: [2023-03-16 20:46:44,163] [INFO] [config.py:996:print_user_config] json = { + 0: "train_micro_batch_size_per_gpu": 2, + 0: "train_batch_size": 512, + 0: "gradient_clipping": 1.0, + 0: "zero_optimization": { + 0: "stage": 0 + 0: }, + 0: "bf16": { + 0: "enabled": true + 0: }, + 0: "steps_per_print": 2.000000e+03, + 0: "wall_clock_breakdown": false + 0: } + 0: Time to load utils op: 0.00040149688720703125 seconds + 0: [2023-03-16 20:46:44,163] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=2 micro_batch_size=2 + 0: [2023-03-16 20:46:44,218] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=41 [0, 41) STAGE_PARAMS=2809026560 (2809.027M) TOTAL_PARAMS=2809026560 (2809.027M) UNIQUE_PARAMS=2809026560 (2809.027M) + 0: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: WARNING: could not find the metadata file checkpoints_2b8100m100m + 0: will not load any checkpoints and will start from random + 0: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+14: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+15: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+10: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+13: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,223] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 0: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+12: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +14: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +12: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 8: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 0: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +13: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +11: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 6: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 7: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 4: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 6: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+13: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +10: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 9: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 2: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+ 5: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 3: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. + 1: [2023-03-16 20:46:44,224] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_2b8100m100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +15: time (ms) | load-checkpoint: 9.24 + 0: estimated model parameters: 2.80902656 + 0: estimated model parameters without embeddings: 2.67500544 + 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 20:46:44 + 0: > building train, validation, and test datasets ... + 0: > datasets target sizes (minimum size): + 0: train: 48828 + 0: validation: 512 + 0: test: 512 + 0: > building train, validation, and test datasets for GPT ... + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... 
+ 0: > finished creating indexed dataset in 0.044592 seconds + 0: number of documents: 208931 + 0: > dataset split: + 0: train: + 0: document indices in [0, 208931) total of 208931 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_48828ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.079 seconds + 0: total number of samples: 97610 + 0: total number of epochs: 2 + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... + 0: > finished creating indexed dataset in 0.070537 seconds + 0: number of documents: 364608 + 0: > dataset split: + 0: validation: + 0: document indices in [0, 364608) total of 364608 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_512ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_512ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_512ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.058 seconds + 0: total number of samples: 84978 + 0: total number of epochs: 1 + 0: > finished creating GPT datasets ... + 0: [after dataloaders are built] datetime: 2023-03-16 20:46:59 + 0: done with setup ... + 0: training ... 
+ 0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: +15: time (ms) | model-and-optimizer-setup: 23107.14 | train/valid/test-data-iterators-setup: 13761.65 + 0: [000-000] 2.8090B / 2.6750B + 0: [before the start of training step] datetime: 2023-03-16 20:46:59 + 0: [2023-03-16 20:47:00,810] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information + 0: [2023-03-16 20:47:00,810] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False + 0: [2023-03-16 20:47:00,810] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers + 0: [2023-03-16 20:47:00,810] [INFO] [checkpointing.py:560:forward] ----Synchronization False + 0: [2023-03-16 20:47:00,810] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False + 0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 23280.98193359375 | max allocated: 26037.23193359375 | reserved: 34094.0 | max reserved: 34094.0 +15: iteration 10/ 95 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 5.33 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 1.091608E+01 | grad norm: 3.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 95.992 | TFLOPs: 38.42 | +15: iteration 20/ 95 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 3.58 | learning rate: 1.825E-04 | global batch size: 512 | lm loss: 8.088631E+00 | grad norm: 2.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 142.833 | TFLOPs: 57.17 | +15: iteration 30/ 95 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 3.59 | learning rate: 1.611E-04 | global batch size: 512 | lm loss: 7.777384E+00 | grad norm: 1.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 142.565 | TFLOPs: 57.06 | +15: iteration 40/ 95 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 3.56 | learning rate: 1.341E-04 | global batch size: 512 | lm loss: 7.694283E+00 | grad norm: 0.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 143.928 | TFLOPs: 57.61 | +15: iteration 50/ 95 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 3.62 | learning rate: 1.045E-04 | global batch size: 512 | lm loss: 7.656776E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 141.469 | TFLOPs: 56.62 | +15: iteration 60/ 95 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 3.58 | learning rate: 7.545E-05 | global batch size: 512 | lm loss: 7.626921E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 143.115 | TFLOPs: 57.28 | +15: iteration 70/ 95 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 3.56 | learning rate: 5.020E-05 | global batch size: 512 | lm loss: 7.604048E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 143.873 | TFLOPs: 57.59 | +15: iteration 80/ 95 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 3.57 | learning rate: 3.151E-05 | global batch size: 512 | lm loss: 7.576460E+00 | grad norm: 0.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 143.392 | TFLOPs: 57.39 | +15: iteration 90/ 95 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 3.56 | learning rate: 2.143E-05 | global batch size: 512 | lm loss: 7.548269E+00 | grad norm: 0.156 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 143.762 | TFLOPs: 57.54 | + 0: [after training is done] datetime: 2023-03-16 20:52:56 + 0: saving checkpoint at iteration 95 to checkpoints_2b8100m100m +15: ----------------------------------------------------------------------------------------------------------------- +15: validation loss at the end of training for val data | lm loss value: 7.498930E+00 | lm loss PPL: 1.806110E+03 | +15: ----------------------------------------------------------------------------------------------------------------- + 0: [2023-03-16 20:52:57,691] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step95 is begin to save! + 0: [2023-03-16 20:52:57,788] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 20:52:58,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 20:52:58,249] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 20:52:58,425] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 20:52:58,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 20:52:58,765] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 20:52:58,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 20:52:58,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 20:52:59,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 20:52:59,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 20:52:59,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 20:52:59,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 20:52:59,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 20:52:59,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 20:52:59,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 20:53:00,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 20:53:00,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 20:53:00,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 20:53:00,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 20:53:00,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 20:53:00,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 20:53:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 20:53:00,662] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 20:53:00,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 20:53:00,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 20:53:01,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 20:53:01,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 20:53:01,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 20:53:01,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 0: [2023-03-16 20:53:01,688] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 20:53:01,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 20:53:01,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 20:53:02,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 20:53:02,193] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 20:53:02,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 20:53:02,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 0: [2023-03-16 20:53:02,711] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 20:53:02,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 20:53:02,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,044] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 0: [2023-03-16 20:53:03,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 20:53:03,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 20:53:03,907] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt + 0: [2023-03-16 20:53:03,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 20:53:03,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
+ 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
+ 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 
+11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 
+ 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... + 7: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
+ 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
+11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... +11: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
+12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... + 2: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... + 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
+ 6: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... +10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
+ 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... + 5: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... + 3: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... + 4: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 
+14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... + 8: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... + 0: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... +12: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... +15: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... +13: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 
+10: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... + 1: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... + 9: [2023-03-16 20:53:03,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... +14: [2023-03-16 20:53:04,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. +14: [2023-03-16 20:53:04,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 20:53:04,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,272] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+13: [2023-03-16 20:53:04,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. +14: [2023-03-16 20:53:04,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. +14: [2023-03-16 20:53:04,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+ 5: [2023-03-16 20:53:04,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+ 0: [2023-03-16 20:53:04,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 20:53:04,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 20:53:04,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 20:53:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 20:53:04,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. +14: [2023-03-16 20:53:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 20:53:04,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 
+13: [2023-03-16 20:53:04,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 20:53:04,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 20:53:04,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. +15: [2023-03-16 20:53:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. +15: [2023-03-16 20:53:04,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 20:53:04,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+ 0: [2023-03-16 20:53:04,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. + 0: [2023-03-16 20:53:04,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
+15: [2023-03-16 20:53:04,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 20:53:04,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 20:53:04,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. + 4: [2023-03-16 20:53:04,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. + 4: [2023-03-16 20:53:04,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+10: [2023-03-16 20:53:04,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. + 4: [2023-03-16 20:53:04,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. + 4: [2023-03-16 20:53:04,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 20:53:04,455] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,455] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 20:53:04,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+ 4: [2023-03-16 20:53:04,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. + 2: [2023-03-16 20:53:04,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! 
+ 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 2: [2023-03-16 20:53:04,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. + 9: [2023-03-16 20:53:04,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
+ 4: [2023-03-16 20:53:04,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 4: [2023-03-16 20:53:04,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. + 4: [2023-03-16 20:53:04,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt + 4: [2023-03-16 20:53:04,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
+12: [2023-03-16 20:53:04,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +12: [2023-03-16 20:53:04,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. +12: [2023-03-16 20:53:04,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt +12: [2023-03-16 20:53:04,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 20:53:04,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. + 1: [2023-03-16 20:53:04,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 1: [2023-03-16 20:53:04,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 20:53:04,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt + 1: [2023-03-16 20:53:04,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. +14: [2023-03-16 20:53:04,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 20:53:04,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. + 8: [2023-03-16 20:53:04,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 8: [2023-03-16 20:53:04,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
+ 8: [2023-03-16 20:53:04,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt + 8: [2023-03-16 20:53:04,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. + 5: [2023-03-16 20:53:04,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +13: [2023-03-16 20:53:04,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. +13: [2023-03-16 20:53:04,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt +13: [2023-03-16 20:53:04,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 20:53:04,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. + 7: [2023-03-16 20:53:04,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 7: [2023-03-16 20:53:04,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt + 7: [2023-03-16 20:53:04,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 5: [2023-03-16 20:53:04,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 20:53:04,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt + 5: [2023-03-16 20:53:04,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 3: [2023-03-16 20:53:04,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. + 3: [2023-03-16 20:53:04,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt + 3: [2023-03-16 20:53:04,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +11: [2023-03-16 20:53:04,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. +11: [2023-03-16 20:53:04,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt +11: [2023-03-16 20:53:04,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: [2023-03-16 20:53:04,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt + 0: [2023-03-16 20:53:04,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 9: [2023-03-16 20:53:04,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 20:53:04,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt + 9: [2023-03-16 20:53:04,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +10: [2023-03-16 20:53:04,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. +10: [2023-03-16 20:53:04,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt +10: [2023-03-16 20:53:04,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +15: [2023-03-16 20:53:04,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. +15: [2023-03-16 20:53:04,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt +15: [2023-03-16 20:53:04,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 6: [2023-03-16 20:53:04,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. + 6: [2023-03-16 20:53:04,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt + 6: [2023-03-16 20:53:04,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! +14: [2023-03-16 20:53:04,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 20:53:04,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt +14: [2023-03-16 20:53:04,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95 is ready now! + 0: successfully saved checkpoint at iteration 95 to checkpoints_2b8100m100m +END 3325438: Thu 16 Mar 2023 08:53:13 PM EET diff --git a/2b8100m100m/3325710.err b/2b8100m100m/3325710.err new file mode 100644 index 0000000000000000000000000000000000000000..f72c17b5c519fec70a032df36c3241def2c08e81 --- /dev/null +++ b/2b8100m100m/3325710.err @@ -0,0 +1,2218 @@ + 4: 2023-03-16 21:20:31.667457: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667445: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667469: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: 2023-03-16 21:20:31.667979: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 21:20:31.667994: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 21:20:31.668003: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667497: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 21:20:31.667946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: 2023-03-16 21:20:31.667913: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 21:20:31.667911: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 21:20:31.667665: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 21:20:31.667677: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 21:20:31.667677: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667527: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 0: 2023-03-16 21:20:31.668015: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 21:20:31.668021: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 4: 2023-03-16 21:20:31.667595: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667957: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: 2023-03-16 21:20:31.667966: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 21:20:31.667974: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: 2023-03-16 21:20:31.668013: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 21:20:31.668007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 21:20:31.668050: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: 2023-03-16 21:20:31.667615: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 4: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 6: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 5: 2023-03-16 21:20:31.667683: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.667921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.667935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.667952: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: 2023-03-16 21:20:31.667826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 21:20:31.667865: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 21:20:31.667880: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: 2023-03-16 21:20:31.667695: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 21:20:31.667705: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.667964: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 21:20:31.667990: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668186: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: 2023-03-16 21:20:31.667943: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 21:20:31.667951: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 21:20:31.667953: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: 2023-03-16 21:20:31.667872: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 5: 2023-03-16 21:20:31.667880: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 5: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.667982: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 1: 2023-03-16 21:20:31.667995: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 1: 2023-03-16 21:20:31.668008: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: 2023-03-16 21:20:31.667857: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 21:20:31.667908: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 21:20:31.668374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 21:20:31.667965: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668374: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668396: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 21:20:31.667922: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+11: 2023-03-16 21:20:31.668205: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668219: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668227: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: 2023-03-16 21:20:31.668388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 0: 2023-03-16 21:20:31.668390: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+15: 2023-03-16 21:20:31.668048: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +15: 2023-03-16 21:20:31.668052: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +15: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668378: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668404: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +13: 2023-03-16 21:20:31.668027: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+13: 2023-03-16 21:20:31.668029: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +13: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668421: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668423: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +11: 2023-03-16 21:20:31.668259: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +11: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 21:20:31.668070: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 21:20:31.668087: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 3: 2023-03-16 21:20:31.668446: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 3: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 21:20:31.668089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +10: 2023-03-16 21:20:31.668102: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+10: 2023-03-16 21:20:31.668118: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +10: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.668979: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.669001: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.669008: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 21:20:31.668976: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 21:20:31.668996: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 21:20:31.669004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: 2023-03-16 21:20:31.669089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 21:20:31.669100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.669020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 21:20:31.668996: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.669024: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 21:20:31.669026: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 21:20:31.669120: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 21:20:31.669129: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 21:20:31.669140: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: 2023-03-16 21:20:31.669142: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669150: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 21:20:31.669041: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+12: 2023-03-16 21:20:31.669047: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +12: 2023-03-16 21:20:31.669054: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 21:20:31.669124: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669158: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669163: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 2: 2023-03-16 21:20:31.669037: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2: 2023-03-16 21:20:31.669039: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669204: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669195: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 8: 2023-03-16 21:20:31.669140: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 8: 2023-03-16 21:20:31.669145: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 8: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669168: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: 2023-03-16 21:20:31.669178: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 9: 2023-03-16 21:20:31.669180: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: 2023-03-16 21:20:31.669050: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +12: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 9: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669225: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669233: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669250: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+14: 2023-03-16 21:20:31.669254: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. +14: 2023-03-16 21:20:31.669259: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA +14: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669377: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 7: 2023-03-16 21:20:31.669409: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669418: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669419: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669416: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. + 7: 2023-03-16 21:20:31.669465: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA + 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
+ 6: 2023-03-16 21:20:48.240282: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240229: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 21:20:48.240317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 2023-03-16 21:20:48.240442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240259: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 2023-03-16 21:20:48.240703: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.240494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240278: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:20:48.240549: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240719: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 21:20:48.240479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240355: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240294: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:20:48.240533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240313: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:20:48.240598: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240386: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240335: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240730: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 21:20:48.240735: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 21:20:48.240533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240391: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.240323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 6: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:20:48.240552: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 4: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:20:48.240751: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 21:20:48.240758: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 21:20:48.240761: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 6: 2023-03-16 21:20:48.240769: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:20:48.241381: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241399: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241412: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241420: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241424: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241438: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241439: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241432: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241453: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+11: 2023-03-16 21:20:48.241462: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241468: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241477: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 4: 2023-03-16 21:20:48.241466: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241479: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +11: 2023-03-16 21:20:48.241488: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 21:20:48.241475: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241499: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241785: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 21:20:48.241803: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 21:20:48.241806: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 21:20:48.241524: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241538: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241833: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 9: 2023-03-16 21:20:48.241546: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 9: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:20:48.241842: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 21:20:48.241859: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 21:20:48.241864: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 9: 2023-03-16 21:20:48.241876: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 21:20:48.249187: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 21:20:48.249883: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249268: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:20:48.249903: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 21:20:48.249285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:20:48.250165: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 21:20:48.249275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 21:20:48.249925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:20:48.250179: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 21:20:48.249298: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 21:20:48.249913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 2023-03-16 21:20:48.249932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249806: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.249945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 21:20:48.249825: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+12: 2023-03-16 21:20:48.249838: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 21:20:48.249852: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 21:20:48.249847: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:20:48.249862: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.249935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +12: 2023-03-16 21:20:48.249862: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:20:48.250203: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.250204: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +12: 2023-03-16 21:20:48.249872: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.249947: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 7: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:20:48.250206: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.250220: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.250222: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 7: 2023-03-16 21:20:48.250230: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 21:20:48.250572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250592: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250629: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.250611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 21:20:48.250639: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250653: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.250657: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 2023-03-16 21:20:48.250644: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.251116: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.250654: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 21:20:48.251135: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.250991: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.250991: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 21:20:48.251147: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 21:20:48.251160: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.250639: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 21:20:48.251170: I 
tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 21:20:48.250946: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:20:48.251175: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+10: 2023-03-16 21:20:48.250680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 1: 2023-03-16 21:20:48.251187: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 1: 2023-03-16 21:20:48.251190: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.251009: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.251024: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.250677: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 2023-03-16 21:20:48.250963: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.250652: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 2023-03-16 21:20:48.250982: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.251032: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251235: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.250688: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 2023-03-16 21:20:48.250992: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; 
dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +10: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:20:48.251065: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.251061: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:20:48.251063: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251251: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 21:20:48.251019: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251261: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 21:20:48.251020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 0: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:20:48.251274: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 21:20:48.251283: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 21:20:48.251281: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 0: 2023-03-16 21:20:48.251288: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 0: 2023-03-16 21:20:48.251301: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 21:20:48.251235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:20:48.251321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 21:20:48.251251: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251262: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251343: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251552: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251569: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251300: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251372: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251582: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251318: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251323: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 2023-03-16 21:20:48.251480: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251595: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251604: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 2: 2023-03-16 21:20:48.251414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 3: 2023-03-16 21:20:48.251303: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:20:48.251616: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 21:20:48.251623: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 3: 2023-03-16 21:20:48.251642: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 21:20:48.251502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251411: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.251514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251876: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 21:20:48.251875: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:20:48.251888: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 21:20:48.251521: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251910: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 21:20:48.251922: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 2: 2023-03-16 21:20:48.251925: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.251533: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 2: 2023-03-16 21:20:48.251949: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:20:48.251955: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+13: 2023-03-16 21:20:48.251551: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.251585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.251556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.252045: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 21:20:48.251795: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +13: 2023-03-16 21:20:48.252064: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 21:20:48.252081: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 21:20:48.252093: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 21:20:48.252122: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 21:20:48.252125: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251810: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.251857: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:20:48.252133: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +13: 2023-03-16 21:20:48.252136: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 8: 2023-03-16 21:20:48.251819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251832: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251849: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.251889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.251886: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251876: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.251903: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 
0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.251859: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.251922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.252322: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 21:20:48.251912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 21:20:48.252339: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.252349: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 21:20:48.251926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 21:20:48.252362: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 21:20:48.252368: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 8: 2023-03-16 21:20:48.252374: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:20:48.252379: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 21:20:48.251923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 8: 2023-03-16 21:20:48.252387: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:20:48.252388: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 21:20:48.252409: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 21:20:48.252416: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+ 5: 2023-03-16 21:20:48.252433: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 21:20:48.252438: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 21:20:48.252119: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 + 5: 2023-03-16 21:20:48.252443: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 21:20:48.252448: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. + 5: 2023-03-16 21:20:48.252454: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252151: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252423: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 21:20:48.252174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252149: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252441: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 21:20:48.252193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252179: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252463: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 21:20:48.252216: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +14: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:20:48.252468: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 21:20:48.252484: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 21:20:48.252484: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +14: 2023-03-16 21:20:48.252504: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+14: 2023-03-16 21:20:48.252503: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.252923: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252941: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252979: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252989: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252961: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.252998: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_46200 +15: 0125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:20:48.253484: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253503: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253514: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253515: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253533: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
+15: 2023-03-16 21:20:48.253541: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253552: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +15: 2023-03-16 21:20:48.253567: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. +10: 2023-03-16 21:21:22.560699: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.561534: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object 
file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.561552: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561558: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.561604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560736: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561587: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 21:21:22.561616: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560746: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561604: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 21:21:22.561621: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561615: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 21:21:22.561632: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561643: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 2023-03-16 21:21:22.561769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560787: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 2023-03-16 21:21:22.561775: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.561771: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 2023-03-16 21:21:22.560851: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561655: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561882: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 2023-03-16 21:21:22.561824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 21:21:22.561730: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 6: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.561991: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.560850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.561668: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +10: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561807: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561915: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.561994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 2023-03-16 21:21:22.561763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561818: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561921: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 21:21:22.562063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.561753: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562036: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561839: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561926: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561875: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562020: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 2023-03-16 21:21:22.561850: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.562100: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561797: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561902: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562043: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562075: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561856: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 8: 2023-03-16 21:21:22.561994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.562115: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561810: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.561782: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561919: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562088: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561871: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: 2023-03-16 21:21:22.562125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.562149: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.561809: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561920: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562075: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.561936: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.562155: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561844: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.561825: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +12: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.561950: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562083: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562110: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562151: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 21:21:22.562171: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.561840: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.561854: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 2: 2023-03-16 21:21:22.562192: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +11: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 21:21:22.562177: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:21:22.561879: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.562280: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562181: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 7: 2023-03-16 21:21:22.562182: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.562096: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562193: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 0: 2023-03-16 21:21:22.562104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562197: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +13: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562212: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.562274: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 4: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.564047: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 2023-03-16 21:21:22.564056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.564052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564056: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564052: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:21:22.564228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 21:21:22.564225: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564059: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564055: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564060: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.564063: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 6: 2023-03-16 21:21:22.564066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564230: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 21:21:22.564226: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564061: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.564072: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 21:21:22.564073: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564064: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 21:21:22.564228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 6: 2023-03-16 21:21:22.564074: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +10: 2023-03-16 21:21:22.564076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 6: 2023-03-16 21:21:22.564074: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 21:21:22.564076: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 6: 2023-03-16 21:21:22.564080: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+10: 2023-03-16 21:21:22.564077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564079: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564080: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 21:21:22.564234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +10: 2023-03-16 21:21:22.564081: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564083: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +10: 2023-03-16 21:21:22.564084: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564238: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564243: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 21:21:22.564241: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564234: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564228: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564235: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564233: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564242: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 21:21:22.564253: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: 2023-03-16 21:21:22.564254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 21:21:22.564257: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 8: 2023-03-16 21:21:22.564258: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 21:21:22.562009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +15: 2023-03-16 21:21:22.564236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 21:21:22.564299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 
/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 8: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564507: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 8: 2023-03-16 21:21:22.564313: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:21:22.564245: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 21:21:22.564245: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 21:21:22.564395: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +15: 2023-03-16 21:21:22.564251: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 21:21:22.564250: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 21:21:22.564251: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +15: 2023-03-16 21:21:22.564252: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 21:21:22.564254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +15: 2023-03-16 21:21:22.564255: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 5: 2023-03-16 21:21:22.564394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564511: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564396: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564400: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564397: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:21:22.564718: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 21:21:22.564403: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564514: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564406: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 21:21:22.564723: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 21:21:22.564414: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 21:21:22.564751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 5: 2023-03-16 21:21:22.564418: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 21:21:22.564420: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 21:21:22.564421: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564516: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 21:21:22.564722: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 5: 2023-03-16 21:21:22.564421: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 21:21:22.564423: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 5: 2023-03-16 21:21:22.564425: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.564522: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.564517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 21:21:22.564725: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564816: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564873: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 21:21:22.564754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564812: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 
'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:21:22.564827: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564520: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: 2023-03-16 21:21:22.564727: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564752: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.564528: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564532: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.564817: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564876: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +11: 2023-03-16 21:21:22.564535: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564536: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564537: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 21:21:22.564728: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564821: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +11: 2023-03-16 21:21:22.564540: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +11: 2023-03-16 21:21:22.564541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:21:22.564734: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564880: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564758: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.564888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 21:21:22.564890: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 21:21:22.564760: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564818: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564767: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.564894: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 21:21:22.564773: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 21:21:22.564773: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 21:21:22.564775: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 2: 2023-03-16 21:21:22.564827: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564818: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564881: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 21:21:22.564777: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +13: 2023-03-16 21:21:22.564776: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.564832: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564822: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564825: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564889: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 2023-03-16 21:21:22.565089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.564903: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 21:21:22.564905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+13: 2023-03-16 21:21:22.564868: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 2: 2023-03-16 21:21:22.564830: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +12: 2023-03-16 21:21:22.564825: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 4: 2023-03-16 21:21:22.564907: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 21:21:22.564911: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +13: 2023-03-16 21:21:22.564876: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:21:22.564845: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 21:21:22.564846: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +12: 2023-03-16 21:21:22.564838: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564839: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: 2023-03-16 21:21:22.564942: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 2023-03-16 21:21:22.565089: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: 
libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +13: 2023-03-16 21:21:22.564882: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 21:21:22.564847: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 21:21:22.564850: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 21:21:22.564850: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+12: 2023-03-16 21:21:22.564841: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564843: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564842: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 4: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 4: 2023-03-16 21:21:22.564955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 2: 2023-03-16 21:21:22.564851: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 2: 2023-03-16 21:21:22.564853: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564845: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +12: 2023-03-16 21:21:22.564846: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565092: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565093: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565093: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565107: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565107: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565107: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565113: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565113: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565113: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565115: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 7: 2023-03-16 21:21:22.565137: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 7: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 7: 2023-03-16 21:21:22.565152: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.569338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569365: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569375: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569379: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569387: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.569472: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 9: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569646: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569697: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569682: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569924: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569731: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569696: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569937: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569744: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569709: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569974: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569767: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569738: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569981: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569787: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569751: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.569986: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569769: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.570013: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.569824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 1: 2023-03-16 21:21:22.570219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 +14: 2023-03-16 21:21:22.569869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open 
shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/project_462000125 + 3: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: /samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571356: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571359: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571372: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.571371: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 21:21:22.571363: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571362: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571386: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.571386: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.571390: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.571389: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 9: 2023-03-16 21:21:22.571395: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571416: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 9: 2023-03-16 21:21:22.571415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 9: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 9: 2023-03-16 21:21:22.571432: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 21:21:22.571912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:21:22.571927: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 1: 2023-03-16 21:21:22.571924: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:21:22.571922: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:21:22.571925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572127: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 21:21:22.571928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:21:22.571928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or 
directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 21:21:22.571933: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572128: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: 2023-03-16 21:21:22.571927: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 1: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 1: 2023-03-16 21:21:22.571949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571950: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571952: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571955: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 1: 2023-03-16 21:21:22.571958: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572128: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 21:21:22.572131: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572143: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 21:21:22.572134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572137: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 2023-03-16 21:21:22.572273: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572150: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 3: 2023-03-16 21:21:22.572152: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 21:21:22.572152: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 21:21:22.572155: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: 2023-03-16 21:21:22.572275: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 3: 2023-03-16 21:21:22.572157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 21:21:22.572157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 3: 2023-03-16 21:21:22.572158: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572277: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572279: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572281: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572280: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572294: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572305: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572305: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572305: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572313: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. +14: 2023-03-16 21:21:22.572374: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro +14: 
ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 +14: 2023-03-16 21:21:22.572392: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 21:21:22.564739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 21:21:22.564739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 21:21:22.564742: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 21:21:22.564742: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: 2023-03-16 21:21:22.564744: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 21:21:22.564777: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:21:22.564792: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
+ 0: 2023-03-16 21:21:22.564802: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps/aws-ofi-rccl:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/rccl/rccl-develop-release/rccl/lib:/pfs/lustrep4/projappl/project_462000075/samantao-public/rocm/glibc/selected:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hip/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/hsa/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/llvm:/pfs/lustrep2/projappl/pro + 0: ject_462000125/samantao-public/apps/suse-repo-deps/lib64:/pfs/lustrep2/projappl/project_462000125/samantao-public/apps/suse-repo-deps/usr/lib64:/opt/cray/pe/python/3.9.12.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.0.0/lib64 + 0: 2023-03-16 21:21:22.564816: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_upper_triang_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_upper_triang_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module scaled_masked_softmax_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module scaled_masked_softmax_cuda... + 0: Successfully preprocessed all matching files. + 0: Detected CUDA files, patching ldflags + 0: Emitting ninja build file /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... + 0: Building extension module fused_mix_prec_layer_norm_cuda... + 0: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 0: Loading extension module fused_mix_prec_layer_norm_cuda... + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. +12: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 0: Successfully preprocessed all matching files. + 1: Successfully preprocessed all matching files. + 1: Successfully preprocessed all matching files. + 1: Successfully preprocessed all matching files. +14: Successfully preprocessed all matching files. +12: Successfully preprocessed all matching files. +11: Successfully preprocessed all matching files. +11: Successfully preprocessed all matching files. +11: Successfully preprocessed all matching files. 
+10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +10: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +10: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +11: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +11: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +12: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +12: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +14: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +14: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( +15: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +15: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 5: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 5: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 7: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 7: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 6: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 6: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 1: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 1: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 4: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 4: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( + 2: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 2: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( +13: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead +13: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 8: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 8: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 3: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 3: warnings.warn( + 9: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 9: warnings.warn( + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead + 0: warnings.warn( + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 6: Building extension module utils... + 6: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: + 2: + 2: + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: + 3: + 3: + 3: + 3: + 3: + 3: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 4: + 4: + 4: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: + 5: + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: + 8: + 8: + 8: +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: +10: +10: +10: +10: +10: +10: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Emitting ninja build file /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu/utils/build.ninja... + 8: Building extension module utils... + 8: Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) + 8: Loading extension module utils... + 0: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 6: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 7: Loading extension module utils... + 9: Loading extension module utils... 
+ 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... + 9: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... +11: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... + 8: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +14: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... +13: Loading extension module utils... +14: Loading extension module utils... +15: Loading extension module utils... +13: Loading extension module utils... +12: Loading extension module utils... +14: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +12: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +15: Loading extension module utils... +12: Loading extension module utils... +15: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... 
+10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... +10: Loading extension module utils... + 0: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... +11: Loading extension module utils... + 7: Loading extension module utils... + 9: Loading extension module utils... +14: Loading extension module utils... + 0: Loading extension module utils... +13: Loading extension module utils... +15: Loading extension module utils... + 0: Loading extension module utils... + 0: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 0: Loading extension module utils... + 1: Loading extension module utils... + 1: Loading extension module utils... + 6: Loading extension module utils... 
+ 6: Loading extension module utils...Loading extension module utils... + 6: Loading extension module utils... + 6: + 3: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 3: Loading extension module utils... + 3: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 2: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 4: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 0: Loading extension module utils... + 5: Loading extension module utils... + 0: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... + 5: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+11: +11: +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +11: +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: Loading extension module utils... +11: +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils... +11: No modifications detected for re-loaded extension module utils, skipping build step... +11: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +11: +11: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: Loading extension module utils...Loading extension module utils... + 8: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... +12: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 8: + 8: + 8: Loading extension module utils...Loading extension module utils...Loading extension module utils... 
+ 8: + 8: +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... + 8: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 8: No modifications detected for re-loaded extension module utils, skipping build step... + 8: Loading extension module utils... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: Loading extension module utils...Loading extension module utils... +12: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +12: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... 
+12: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +12: +12: +12: Loading extension module utils... +12: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... +12: No modifications detected for re-loaded extension module utils, skipping build step... +12: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 9: + 9: + 9: + 9: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 9: Loading extension module utils... + 9: + 9: + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 9: No modifications detected for re-loaded extension module utils, skipping build step... + 9: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +13: +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +13: + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +13: Loading extension module utils... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... +13: No modifications detected for re-loaded extension module utils, skipping build step... +13: Loading extension module utils... 
+ 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: + 7: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 7: Loading extension module utils... + 7: No modifications detected for re-loaded extension module utils, skipping build step... + 7: Loading extension module utils... 
+14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... + 7: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 7: + 7: + 7: Loading extension module utils...Loading extension module utils...Loading extension module utils... + 7: + 7: +14: Loading extension module utils... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +14: +14: Loading extension module utils... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: Loading extension module utils... +14: No modifications detected for re-loaded extension module utils, skipping build step... +14: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +14: +14: Loading extension module utils... 
+15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +15: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +15: +15: +15: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Loading extension module utils...Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +15: +15: +15: +15: +15: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... +15: No modifications detected for re-loaded extension module utils, skipping build step... +15: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... +10: +10: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... +10: +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step... +10: Loading extension module utils... +10: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... +10: +10: Loading extension module utils... 
+10: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 6: No modifications detected for re-loaded extension module utils, skipping build step... + 6: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 1: No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + 1: + 1: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 1: No modifications detected for re-loaded extension module utils, skipping build step... + 1: Loading extension module utils... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 5: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 4: No modifications detected for re-loaded extension module utils, skipping build step... + 4: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 5: No modifications detected for re-loaded extension module utils, skipping build step... + 5: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... 
+ 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... + 3: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: No modifications detected for re-loaded extension module utils, skipping build step... 
+ 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root...Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 3: Loading extension module utils... + 2: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 2: + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + 2: + 2: No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + 2: + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 2: No modifications detected for re-loaded extension module utils, skipping build step... + 2: Loading extension module utils... + 0: Using /pfs/lustrep4/users/muennighoff/.cache/torch_extensions/py39_cpu as PyTorch extensions root... + 0: No modifications detected for re-loaded extension module utils, skipping build step... + 0: Loading extension module utils... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/utils.py:349: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 0: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") diff --git a/2b8100m100m/3325710.out b/2b8100m100m/3325710.out new file mode 100644 index 0000000000000000000000000000000000000000..ab10ed4806d41adb035436e0919637ba7fe2792e --- /dev/null +++ b/2b8100m100m/3325710.out @@ -0,0 +1,20535 @@ +Model parameters: d_model 2560 ffw_size 10240 kv_size 128 n_heads 20 n_layers 34 +Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 34 --hidden-size 2560 --num-attention-heads 20 --kv-channels 128 --ffn-hidden-size 10240 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 2 --global-batch-size 512 --train-samples 1 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-2b8100m100mval --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 1 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --no-load-optim --reset-progress --override-lr-scheduler --log-interval 10 --save-interval 10000 --eval-interval 1 --eval-iters 100 --eval-only true --tensorboard-dir tensorboard_2b8100m100mval --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_2b8100m100m --load checkpoints_2b8100m100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3325710.json --zero-stage 0 +START 3325710: Thu 16 Mar 2023 09:19:29 PM EET + 0: + 0: + 0: 
======================= ROCm System Management Interface ======================= + 0: ================================= Concise Info ================================= + 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 0: 0 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 2 38.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 4 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: 6 38.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 0: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 0: ================================================================================ + 0: ============================= End of ROCm SMI Log ============================== + 6: + 6: + 6: ======================= ROCm System Management Interface ======================= + 6: ================================= Concise Info ================================= + 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 6: 0 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 2 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 4 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: 6 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 6: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 6: ================================================================================ + 6: ============================= End of ROCm SMI Log ============================== +10: +10: +10: ======================= ROCm System Management Interface ======================= +10: ================================= Concise Info ================================= +10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +10: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 
0% +10: 2 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: 6 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +10: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +10: ================================================================================ +10: ============================= End of ROCm SMI Log ============================== + 4: + 4: + 4: ======================= ROCm System Management Interface ======================= + 4: ================================= Concise Info ================================= + 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 4: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 4 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: 6 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 4: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 4: ================================================================================ + 4: ============================= End of ROCm SMI Log ============================== +15: +15: +15: ======================= ROCm System Management Interface ======================= +15: ================================= Concise Info ================================= +15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +15: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 2 37.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 4 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 6 36.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +15: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +15: 
================================================================================ +15: ============================= End of ROCm SMI Log ============================== + 8: + 8: + 8: ======================= ROCm System Management Interface ======================= + 8: ================================= Concise Info ================================= + 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 8: 0 47.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 1 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 2 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 4 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: 6 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 8: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 8: ================================================================================ + 8: ============================= End of ROCm SMI Log ============================== + 3: + 3: + 3: ======================= ROCm System Management Interface ======================= + 3: ================================= Concise Info ================================= + 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 3: 0 43.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 2 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 4 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: 6 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 3: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 3: ================================================================================ + 3: ============================= End of ROCm SMI Log ============================== + 5: + 5: + 5: ======================= ROCm System Management Interface ======================= + 5: ================================= Concise Info 
================================= + 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 5: 0 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 2 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 4 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: 6 40.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 5: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 5: ================================================================================ + 5: ============================= End of ROCm SMI Log ============================== + 7: + 7: + 7: ======================= ROCm System Management Interface ======================= + 7: ================================= Concise Info ================================= + 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 7: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 2 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 4 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: 6 38.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 7: ================================================================================ + 7: ============================= End of ROCm SMI Log ============================== + 9: + 9: + 9: ======================= ROCm System Management Interface ======================= + 9: ================================= Concise Info ================================= + 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 9: 0 47.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 2 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 4 44.0c 93.0W 
800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: 6 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 9: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 9: ================================================================================ + 9: ============================= End of ROCm SMI Log ============================== +12: +12: +12: ======================= ROCm System Management Interface ======================= +12: ================================= Concise Info ================================= +12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +12: 0 48.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 4 48.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: 6 45.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +12: ================================================================================ +12: ============================= End of ROCm SMI Log ============================== + 1: + 1: + 1: ======================= ROCm System Management Interface ======================= + 1: ================================= Concise Info ================================= + 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 1: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 2 48.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 4 41.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 1: ================================================================================ + 1: ============================= End of ROCm 
SMI Log ============================== +13: +13: +13: ======================= ROCm System Management Interface ======================= +13: ================================= Concise Info ================================= +13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +13: 0 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 2 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 4 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: 6 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +13: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +13: ================================================================================ +13: ============================= End of ROCm SMI Log ============================== + 2: + 2: + 2: ======================= ROCm System Management Interface ======================= + 2: ================================= Concise Info ================================= + 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% + 2: 0 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 2 43.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 4 41.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: 6 38.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% + 2: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% + 2: ================================================================================ + 2: ============================= End of ROCm SMI Log ============================== +14: +14: +14: ======================= ROCm System Management Interface ======================= +14: ================================= Concise Info ================================= +14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +14: 0 46.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 
0% 0% +14: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 2 42.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: 6 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +14: 7 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +14: ================================================================================ +14: ============================= End of ROCm SMI Log ============================== +11: +11: +11: ======================= ROCm System Management Interface ======================= +11: ================================= Concise Info ================================= +11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% +11: 0 48.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 2 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: 6 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% +11: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% +11: ================================================================================ +11: ============================= End of ROCm SMI Log ============================== +12: Launching on nid005296 (12/16), master nid005284 port 9999, GPUs 8, CUDA: True +14: Launching on nid005298 (14/16), master nid005284 port 9999, GPUs 8, CUDA: True + 8: Launching on nid005292 (8/16), master nid005284 port 9999, GPUs 8, CUDA: True + 1: Launching on nid005285 (1/16), master nid005284 port 9999, GPUs 8, CUDA: True + 7: Launching on nid005291 (7/16), master nid005284 port 9999, GPUs 8, CUDA: True + 4: Launching on nid005288 (4/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: Launching on nid005284 (0/16), master nid005284 port 9999, GPUs 8, CUDA: True + 6: Launching on nid005290 
(6/16), master nid005284 port 9999, GPUs 8, CUDA: True +15: Launching on nid005299 (15/16), master nid005284 port 9999, GPUs 8, CUDA: True +13: Launching on nid005297 (13/16), master nid005284 port 9999, GPUs 8, CUDA: True +10: Launching on nid005294 (10/16), master nid005284 port 9999, GPUs 8, CUDA: True + 2: Launching on nid005286 (2/16), master nid005284 port 9999, GPUs 8, CUDA: True +11: Launching on nid005295 (11/16), master nid005284 port 9999, GPUs 8, CUDA: True + 5: Launching on nid005289 (5/16), master nid005284 port 9999, GPUs 8, CUDA: True + 9: Launching on nid005293 (9/16), master nid005284 port 9999, GPUs 8, CUDA: True + 3: Launching on nid005287 (3/16), master nid005284 port 9999, GPUs 8, CUDA: True + 0: using world size: 128, data-parallel-size: 128, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 + 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. + 0: using torch.bfloat16 for parameters ... + 0: ------------------------ arguments ------------------------ + 0: abort_on_unmet_fused_kernel_constraints ......... False + 0: accumulate_allreduce_grads_in_fp32 .............. True + 0: adam_beta1 ...................................... 0.9 + 0: adam_beta2 ...................................... 0.999 + 0: adam_eps ........................................ 1e-08 + 0: adlr_autoresume ................................. False + 0: adlr_autoresume_interval ........................ 1000 + 0: apply_query_key_layer_scaling ................... True + 0: apply_residual_connection_post_layernorm ........ False + 0: attention_dropout ............................... 0.1 + 0: attention_softmax_in_fp32 ....................... False + 0: bert_binary_head ................................ True + 0: bert_load ....................................... None + 0: bf16 ............................................ True + 0: bias_dropout_fusion ............................. True + 0: bias_gelu_fusion ................................ 
True + 0: biencoder_projection_dim ........................ 0 + 0: biencoder_shared_query_context_model ............ False + 0: block_data_path ................................. None + 0: checkpoint_activations .......................... False + 0: checkpoint_in_cpu ............................... False + 0: checkpoint_num_layers ........................... 1 + 0: clip_grad ....................................... 1.0 + 0: codecarbon_dir .................................. None + 0: consumed_train_samples .......................... 0 + 0: consumed_train_tokens ........................... 0 + 0: consumed_valid_samples .......................... 0 + 0: contigious_checkpointing ........................ False + 0: cpu_optimizer ................................... False + 0: cpu_torch_adam .................................. False + 0: curriculum_learning ............................. False + 0: data_impl ....................................... mmap + 0: data_parallel_size .............................. 128 + 0: data_path ....................................... None + 0: dataloader_type ................................. single + 0: DDP_impl ........................................ local + 0: decoder_seq_length .............................. None + 0: deepscale ....................................... False + 0: deepscale_config ................................ None + 0: deepspeed ....................................... True + 0: deepspeed_activation_checkpointing .............. False + 0: deepspeed_config ................................ ds_configs/3325710.json + 0: deepspeed_mpi ................................... False + 0: distribute_checkpointed_activations ............. False + 0: distributed_backend ............................. nccl + 0: embed_layernorm ................................. False + 0: embedding_path .................................. None + 0: encoder_seq_length .............................. 2048 + 0: eod_mask_loss ................................... 
False + 0: eval_interval ................................... 1 + 0: eval_iters ...................................... 100 + 0: eval_only ....................................... True + 0: evidence_data_path .............................. None + 0: exit_duration_in_mins ........................... None + 0: exit_interval ................................... None + 0: ffn_hidden_size ................................. 10240 + 0: finetune ........................................ False + 0: fp16 ............................................ False + 0: fp16_lm_cross_entropy ........................... False + 0: fp32_residual_connection ........................ False + 0: gigaflos_no_embeds .............................. 0 + 0: global_batch_size ............................... 512 + 0: glu_activation .................................. None + 0: hidden_dropout .................................. 0.1 + 0: hidden_size ..................................... 2560 + 0: hysteresis ...................................... 2 + 0: ict_head_size ................................... None + 0: ict_load ........................................ None + 0: img_dim ......................................... 224 + 0: indexer_batch_size .............................. 128 + 0: indexer_log_interval ............................ 1000 + 0: inference ....................................... False + 0: init_method_std ................................. 0.02 + 0: init_method_xavier_uniform ...................... False + 0: initial_loss_scale .............................. 4294967296 + 0: kill_switch_path ................................ kill-switch-2b8100m100mval + 0: kv_channels ..................................... 128 + 0: layer_norm_fusion ............................... True + 0: layernorm_epsilon ............................... 1e-05 + 0: lazy_mpu_init ................................... None + 0: load ............................................ 
checkpoints_2b8100m100m + 0: local_rank ...................................... None + 0: log_batch_size_to_tensorboard ................... True + 0: log_interval .................................... 10 + 0: log_learning_rate_to_tensorboard ................ True + 0: log_level ....................................... None + 0: log_level_replica ............................... None + 0: log_loss_scale_to_tensorboard ................... True + 0: log_num_zeros_in_grad ........................... False + 0: log_params_norm ................................. False + 0: log_path ........................................ None + 0: log_timers_to_tensorboard ....................... True + 0: log_validation_ppl_to_tensorboard ............... True + 0: loss_on_targets_only ............................ False + 0: loss_scale ...................................... None + 0: loss_scale_window ............................... 1000 + 0: lr .............................................. 0.0002 + 0: lr_decay_iters .................................. None + 0: lr_decay_samples ................................ 1 + 0: lr_decay_style .................................. cosine + 0: lr_decay_tokens ................................. None + 0: lr_warmup_fraction .............................. None + 0: lr_warmup_iters ................................. 0 + 0: lr_warmup_samples ............................... 0 + 0: make_vocab_size_divisible_by .................... 128 + 0: mask_prob ....................................... 0.15 + 0: masked_softmax_fusion ........................... True + 0: max_position_embeddings ......................... 2048 + 0: mean_noise_span_length .......................... None + 0: memory_centric_tiled_linear ..................... False + 0: merge_file ...................................... gpt2/merges.txt + 0: micro_batch_size ................................ 2 + 0: min_loss_scale .................................. 
1.0 + 0: min_lr .......................................... 2e-05 + 0: mmap_warmup ..................................... False + 0: no_load_optim ................................... True + 0: no_load_rng ..................................... None + 0: no_save_optim ................................... None + 0: no_save_rng ..................................... None + 0: noise_density ................................... None + 0: num_attention_heads ............................. 20 + 0: num_channels .................................... 3 + 0: num_classes ..................................... 1000 + 0: num_layers ...................................... 34 + 0: num_layers_per_virtual_pipeline_stage ........... None + 0: num_workers ..................................... 2 + 0: onnx_safe ....................................... None + 0: openai_gelu ..................................... False + 0: optimizer ....................................... adam + 0: optimizer_fusion ................................ True + 0: override_lr_scheduler ........................... True + 0: pad_vocab_size_to ............................... None + 0: params_dtype .................................... torch.bfloat16 + 0: partition_activations ........................... False + 0: patch_dim ....................................... 16 + 0: pipeline_model_parallel_size .................... 1 + 0: position_embedding_type ......................... PositionEmbeddingType.absolute + 0: pp_partition_method ............................. None + 0: profile_backward ................................ False + 0: query_in_block_prob ............................. 0.1 + 0: rampup_batch_size ............................... None + 0: rank ............................................ 0 + 0: remote_device ................................... none + 0: reset_attention_mask ............................ False + 0: reset_position_ids .............................. 
False + 0: reset_progress .................................. True + 0: retriever_report_topk_accuracies ................ [] + 0: retriever_score_scaling ......................... False + 0: retriever_seq_length ............................ 256 + 0: reweight_loss_based_on_position_frequency ....... False + 0: sample_rate ..................................... 1.0 + 0: save ............................................ checkpoints_2b8100m100m + 0: save_interval ................................... 10000 + 0: scatter_gather_tensors_in_pipeline .............. True + 0: scattered_embeddings ............................ False + 0: seed ............................................ 1234 + 0: seq_length ...................................... 2048 + 0: sgd_momentum .................................... 0.9 + 0: short_seq_prob .................................. 0.1 + 0: skip_train_iteration_range ...................... None + 0: split ........................................... None + 0: split_transformers .............................. False + 0: sync_tp_duplicated_parameters ................... False + 0: synchronize_each_layer .......................... False + 0: tensor_model_parallel_size ...................... 1 + 0: tensorboard_dir ................................. tensorboard_2b8100m100mval + 0: tensorboard_log_interval ........................ 1 + 0: tensorboard_queue_size .......................... 5 + 0: test_weighted_split_paths ....................... None + 0: test_weighted_split_paths_path .................. None + 0: tile_factor ..................................... 1 + 0: titles_data_path ................................ None + 0: tokenizer_name_or_path .......................... None + 0: tokenizer_type .................................. GPT2BPETokenizer + 0: train_iters ..................................... None + 0: train_samples ................................... 1 + 0: train_tokens .................................... 
None + 0: train_weighted_split_names ...................... ['train'] + 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] + 0: train_weighted_split_paths_path ................. None + 0: train_weighted_split_splits ..................... [['0:1']] + 0: train_weighted_split_weights .................... [['1.0']] + 0: universal_checkpoint ............................ False + 0: use_bnb_optimizer ............................... False + 0: use_checkpoint_lr_scheduler ..................... False + 0: use_contiguous_buffers_in_ddp ................... True + 0: use_cpu_initialization .......................... None + 0: use_one_sent_docs ............................... False + 0: use_pin_memory .................................. False + 0: valid_num_workers ............................... 2 + 0: valid_weighted_split_names ...................... ['validation'] + 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] + 0: valid_weighted_split_paths_path ................. None + 0: valid_weighted_split_splits ..................... [['0:1']] + 0: valid_weighted_split_weights .................... [['1.0']] + 0: virtual_pipeline_model_parallel_size ............ None + 0: vocab_extra_ids ................................. 0 + 0: vocab_file ...................................... gpt2/vocab.json + 0: weight_decay .................................... 0.1 + 0: world_size ...................................... 128 + 0: zero_allgather_bucket_size ...................... 0.0 + 0: zero_contigious_gradients ....................... False + 0: zero_reduce_bucket_size ......................... 0.0 + 0: zero_reduce_scatter ............................. False + 0: zero_stage ...................................... 
0 + 0: -------------------- end of arguments --------------------- + 0: setting number of micro-batches to constant 2 + 0: > building GPT2BPETokenizer tokenizer ... + 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) + 0: DeepSpeed general environment info: + 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] + 0: torch version .................... 1.13.0+rocm5.2 + 0: torch cuda version ............... None + 0: torch hip version ................ 5.2.21151-afdc89f8 + 0: nvcc version ..................... None + 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] + 0: deepspeed info ................... 0.7.5, unknown, unknown + 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 + 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** + 0: > initializing torch distributed ... + 0: [2023-03-16 21:22:47,718] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +15: > setting tensorboard ... + 0: > initializing tensor model parallel with size 1 + 0: > initializing pipeline model parallel with size 1 + 0: > setting random seeds to 1234 ... + 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 + 0: > compiling dataset index builder ... + 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: make: Nothing to be done for 'default'. + 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' + 0: >>> done with dataset index builder. Compilation time: 0.119 seconds + 0: > compiling and loading fused kernels ... 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 87 + 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] + 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 63 + 0: ninja: no work to do. 
+ 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] + 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] + 0: Total number of unsupported CUDA function calls: 0 + 0: + 0: + 0: Total number of replaced kernel launches: 67 + 0: ninja: no work to do. 
+ 0: >>> done with compiling and loading fused kernels. Compilation time: 30.082 seconds + 0: time to initialize megatron (seconds): 43.467 + 0: [after megatron is initialized] datetime: 2023-03-16 21:23:23 + 0: building GPT model ... + 0: [2023-03-16 21:23:23,610] [INFO] [utils.py:827:see_memory_usage] Before Building Model + 0: [2023-03-16 21:23:23,611] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB + 0: [2023-03-16 21:23:23,611] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.62 GB, percent = 6.1% + 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None + 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi + 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, 
ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 + 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): + 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, 
ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process + 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo + 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, 
ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127} + 0: [2023-03-16 21:23:27,690] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer + 0: stage=0 layers=41 + 0: 0: _to_float16 + 0: 1: EmbeddingPipe + 0: 2: + 0: 3: ParallelTransformerLayerPipe + 0: 4: ParallelTransformerLayerPipe + 0: 5: ParallelTransformerLayerPipe + 0: 6: ParallelTransformerLayerPipe + 0: 7: ParallelTransformerLayerPipe + 0: 8: ParallelTransformerLayerPipe + 0: 9: ParallelTransformerLayerPipe + 0: 10: ParallelTransformerLayerPipe + 0: 11: ParallelTransformerLayerPipe + 0: 12: ParallelTransformerLayerPipe + 0: 13: ParallelTransformerLayerPipe + 0: 14: ParallelTransformerLayerPipe + 0: 15: ParallelTransformerLayerPipe + 0: 16: ParallelTransformerLayerPipe + 0: 17: ParallelTransformerLayerPipe + 0: 18: ParallelTransformerLayerPipe + 0: 19: ParallelTransformerLayerPipe + 0: 20: ParallelTransformerLayerPipe + 0: 21: ParallelTransformerLayerPipe + 0: 22: ParallelTransformerLayerPipe + 0: 23: ParallelTransformerLayerPipe + 0: 24: ParallelTransformerLayerPipe + 0: 25: ParallelTransformerLayerPipe + 0: 26: ParallelTransformerLayerPipe + 0: 27: ParallelTransformerLayerPipe + 0: 28: ParallelTransformerLayerPipe + 0: 29: ParallelTransformerLayerPipe + 0: 30: ParallelTransformerLayerPipe + 0: 31: ParallelTransformerLayerPipe + 0: 32: ParallelTransformerLayerPipe + 0: 33: ParallelTransformerLayerPipe + 0: 34: ParallelTransformerLayerPipe + 0: 35: ParallelTransformerLayerPipe + 0: 36: ParallelTransformerLayerPipe + 0: 37: undo + 0: 38: MixedFusedLayerNorm + 0: 39: EmbeddingPipe + 0: 40: float16_to_fp32 + 0: loss: CrossEntropy + 0: [2023-03-16 
21:23:28,176] [INFO] [utils.py:827:see_memory_usage] After Building Model + 0: [2023-03-16 21:23:28,176] [INFO] [utils.py:828:see_memory_usage] MA 5.26 GB Max_MA 5.26 GB CA 5.31 GB Max_CA 5 GB + 0: [2023-03-16 21:23:28,177] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.66 GB, percent = 6.1% + 0: setting training iterations to 0 + 0: > learning rate decay style: cosine + 0: DeepSpeed is enabled. + 0: [2023-03-16 21:23:28,180] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown + 0: [2023-03-16 21:23:44,510] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False + 0: [2023-03-16 21:23:44,511] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer + 0: [2023-03-16 21:23:44,511] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer + 0: [2023-03-16 21:23:44,551] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam + 0: [2023-03-16 21:23:44,551] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer + 0: [2023-03-16 21:23:44,669] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer + 0: [2023-03-16 21:23:44,670] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.27 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 21:23:44,670] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.35 GB, percent = 6.2% + 6: ninja: no work to do. + 6: Time to load utils op: 0.3598296642303467 seconds + 8: ninja: no work to do. 
+ 6: Time to load utils op: 0.20244431495666504 seconds + 6: Time to load utils op: 0.20249700546264648 seconds + 6: Time to load utils op: 0.2022855281829834 seconds + 0: Time to load utils op: 0.30863237380981445 seconds + 8: Time to load utils op: 0.1901390552520752 seconds + 7: Time to load utils op: 0.20906567573547363 secondsTime to load utils op: 0.20850110054016113 seconds + 7: + 8: Time to load utils op: 0.20482754707336426 seconds + 7: Time to load utils op: 0.20856308937072754 seconds + 7: Time to load utils op: 0.20854425430297852 seconds + 7: Time to load utils op: 0.20868277549743652 seconds + 7: Time to load utils op: 0.20856976509094238 seconds + 7: Time to load utils op: 0.20853829383850098 seconds + 8: Time to load utils op: 0.20484018325805664 seconds + 8: Time to load utils op: 0.20500493049621582 seconds + 8: Time to load utils op: 0.20518946647644043 seconds + 8: Time to load utils op: 0.20528578758239746 seconds + 8: Time to load utils op: 0.20531392097473145 seconds + 8: Time to load utils op: 0.20545291900634766 seconds + 9: Time to load utils op: 0.210296630859375 seconds + 9: Time to load utils op: 0.20964908599853516 seconds + 9: Time to load utils op: 0.2095355987548828 seconds + 9: Time to load utils op: 0.20959973335266113 seconds + 9: Time to load utils op: 0.21025300025939941 seconds + 9: Time to load utils op: 0.21029043197631836 seconds + 9: Time to load utils op: 0.21031904220581055 seconds +11: Time to load utils op: 0.20834136009216309 secondsTime to load utils op: 0.20841002464294434 seconds +11: +11: Time to load utils op: 0.20842957496643066 seconds +11: Time to load utils op: 0.20912647247314453 seconds +11: Time to load utils op: 0.20859479904174805 secondsTime to load utils op: 0.20782160758972168 seconds +11: +11: Time to load utils op: 0.20879697799682617 seconds +13: Time to load utils op: 0.2075202465057373 seconds +13: Time to load utils op: 0.2083115577697754 seconds +13: Time to load utils op: 0.20759248733520508 
seconds +13: Time to load utils op: 0.20755910873413086 secondsTime to load utils op: 0.20760059356689453 secondsTime to load utils op: 0.20759892463684082 secondsTime to load utils op: 0.2078089714050293 seconds +13: +13: +13: +14: Time to load utils op: 0.20891904830932617 seconds +14: Time to load utils op: 0.20899724960327148 seconds +14: Time to load utils op: 0.20872998237609863 seconds +14: Time to load utils op: 0.20880818367004395 seconds +14: Time to load utils op: 0.20873451232910156 seconds +14: Time to load utils op: 0.20903444290161133 seconds +14: Time to load utils op: 0.20836567878723145 seconds +12: Time to load utils op: 0.21270251274108887 seconds +12: Time to load utils op: 0.21248912811279297 secondsTime to load utils op: 0.21278834342956543 seconds +12: +15: Time to load utils op: 0.20792031288146973 seconds +15: Time to load utils op: 0.20868420600891113 secondsTime to load utils op: 0.20807409286499023 seconds +15: +12: Time to load utils op: 0.21177268028259277 secondsTime to load utils op: 0.2117302417755127 seconds +12: +12: Time to load utils op: 0.21247577667236328 seconds +12: Time to load utils op: 0.2118091583251953 seconds +15: Time to load utils op: 0.20811986923217773 secondsTime to load utils op: 0.20821666717529297 seconds +15: +12: Time to load utils op: 0.21294307708740234 seconds +15: Time to load utils op: 0.20729708671569824 seconds +15: Time to load utils op: 0.20807933807373047 seconds + 0: Time to load utils op: 0.30301356315612793 seconds + 6: Time to load utils op: 0.0005946159362792969 seconds +10: Time to load utils op: 0.212188720703125 secondsTime to load utils op: 0.21217775344848633 seconds +10: +10: Time to load utils op: 0.21219778060913086 secondsTime to load utils op: 0.21219968795776367 seconds +10: +10: Time to load utils op: 0.21220088005065918 seconds +10: Time to load utils op: 0.2122046947479248 seconds +10: Time to load utils op: 0.21222138404846191 secondsTime to load utils op: 0.21222567558288574 
seconds +10: + 6: Time to load utils op: 0.00031685829162597656 seconds + 6: Time to load utils op: 0.0003437995910644531 seconds + 6: Time to load utils op: 0.0003495216369628906 seconds +11: Time to load utils op: 0.504831075668335 seconds + 7: Time to load utils op: 0.5046823024749756 seconds + 9: Time to load utils op: 0.5043535232543945 seconds +14: Time to load utils op: 0.5044524669647217 seconds + 0: Time to load utils op: 0.5053863525390625 seconds +13: Time to load utils op: 0.5050139427185059 seconds +15: Time to load utils op: 0.5049242973327637 seconds + 0: Time to load utils op: 0.4035818576812744 seconds + 0: Time to load utils op: 0.4027702808380127 seconds + 0: Time to load utils op: 0.4026174545288086 seconds + 6: Time to load utils op: 0.4030623435974121 secondsTime to load utils op: 0.4025390148162842 seconds + 6: + 6: Time to load utils op: 0.4036130905151367 seconds + 6: Time to load utils op: 0.40281128883361816 seconds + 1: Time to load utils op: 0.4106442928314209 seconds + 1: Time to load utils op: 0.41065239906311035 seconds + 1: Time to load utils op: 0.41178178787231445 seconds + 1: Time to load utils op: 0.4120042324066162 seconds + 1: Time to load utils op: 0.41181039810180664 seconds + 1: Time to load utils op: 0.4118790626525879 secondsTime to load utils op: 0.4121360778808594 seconds + 1: + 1: Time to load utils op: 0.4136984348297119 seconds +11: Time to load utils op: 0.0004756450653076172 seconds +11: Time to load utils op: 0.0004782676696777344 seconds +11: Time to load utils op: 0.0004405975341796875 seconds +11: Time to load utils op: 0.0004506111145019531 seconds +11: Time to load utils op: 0.00047397613525390625 seconds +11: Time to load utils op: 0.0004143714904785156 seconds +11: Time to load utils op: 0.0004088878631591797 secondsTime to load utils op: 0.0004050731658935547 seconds +11: + 0: Time to load utils op: 0.40323805809020996 seconds + 0: Time to load utils op: 0.4030115604400635 seconds + 3: Time to load utils 
op: 0.4119706153869629 seconds + 3: Time to load utils op: 0.41199445724487305 seconds + 3: Time to load utils op: 0.4120051860809326 seconds + 3: Time to load utils op: 0.41202855110168457 secondsTime to load utils op: 0.41202712059020996 seconds + 3: + 3: Time to load utils op: 0.4120364189147949 seconds + 3: Time to load utils op: 0.41203975677490234 seconds + 3: Time to load utils op: 0.4120471477508545 seconds + 8: Time to load utils op: 0.00039577484130859375 secondsTime to load utils op: 0.0005207061767578125 seconds + 8: + 8: Time to load utils op: 0.0004916191101074219 seconds + 2: Time to load utils op: 0.4127156734466553 secondsTime to load utils op: 0.4127311706542969 seconds + 2: + 8: Time to load utils op: 0.0004444122314453125 seconds + 2: Time to load utils op: 0.4127838611602783 secondsTime to load utils op: 0.41278529167175293 seconds + 2: + 8: Time to load utils op: 0.0004253387451171875 secondsTime to load utils op: 0.00041031837463378906 secondsTime to load utils op: 0.0004565715789794922 seconds + 8: + 8: + 2: Time to load utils op: 0.4127955436706543 seconds +12: Time to load utils op: 0.0007569789886474609 seconds + 2: Time to load utils op: 0.41280317306518555 secondsTime to load utils op: 0.4128081798553467 seconds + 2: Time to load utils op: 0.412811279296875 seconds + 2: + 8: Time to load utils op: 0.00047326087951660156 seconds +12: Time to load utils op: 0.0009753704071044922 secondsTime to load utils op: 0.0008068084716796875 seconds +12: +12: Time to load utils op: 0.0010709762573242188 secondsTime to load utils op: 0.0011067390441894531 seconds +12: +12: Time to load utils op: 0.0010919570922851562 seconds + 9: Time to load utils op: 0.0004878044128417969 seconds +12: Time to load utils op: 0.0010607242584228516 seconds + 9: Time to load utils op: 0.0005156993865966797 seconds +12: Time to load utils op: 0.001138925552368164 seconds + 9: Time to load utils op: 0.00045943260192871094 secondsTime to load utils op: 
0.00043654441833496094 secondsTime to load utils op: 0.0004813671112060547 seconds + 9: + 9: Time to load utils op: 0.0004642009735107422 seconds + 9: + 9: Time to load utils op: 0.0004930496215820312 seconds + 4: Time to load utils op: 0.41251349449157715 seconds + 4: Time to load utils op: 0.41252779960632324 seconds + 4: Time to load utils op: 0.4125518798828125 seconds + 4: Time to load utils op: 0.4125666618347168 seconds + 4: Time to load utils op: 0.41257405281066895 seconds + 4: Time to load utils op: 0.4125642776489258 seconds + 4: Time to load utils op: 0.41258764266967773 seconds + 4: Time to load utils op: 0.4125828742980957 seconds +13: Time to load utils op: 0.0004851818084716797 seconds + 0: Time to load utils op: 0.0005621910095214844 seconds + 0: Time to load utils op: 0.0005743503570556641 seconds +13: Time to load utils op: 0.0004353523254394531 seconds + 0: Time to load utils op: 0.00038313865661621094 seconds +13: Time to load utils op: 0.0005047321319580078 seconds +13: Time to load utils op: 0.0004177093505859375 seconds +13: Time to load utils op: 0.0005123615264892578 seconds +13: Time to load utils op: 0.00040650367736816406 seconds +13: Time to load utils op: 0.0005679130554199219 seconds +13: Time to load utils op: 0.0005664825439453125 seconds + 5: Time to load utils op: 0.4121394157409668 seconds + 5: Time to load utils op: 0.41216421127319336 seconds + 5: Time to load utils op: 0.4121572971343994 seconds + 5: Time to load utils op: 0.41216564178466797 seconds + 5: Time to load utils op: 0.4121866226196289 seconds + 5: Time to load utils op: 0.41219186782836914 secondsTime to load utils op: 0.412198543548584 seconds + 5: + 5: Time to load utils op: 0.4121987819671631 seconds + 7: Time to load utils op: 0.0005159378051757812 seconds + 7: Time to load utils op: 0.0005245208740234375 seconds + 7: Time to load utils op: 0.0005035400390625 seconds + 7: Time to load utils op: 0.0005364418029785156 seconds + 7: Time to load utils op: 
0.0005321502685546875 seconds + 7: Time to load utils op: 0.0005838871002197266 secondsTime to load utils op: 0.0006120204925537109 secondsTime to load utils op: 0.0005950927734375 seconds + 7: + 7: +14: Time to load utils op: 0.00045228004455566406 seconds +14: Time to load utils op: 0.00037384033203125 seconds +14: Time to load utils op: 0.0005061626434326172 seconds +14: Time to load utils op: 0.0004494190216064453 seconds +14: Time to load utils op: 0.00046563148498535156 seconds +14: Time to load utils op: 0.00040340423583984375 seconds +14: Time to load utils op: 0.00043773651123046875 secondsTime to load utils op: 0.0004162788391113281 seconds +14: +15: Time to load utils op: 0.0004916191101074219 seconds +15: Time to load utils op: 0.00046896934509277344 secondsTime to load utils op: 0.00047135353088378906 secondsTime to load utils op: 0.0004508495330810547 secondsTime to load utils op: 0.0004401206970214844 seconds +15: +15: Time to load utils op: 0.0005173683166503906 seconds +15: +15: +15: Time to load utils op: 0.00042724609375 seconds +15: Time to load utils op: 0.0005712509155273438 seconds + 0: Time to load utils op: 0.00038909912109375 seconds +10: Time to load utils op: 0.0006735324859619141 seconds +10: Time to load utils op: 0.0005943775177001953 seconds +10: Time to load utils op: 0.0006284713745117188 seconds +10: Time to load utils op: 0.0009889602661132812 seconds +10: Time to load utils op: 0.0007376670837402344 seconds +10: Time to load utils op: 0.0006976127624511719 seconds +10: Time to load utils op: 0.000820159912109375 secondsTime to load utils op: 0.0006151199340820312 seconds +10: + 0: Time to load utils op: 0.0003910064697265625 seconds + 6: Time to load utils op: 0.0003886222839355469 seconds + 6: Time to load utils op: 0.0003781318664550781 seconds + 6: Time to load utils op: 0.00038242340087890625 seconds + 6: Time to load utils op: 0.0003943443298339844 seconds + 0: Time to load utils op: 0.00038123130798339844 seconds + 0: Time 
to load utils op: 0.0003676414489746094 seconds + 9: Time to load utils op: 0.00042748451232910156 seconds + 1: Time to load utils op: 0.00046515464782714844 seconds + 1: Time to load utils op: 0.0003917217254638672 seconds + 1: Time to load utils op: 0.00048422813415527344 seconds + 1: Time to load utils op: 0.0004520416259765625 secondsTime to load utils op: 0.0004303455352783203 seconds + 1: + 1: Time to load utils op: 0.0005414485931396484 seconds + 1: Time to load utils op: 0.0004417896270751953 seconds + 1: Time to load utils op: 0.00039649009704589844 seconds + 4: Time to load utils op: 0.0010073184967041016 seconds + 4: Time to load utils op: 0.0008435249328613281 seconds + 5: Time to load utils op: 0.0009417533874511719 seconds + 4: Time to load utils op: 0.0011968612670898438 seconds + 4: Time to load utils op: 0.0012617111206054688 seconds + 4: Time to load utils op: 0.001207590103149414 seconds + 4: Time to load utils op: 0.0012249946594238281 seconds + 4: Time to load utils op: 0.0012078285217285156 seconds + 4: Time to load utils op: 0.0013058185577392578 seconds + 5: Time to load utils op: 0.0013842582702636719 seconds + 5: Time to load utils op: 0.0015778541564941406 seconds + 5: Time to load utils op: 0.0015528202056884766 secondsTime to load utils op: 0.0015101432800292969 seconds + 5: + 5: Time to load utils op: 0.0014429092407226562 seconds + 5: Time to load utils op: 0.0014927387237548828 seconds + 5: Time to load utils op: 0.0016520023345947266 seconds + 3: Time to load utils op: 0.0004315376281738281 secondsTime to load utils op: 0.000682830810546875 seconds + 3: + 3: Time to load utils op: 0.0006151199340820312 seconds + 3: Time to load utils op: 0.00044846534729003906 seconds + 3: Time to load utils op: 0.0004489421844482422 seconds + 3: Time to load utils op: 0.00042510032653808594 seconds + 3: Time to load utils op: 0.0004067420959472656 seconds + 3: Time to load utils op: 0.0004885196685791016 seconds + 2: Time to load utils op: 
0.0010216236114501953 seconds + 2: Time to load utils op: 0.0008959770202636719 seconds + 2: Time to load utils op: 0.001132965087890625 seconds + 2: Time to load utils op: 0.0013988018035888672 seconds + 2: Time to load utils op: 0.0013632774353027344 seconds + 2: Time to load utils op: 0.0014195442199707031 secondsTime to load utils op: 0.0013887882232666016 seconds + 2: + 2: Time to load utils op: 0.001485586166381836 seconds + 0: [2023-03-16 21:23:45,201] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 + 0: [2023-03-16 21:23:45,202] [INFO] [utils.py:828:see_memory_usage] MA 5.25 GB Max_MA 5.25 GB CA 5.32 GB Max_CA 5 GB + 0: [2023-03-16 21:23:45,202] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,330] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 + 0: [2023-03-16 21:23:45,331] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 21:23:45,331] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,437] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 + 0: [2023-03-16 21:23:45,438] [INFO] [utils.py:828:see_memory_usage] MA 10.67 GB Max_MA 10.67 GB CA 13.39 GB Max_CA 13 GB + 0: [2023-03-16 21:23:45,438] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,545] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 + 0: [2023-03-16 21:23:45,546] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:45,546] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,650] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 + 0: [2023-03-16 21:23:45,651] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 
21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:45,651] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,759] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 + 0: [2023-03-16 21:23:45,760] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:45,760] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,862] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer + 0: [2023-03-16 21:23:45,862] [INFO] [utils.py:828:see_memory_usage] MA 15.78 GB Max_MA 15.78 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:45,863] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:45,971] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer + 0: [2023-03-16 21:23:45,972] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:45,972] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:46,075] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer + 0: [2023-03-16 21:23:46,076] [INFO] [utils.py:828:see_memory_usage] MA 15.94 GB Max_MA 15.94 GB CA 21.01 GB Max_CA 21 GB + 0: [2023-03-16 21:23:46,076] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.51 GB, percent = 6.3% + 0: [2023-03-16 21:23:46,076] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam + 0: [2023-03-16 21:23:46,076] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler + 0: [2023-03-16 21:23:46,076] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = + 0: [2023-03-16 21:23:46,076] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] + 0: 
[2023-03-16 21:23:46,077] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: + 0: [2023-03-16 21:23:46,077] [INFO] [config.py:1011:print] activation_checkpointing_config { + 0: "partition_activations": false, + 0: "contiguous_memory_optimization": false, + 0: "cpu_checkpointing": false, + 0: "number_checkpoints": null, + 0: "synchronize_checkpoint_boundary": false, + 0: "profile": false + 0: } + 0: [2023-03-16 21:23:46,077] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} + 0: [2023-03-16 21:23:46,077] [INFO] [config.py:1011:print] amp_enabled .................. False + 0: [2023-03-16 21:23:46,077] [INFO] [config.py:1011:print] amp_params ................... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] autotuning_config ............ { + 0: "enabled": false, + 0: "start_step": null, + 0: "end_step": null, + 0: "metric_path": null, + 0: "arg_mappings": null, + 0: "metric": "throughput", + 0: "model_info": null, + 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", + 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", + 0: "overwrite": true, + 0: "fast": true, + 0: "start_profile_step": 3, + 0: "end_profile_step": 5, + 0: "tuner_type": "gridsearch", + 0: "tuner_early_stopping": 5, + 0: "tuner_num_trials": 50, + 0: "model_info_path": null, + 0: "mp_size": 1, + 0: "max_train_batch_size": null, + 0: "min_train_batch_size": 1, + 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, + 0: "min_train_micro_batch_size_per_gpu": 1, + 0: "num_tuning_micro_batch_sizes": 3 + 0: } + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] comms_config ................. + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] communication_data_type ...... None + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa + 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] curriculum_enabled ........... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] curriculum_params ............ False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] disable_allgather ............ False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] dump_state ................... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] elasticity_enabled ........... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] flops_profiler_config ........ { + 0: "enabled": false, + 0: "profile_step": 1, + 0: "module_depth": -1, + 0: "top_modules": 1, + 0: "detailed": true, + 0: "output_file": null + 0: } + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] fp16_auto_cast ............... None + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] fp16_enabled ................. False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] global_rank .................. 0 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 
2 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] load_universal_checkpoint .... False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] loss_scale ................... 1.0 + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] memory_breakdown ............. False + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] monitor_config ............... + 0: [2023-03-16 21:23:46,078] [INFO] [config.py:1011:print] nebula_config ................ { + 0: "enabled": false, + 0: "persistent_storage_path": null, + 0: "persistent_time_interval": 100, + 0: "num_of_version_in_retention": 2, + 0: "enable_nebula_load": true, + 0: "load_path": null + 0: } + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] optimizer_name ............... None + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] optimizer_params ............. None + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] pld_enabled .................. False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] pld_params ................... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] prescale_gradients ........... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] scheduler_name ............... None + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] scheduler_params ............. 
None + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] sparse_attention ............. None + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] steps_per_print .............. 2000 + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] train_batch_size ............. 512 + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 2 + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] use_node_local_storage ....... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] world_size ................... 128 + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] zero_enabled ................. False + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 + 0: [2023-03-16 21:23:46,079] [INFO] [config.py:996:print_user_config] json = { + 0: "train_micro_batch_size_per_gpu": 2, + 0: "train_batch_size": 512, + 0: "gradient_clipping": 1.0, + 0: "zero_optimization": { + 0: "stage": 0 + 0: }, + 0: "bf16": { + 0: "enabled": true + 0: }, + 0: "steps_per_print": 2.000000e+03, + 0: "wall_clock_breakdown": false + 0: } + 0: Time to load utils op: 0.0004286766052246094 seconds + 0: [2023-03-16 21:23:46,080] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=2 micro_batch_size=2 + 0: [2023-03-16 21:23:46,102] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=41 [0, 41) STAGE_PARAMS=2809026560 (2809.027M) TOTAL_PARAMS=2809026560 (2809.027M) UNIQUE_PARAMS=2809026560 (2809.027M) + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 6: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+14: [2023-03-16 21:23:46,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 8: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +12: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +12: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +13: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 5: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +14: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +11: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 1: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 3: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 6: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 7: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 5: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+ 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... 
+10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt... + 2: [2023-03-16 21:23:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+ 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/mp_rank_00_model_states.pt. 
+10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 21:23:46,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +11: [2023-03-16 21:23:46,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+11: [2023-03-16 21:23:46,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+12: [2023-03-16 21:23:46,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:46,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:46,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:46,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 21:23:46,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 21:23:46,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:46,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+14: [2023-03-16 21:23:46,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+14: [2023-03-16 21:23:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +14: [2023-03-16 21:23:46,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:46,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:46,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+11: [2023-03-16 21:23:46,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 21:23:46,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+10: [2023-03-16 21:23:46,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:46,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:46,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +11: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +10: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 21:23:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +12: [2023-03-16 21:23:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:46,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:46,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:46,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... 
+15: [2023-03-16 21:23:46,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +15: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +12: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:46,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:46,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:46,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:46,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 8: [2023-03-16 21:23:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:46,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... +13: [2023-03-16 21:23:46,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:46,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 7: [2023-03-16 21:23:46,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 4: [2023-03-16 21:23:46,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:46,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:46,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:46,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +13: [2023-03-16 21:23:46,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:46,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+13: [2023-03-16 21:23:46,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:46,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 6: [2023-03-16 21:23:46,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:46,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:46,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:46,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:46,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:46,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 1: [2023-03-16 21:23:46,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:46,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:46,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 0: [2023-03-16 21:23:46,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:46,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:46,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:46,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:46,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:46,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+15: [2023-03-16 21:23:46,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 3: [2023-03-16 21:23:46,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:46,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:46,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:46,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:46,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +14: [2023-03-16 21:23:46,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:46,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:46,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:46,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:46,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:46,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:46,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:47,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:47,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:47,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +10: [2023-03-16 21:23:47,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. +15: [2023-03-16 21:23:47,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+10: [2023-03-16 21:23:47,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_01-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:47,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+12: [2023-03-16 21:23:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:47,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:47,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+12: [2023-03-16 21:23:47,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 21:23:47,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +12: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +12: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +13: [2023-03-16 21:23:47,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+10: [2023-03-16 21:23:47,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+14: [2023-03-16 21:23:47,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:47,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +15: [2023-03-16 21:23:47,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +13: [2023-03-16 21:23:47,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 21:23:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+13: [2023-03-16 21:23:47,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +11: [2023-03-16 21:23:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 21:23:47,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:47,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:47,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:47,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +10: [2023-03-16 21:23:47,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt... +14: [2023-03-16 21:23:47,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +10: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:47,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+15: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +11: [2023-03-16 21:23:47,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+11: [2023-03-16 21:23:47,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+15: [2023-03-16 21:23:47,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 21:23:47,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +14: [2023-03-16 21:23:47,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. +15: [2023-03-16 21:23:47,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+15: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_03-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:47,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+12: [2023-03-16 21:23:47,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:47,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:47,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +12: [2023-03-16 21:23:47,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+13: [2023-03-16 21:23:47,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +11: [2023-03-16 21:23:47,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +14: [2023-03-16 21:23:47,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +12: [2023-03-16 21:23:47,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:47,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:47,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:47,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:47,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 21:23:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +13: [2023-03-16 21:23:47,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:47,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:47,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+10: [2023-03-16 21:23:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +10: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+11: [2023-03-16 21:23:47,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 4: [2023-03-16 21:23:47,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:47,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:47,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +15: [2023-03-16 21:23:47,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +14: [2023-03-16 21:23:47,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+13: [2023-03-16 21:23:47,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:47,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 6: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... +13: [2023-03-16 21:23:47,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 8: [2023-03-16 21:23:47,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 21:23:47,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 6: [2023-03-16 21:23:47,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:47,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 21:23:47,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:47,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +11: [2023-03-16 21:23:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:47,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+14: [2023-03-16 21:23:47,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:47,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 0: [2023-03-16 21:23:47,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:47,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:47,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:47,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 3: [2023-03-16 21:23:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+10: [2023-03-16 21:23:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +10: [2023-03-16 21:23:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 2: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. +15: [2023-03-16 21:23:47,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 5: [2023-03-16 21:23:47,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 9: [2023-03-16 21:23:47,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 7: [2023-03-16 21:23:47,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_04-model_00-model_states.pt. + 1: [2023-03-16 21:23:47,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:47,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+15: [2023-03-16 21:23:47,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:47,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:47,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:47,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:47,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:47,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:47,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 21:23:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:48,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:48,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:48,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 21:23:48,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +13: [2023-03-16 21:23:48,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:48,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+10: [2023-03-16 21:23:48,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:48,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +10: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+12: [2023-03-16 21:23:48,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:48,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+15: [2023-03-16 21:23:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +15: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:48,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:48,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+12: [2023-03-16 21:23:48,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +12: [2023-03-16 21:23:48,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:48,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +12: [2023-03-16 21:23:48,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+13: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... +11: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... 
+11: [2023-03-16 21:23:48,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:48,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+10: [2023-03-16 21:23:48,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+10: [2023-03-16 21:23:48,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+13: [2023-03-16 21:23:48,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +13: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:48,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+15: [2023-03-16 21:23:48,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+11: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:48,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +14: [2023-03-16 21:23:48,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 21:23:48,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +10: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:48,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+15: [2023-03-16 21:23:48,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +15: [2023-03-16 21:23:48,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:48,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. +11: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_05-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:48,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:48,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:48,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:48,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:48,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+11: [2023-03-16 21:23:48,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 21:23:48,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+11: [2023-03-16 21:23:48,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+12: [2023-03-16 21:23:48,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 5: [2023-03-16 21:23:48,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:48,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +12: [2023-03-16 21:23:48,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:48,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:48,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:48,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:48,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:48,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 8: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:48,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:48,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +15: [2023-03-16 21:23:48,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:48,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +10: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +13: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+14: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +14: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:48,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +11: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 2: [2023-03-16 21:23:48,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:48,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +11: [2023-03-16 21:23:48,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt... +10: [2023-03-16 21:23:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+10: [2023-03-16 21:23:48,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:48,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 21:23:48,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:48,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +13: [2023-03-16 21:23:48,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:48,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:48,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+11: [2023-03-16 21:23:48,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:48,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+12: [2023-03-16 21:23:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:48,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 7: [2023-03-16 21:23:48,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +12: [2023-03-16 21:23:48,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:48,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 4: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+15: [2023-03-16 21:23:48,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:48,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +15: [2023-03-16 21:23:48,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 3: [2023-03-16 21:23:48,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 9: [2023-03-16 21:23:48,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:48,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 6: [2023-03-16 21:23:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:48,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:48,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:48,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:48,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. +14: [2023-03-16 21:23:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:48,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:48,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 0: [2023-03-16 21:23:48,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_06-model_00-model_states.pt. + 1: [2023-03-16 21:23:48,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:48,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:48,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:48,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+11: [2023-03-16 21:23:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:49,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:49,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +11: [2023-03-16 21:23:49,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:49,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+11: [2023-03-16 21:23:49,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+13: [2023-03-16 21:23:49,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 21:23:49,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+13: [2023-03-16 21:23:49,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+15: [2023-03-16 21:23:49,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:49,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:49,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:49,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:49,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:49,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+14: [2023-03-16 21:23:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+15: [2023-03-16 21:23:49,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +15: [2023-03-16 21:23:49,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +10: [2023-03-16 21:23:49,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+14: [2023-03-16 21:23:49,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+10: [2023-03-16 21:23:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +15: [2023-03-16 21:23:49,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+12: [2023-03-16 21:23:49,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:49,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +13: [2023-03-16 21:23:49,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 21:23:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +13: [2023-03-16 21:23:49,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:49,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +12: [2023-03-16 21:23:49,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:49,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+15: [2023-03-16 21:23:49,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+10: [2023-03-16 21:23:49,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +10: [2023-03-16 21:23:49,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:49,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:49,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:49,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+12: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +14: [2023-03-16 21:23:49,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+14: [2023-03-16 21:23:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:49,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:49,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... +14: [2023-03-16 21:23:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... 
+14: [2023-03-16 21:23:49,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +12: [2023-03-16 21:23:49,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:49,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:49,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+11: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_07-model_00-model_states.pt. +11: [2023-03-16 21:23:49,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:49,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +11: [2023-03-16 21:23:49,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +11: [2023-03-16 21:23:49,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:49,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+11: [2023-03-16 21:23:49,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:49,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:49,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+10: [2023-03-16 21:23:49,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+15: [2023-03-16 21:23:49,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+15: [2023-03-16 21:23:49,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +13: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+13: [2023-03-16 21:23:49,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:49,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +12: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+10: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +15: [2023-03-16 21:23:49,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+14: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +15: [2023-03-16 21:23:49,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:49,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:49,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:49,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:49,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:49,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +10: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+12: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +10: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:49,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt... +14: [2023-03-16 21:23:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +13: [2023-03-16 21:23:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:49,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:49,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 4: [2023-03-16 21:23:49,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +12: [2023-03-16 21:23:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:49,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. +14: [2023-03-16 21:23:49,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:49,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:49,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:49,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 21:23:49,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 3: [2023-03-16 21:23:49,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:49,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:49,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:49,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:49,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+14: [2023-03-16 21:23:49,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 1: [2023-03-16 21:23:49,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 9: [2023-03-16 21:23:49,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. + 6: [2023-03-16 21:23:49,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_08-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:49,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:49,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:49,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:49,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:49,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:49,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:49,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +11: [2023-03-16 21:23:49,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+11: [2023-03-16 21:23:49,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:49,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+15: [2023-03-16 21:23:49,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:49,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +11: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:49,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+15: [2023-03-16 21:23:49,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:49,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:49,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:49,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 5: [2023-03-16 21:23:49,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:49,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:49,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 7: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:49,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:49,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:50,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+10: [2023-03-16 21:23:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+12: [2023-03-16 21:23:50,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +15: [2023-03-16 21:23:50,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+15: [2023-03-16 21:23:50,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:50,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+13: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +15: [2023-03-16 21:23:50,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:50,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +10: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 21:23:50,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +12: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+13: [2023-03-16 21:23:50,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +12: [2023-03-16 21:23:50,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:50,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +10: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +14: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +13: [2023-03-16 21:23:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt... +14: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. +13: [2023-03-16 21:23:50,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:50,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:50,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_09-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+11: [2023-03-16 21:23:50,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:50,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:50,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+11: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:50,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +11: [2023-03-16 21:23:50,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+15: [2023-03-16 21:23:50,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:50,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +15: [2023-03-16 21:23:50,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +11: [2023-03-16 21:23:50,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+11: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:50,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:50,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:50,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:50,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+12: [2023-03-16 21:23:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:50,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +15: [2023-03-16 21:23:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +13: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:50,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:50,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +12: [2023-03-16 21:23:50,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:50,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+14: [2023-03-16 21:23:50,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +14: [2023-03-16 21:23:50,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt... +10: [2023-03-16 21:23:50,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +12: [2023-03-16 21:23:50,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:50,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +13: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:50,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +10: [2023-03-16 21:23:50,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 21:23:50,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. +14: [2023-03-16 21:23:50,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_10-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+14: [2023-03-16 21:23:50,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:50,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 21:23:50,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:50,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +11: [2023-03-16 21:23:50,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:50,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 21:23:50,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+15: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:50,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+12: [2023-03-16 21:23:50,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +10: [2023-03-16 21:23:50,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +13: [2023-03-16 21:23:50,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +12: [2023-03-16 21:23:50,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:50,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:50,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +15: [2023-03-16 21:23:50,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +15: [2023-03-16 21:23:50,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 21:23:50,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:50,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:50,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:50,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+11: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 5: [2023-03-16 21:23:50,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 5: [2023-03-16 21:23:50,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:50,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 8: [2023-03-16 21:23:50,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:50,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 0: [2023-03-16 21:23:50,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:50,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:50,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:50,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:50,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:50,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+13: [2023-03-16 21:23:50,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:50,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +11: [2023-03-16 21:23:50,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:50,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+10: [2023-03-16 21:23:50,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 3: [2023-03-16 21:23:50,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:50,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+12: [2023-03-16 21:23:50,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 6: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:50,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 7: [2023-03-16 21:23:50,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+13: [2023-03-16 21:23:50,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:50,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +13: [2023-03-16 21:23:50,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:50,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +12: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:50,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... 
+10: [2023-03-16 21:23:50,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... +14: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:50,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:50,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +10: [2023-03-16 21:23:50,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:50,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:50,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:50,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:50,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 4: [2023-03-16 21:23:50,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 9: [2023-03-16 21:23:50,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:50,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 1: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. + 2: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:50,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:50,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:50,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:50,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:50,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. 
+14: [2023-03-16 21:23:51,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_11-model_00-model_states.pt. +14: [2023-03-16 21:23:51,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 21:23:51,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 21:23:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:51,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:51,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:51,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 21:23:51,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:51,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+15: [2023-03-16 21:23:51,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +15: [2023-03-16 21:23:51,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +15: [2023-03-16 21:23:51,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:51,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:51,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:51,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 21:23:51,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+11: [2023-03-16 21:23:51,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:51,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+12: [2023-03-16 21:23:51,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +11: [2023-03-16 21:23:51,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:51,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:51,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +11: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:51,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:51,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:51,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +10: [2023-03-16 21:23:51,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:51,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +13: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:51,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:51,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... 
+14: [2023-03-16 21:23:51,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt... +12: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:51,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:51,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:51,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+13: [2023-03-16 21:23:51,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +10: [2023-03-16 21:23:51,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+10: [2023-03-16 21:23:51,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +12: [2023-03-16 21:23:51,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:51,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:51,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +13: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:51,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. +14: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_12-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:51,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+14: [2023-03-16 21:23:51,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:51,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+15: [2023-03-16 21:23:51,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:51,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+11: [2023-03-16 21:23:51,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+15: [2023-03-16 21:23:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:51,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:51,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:51,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+11: [2023-03-16 21:23:51,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:51,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:51,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +15: [2023-03-16 21:23:51,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:51,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +15: [2023-03-16 21:23:51,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+10: [2023-03-16 21:23:51,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:51,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:51,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 3: [2023-03-16 21:23:51,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:51,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:51,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:51,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:51,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+12: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 8: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:51,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:51,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 0: [2023-03-16 21:23:51,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 21:23:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +12: [2023-03-16 21:23:51,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 0: [2023-03-16 21:23:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:51,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +11: [2023-03-16 21:23:51,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+10: [2023-03-16 21:23:51,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 21:23:51,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +11: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 7: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+13: [2023-03-16 21:23:51,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 7: [2023-03-16 21:23:51,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +13: [2023-03-16 21:23:51,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:51,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:51,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:51,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +10: [2023-03-16 21:23:51,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +10: [2023-03-16 21:23:51,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:51,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... 
+12: [2023-03-16 21:23:51,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... +14: [2023-03-16 21:23:51,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:51,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:51,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 6: [2023-03-16 21:23:51,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 2: [2023-03-16 21:23:51,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:51,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +12: [2023-03-16 21:23:51,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+13: [2023-03-16 21:23:51,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 1: [2023-03-16 21:23:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:51,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +13: [2023-03-16 21:23:51,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+12: [2023-03-16 21:23:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. 
+14: [2023-03-16 21:23:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 4: [2023-03-16 21:23:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. + 9: [2023-03-16 21:23:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_13-model_00-model_states.pt. +14: [2023-03-16 21:23:51,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:51,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:51,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:51,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:51,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:51,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 5: [2023-03-16 21:23:51,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:51,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:52,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:52,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:52,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+15: [2023-03-16 21:23:52,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:52,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:52,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:52,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+10: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+15: [2023-03-16 21:23:52,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:52,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +11: [2023-03-16 21:23:52,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +12: [2023-03-16 21:23:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+12: [2023-03-16 21:23:52,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+13: [2023-03-16 21:23:52,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:52,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +15: [2023-03-16 21:23:52,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +15: [2023-03-16 21:23:52,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:52,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +11: [2023-03-16 21:23:52,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 21:23:52,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 21:23:52,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:52,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:52,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:52,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:52,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +14: [2023-03-16 21:23:52,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:52,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +10: [2023-03-16 21:23:52,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:52,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:52,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +12: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+13: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+10: [2023-03-16 21:23:52,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +10: [2023-03-16 21:23:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 21:23:52,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... +13: [2023-03-16 21:23:52,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:52,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:52,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +13: [2023-03-16 21:23:52,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+14: [2023-03-16 21:23:52,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:52,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. +14: [2023-03-16 21:23:52,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_14-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:52,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 21:23:52,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:52,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+11: [2023-03-16 21:23:52,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +11: [2023-03-16 21:23:52,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+10: [2023-03-16 21:23:52,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:52,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +11: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+11: [2023-03-16 21:23:52,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 21:23:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:52,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:52,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:52,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+15: [2023-03-16 21:23:52,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:52,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:52,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:52,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:52,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 21:23:52,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +13: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+10: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +10: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:52,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:52,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+12: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +12: [2023-03-16 21:23:52,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+15: [2023-03-16 21:23:52,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +15: [2023-03-16 21:23:52,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +15: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 21:23:52,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:52,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... 
+14: [2023-03-16 21:23:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +14: [2023-03-16 21:23:52,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt... +10: [2023-03-16 21:23:52,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+10: [2023-03-16 21:23:52,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:52,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:52,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:52,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 21:23:52,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 7: [2023-03-16 21:23:52,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:52,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:52,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 1: [2023-03-16 21:23:52,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +12: [2023-03-16 21:23:52,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+13: [2023-03-16 21:23:52,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+12: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 6: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +13: [2023-03-16 21:23:52,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:52,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:52,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+13: [2023-03-16 21:23:52,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. 
+14: [2023-03-16 21:23:52,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. +14: [2023-03-16 21:23:52,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_15-model_00-model_states.pt. + 4: [2023-03-16 21:23:52,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+12: [2023-03-16 21:23:52,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:52,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:52,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+14: [2023-03-16 21:23:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:52,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:52,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:52,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:52,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:52,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 5: [2023-03-16 21:23:52,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:52,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:52,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:52,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:52,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:52,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:52,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:52,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:52,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:52,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:52,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:52,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:52,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:53,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 21:23:53,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:53,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:53,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:53,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:53,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:53,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +10: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +13: [2023-03-16 21:23:53,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:53,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+15: [2023-03-16 21:23:53,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:53,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:53,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+15: [2023-03-16 21:23:53,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +12: [2023-03-16 21:23:53,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +15: [2023-03-16 21:23:53,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+15: [2023-03-16 21:23:53,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +11: [2023-03-16 21:23:53,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:53,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +11: [2023-03-16 21:23:53,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:53,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:53,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +15: [2023-03-16 21:23:53,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+13: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:53,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:53,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... 
+14: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... +14: [2023-03-16 21:23:53,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:53,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +13: [2023-03-16 21:23:53,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 21:23:53,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +12: [2023-03-16 21:23:53,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. 
+12: [2023-03-16 21:23:53,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:53,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +10: [2023-03-16 21:23:53,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+12: [2023-03-16 21:23:53,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_16-model_00-model_states.pt. +14: [2023-03-16 21:23:53,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+14: [2023-03-16 21:23:53,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 21:23:53,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +10: [2023-03-16 21:23:53,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:53,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:53,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:53,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+13: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:53,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+11: [2023-03-16 21:23:53,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+15: [2023-03-16 21:23:53,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+12: [2023-03-16 21:23:53,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:53,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:53,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +11: [2023-03-16 21:23:53,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:53,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +11: [2023-03-16 21:23:53,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:53,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+13: [2023-03-16 21:23:53,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +13: [2023-03-16 21:23:53,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:53,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:53,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +10: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 21:23:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +13: [2023-03-16 21:23:53,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +15: [2023-03-16 21:23:53,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:53,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +12: [2023-03-16 21:23:53,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +12: [2023-03-16 21:23:53,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +14: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:53,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt... +15: [2023-03-16 21:23:53,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:53,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+12: [2023-03-16 21:23:53,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:53,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. +14: [2023-03-16 21:23:53,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_17-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 21:23:53,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+13: [2023-03-16 21:23:53,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 21:23:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+11: [2023-03-16 21:23:53,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+10: [2023-03-16 21:23:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +10: [2023-03-16 21:23:53,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 21:23:53,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +12: [2023-03-16 21:23:53,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+13: [2023-03-16 21:23:53,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:53,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:53,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:53,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:53,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:53,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+15: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +15: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +13: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:53,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 3: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +13: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:53,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 2: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 2: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 9: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +14: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+14: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 7: [2023-03-16 21:23:53,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:53,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 6: [2023-03-16 21:23:53,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 8: [2023-03-16 21:23:53,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:53,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... + 1: [2023-03-16 21:23:53,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt... +11: [2023-03-16 21:23:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+11: [2023-03-16 21:23:53,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:53,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:53,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:53,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 21:23:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:53,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 21:23:53,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 5: [2023-03-16 21:23:53,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:54,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:54,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:54,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:54,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:54,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:54,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +11: [2023-03-16 21:23:54,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+14: [2023-03-16 21:23:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:54,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +10: [2023-03-16 21:23:54,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+10: [2023-03-16 21:23:54,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +12: [2023-03-16 21:23:54,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:54,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+14: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +14: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. +15: [2023-03-16 21:23:54,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:54,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 21:23:54,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:54,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_18-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:54,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:54,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+13: [2023-03-16 21:23:54,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:54,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+13: [2023-03-16 21:23:54,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+12: [2023-03-16 21:23:54,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 21:23:54,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+15: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:54,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+13: [2023-03-16 21:23:54,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:54,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 21:23:54,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+11: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+15: [2023-03-16 21:23:54,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +13: [2023-03-16 21:23:54,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +13: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +12: [2023-03-16 21:23:54,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +12: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:54,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+15: [2023-03-16 21:23:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +15: [2023-03-16 21:23:54,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+10: [2023-03-16 21:23:54,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +11: [2023-03-16 21:23:54,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:54,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +14: [2023-03-16 21:23:54,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... +15: [2023-03-16 21:23:54,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:54,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +11: [2023-03-16 21:23:54,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:54,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:54,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+10: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 3: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:54,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+11: [2023-03-16 21:23:54,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:54,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+14: [2023-03-16 21:23:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +14: [2023-03-16 21:23:54,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 4: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:54,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:54,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:54,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. +10: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:54,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:54,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:54,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:54,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_19-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:54,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:54,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+11: [2023-03-16 21:23:54,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:54,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+12: [2023-03-16 21:23:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 21:23:54,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:54,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+15: [2023-03-16 21:23:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:54,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +14: [2023-03-16 21:23:54,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:54,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:54,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 8: [2023-03-16 21:23:54,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:54,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:54,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:54,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:54,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+12: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:54,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:54,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:54,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 5: [2023-03-16 21:23:54,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:54,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:54,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:54,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +12: [2023-03-16 21:23:54,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+13: [2023-03-16 21:23:54,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +13: [2023-03-16 21:23:54,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 7: [2023-03-16 21:23:54,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:54,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:54,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:54,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:54,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:54,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:54,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:54,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:54,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:54,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 6: [2023-03-16 21:23:54,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:54,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 21:23:54,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:54,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:54,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:54,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:54,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:54,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 9: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:54,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:54,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:54,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:54,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:54,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:54,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:54,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:55,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +12: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+11: [2023-03-16 21:23:55,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +11: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +11: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +13: [2023-03-16 21:23:55,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+14: [2023-03-16 21:23:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:55,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:55,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 21:23:55,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +15: [2023-03-16 21:23:55,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:55,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +15: [2023-03-16 21:23:55,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +14: [2023-03-16 21:23:55,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:55,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:55,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:55,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:55,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. +10: [2023-03-16 21:23:55,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+10: [2023-03-16 21:23:55,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:55,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:55,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:55,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:55,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt... +10: [2023-03-16 21:23:55,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+10: [2023-03-16 21:23:55,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_20-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:55,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+11: [2023-03-16 21:23:55,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:55,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:55,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:55,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:55,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:55,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:55,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+14: [2023-03-16 21:23:55,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+14: [2023-03-16 21:23:55,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+13: [2023-03-16 21:23:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:55,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +13: [2023-03-16 21:23:55,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:55,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +11: [2023-03-16 21:23:55,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +11: [2023-03-16 21:23:55,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+10: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+15: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:55,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+15: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +12: [2023-03-16 21:23:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +12: [2023-03-16 21:23:55,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+12: [2023-03-16 21:23:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +14: [2023-03-16 21:23:55,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:55,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:55,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:55,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:55,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 21:23:55,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+10: [2023-03-16 21:23:55,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +14: [2023-03-16 21:23:55,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +13: [2023-03-16 21:23:55,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:55,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +15: [2023-03-16 21:23:55,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +15: [2023-03-16 21:23:55,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:55,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:55,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt... +10: [2023-03-16 21:23:55,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. +10: [2023-03-16 21:23:55,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:55,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:55,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+15: [2023-03-16 21:23:55,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_21-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:55,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 21:23:55,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +12: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+11: [2023-03-16 21:23:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +11: [2023-03-16 21:23:55,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +10: [2023-03-16 21:23:55,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:55,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+15: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+15: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +15: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +13: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:55,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+12: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:55,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt... +14: [2023-03-16 21:23:55,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 21:23:55,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +14: [2023-03-16 21:23:55,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 1: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 21:23:55,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:55,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:55,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:55,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:55,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:55,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+14: [2023-03-16 21:23:55,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:55,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:55,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 2: [2023-03-16 21:23:55,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:55,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:55,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:55,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:55,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 0: [2023-03-16 21:23:55,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 8: [2023-03-16 21:23:55,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 21:23:55,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 6: [2023-03-16 21:23:55,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+12: [2023-03-16 21:23:55,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +12: [2023-03-16 21:23:55,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:55,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:55,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +13: [2023-03-16 21:23:55,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+13: [2023-03-16 21:23:55,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:55,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 9: [2023-03-16 21:23:55,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 5: [2023-03-16 21:23:55,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +11: [2023-03-16 21:23:55,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+11: [2023-03-16 21:23:55,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:55,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:55,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:55,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +15: [2023-03-16 21:23:55,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 3: [2023-03-16 21:23:55,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:55,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 7: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. + 4: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_22-model_00-model_states.pt. +10: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:55,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:55,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:55,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 21:23:55,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:55,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+15: [2023-03-16 21:23:55,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:55,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:55,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:55,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:55,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:55,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:55,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:56,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:56,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:56,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+10: [2023-03-16 21:23:56,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:56,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+15: [2023-03-16 21:23:56,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:56,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:56,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +15: [2023-03-16 21:23:56,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:56,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:56,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 21:23:56,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:56,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +12: [2023-03-16 21:23:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+13: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +11: [2023-03-16 21:23:56,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +13: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 21:23:56,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +14: [2023-03-16 21:23:56,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt... +10: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+10: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +14: [2023-03-16 21:23:56,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:56,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+11: [2023-03-16 21:23:56,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 21:23:56,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +10: [2023-03-16 21:23:56,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+12: [2023-03-16 21:23:56,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:56,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:56,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +15: [2023-03-16 21:23:56,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+15: [2023-03-16 21:23:56,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +11: [2023-03-16 21:23:56,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +13: [2023-03-16 21:23:56,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:56,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:56,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. +12: [2023-03-16 21:23:56,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+13: [2023-03-16 21:23:56,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 21:23:56,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_23-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 21:23:56,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:56,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+10: [2023-03-16 21:23:56,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 21:23:56,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:56,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+10: [2023-03-16 21:23:56,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +10: [2023-03-16 21:23:56,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:56,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 21:23:56,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+11: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+15: [2023-03-16 21:23:56,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:56,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +11: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+11: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:56,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +10: [2023-03-16 21:23:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +15: [2023-03-16 21:23:56,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:56,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:56,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 21:23:56,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:56,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:56,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:56,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:56,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:56,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +15: [2023-03-16 21:23:56,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+12: [2023-03-16 21:23:56,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:56,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:56,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:56,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:56,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 21:23:56,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:56,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:56,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 0: [2023-03-16 21:23:56,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 7: [2023-03-16 21:23:56,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 8: [2023-03-16 21:23:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:56,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:56,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:56,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +12: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +12: [2023-03-16 21:23:56,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:56,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+11: [2023-03-16 21:23:56,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:56,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:56,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:56,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:56,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:56,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 6: [2023-03-16 21:23:56,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:56,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:56,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+14: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 2: [2023-03-16 21:23:56,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:56,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+13: [2023-03-16 21:23:56,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 3: [2023-03-16 21:23:56,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +14: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:56,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +13: [2023-03-16 21:23:56,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+14: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... +14: [2023-03-16 21:23:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:56,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:56,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:56,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:56,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 2: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+11: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +11: [2023-03-16 21:23:56,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+13: [2023-03-16 21:23:56,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 1: [2023-03-16 21:23:56,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:56,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+13: [2023-03-16 21:23:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:56,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 9: [2023-03-16 21:23:56,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 4: [2023-03-16 21:23:56,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. +13: [2023-03-16 21:23:56,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_24-model_00-model_states.pt. + 5: [2023-03-16 21:23:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+13: [2023-03-16 21:23:56,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:56,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:56,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:56,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:56,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:56,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+10: [2023-03-16 21:23:57,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:57,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +10: [2023-03-16 21:23:57,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+10: [2023-03-16 21:23:57,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +10: [2023-03-16 21:23:57,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:57,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:57,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:57,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+14: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+11: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+12: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 21:23:57,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +14: [2023-03-16 21:23:57,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+12: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +15: [2023-03-16 21:23:57,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +11: [2023-03-16 21:23:57,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:57,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:57,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +12: [2023-03-16 21:23:57,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+13: [2023-03-16 21:23:57,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... +13: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:57,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +14: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+11: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+15: [2023-03-16 21:23:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +15: [2023-03-16 21:23:57,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:57,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+14: [2023-03-16 21:23:57,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:57,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+14: [2023-03-16 21:23:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 21:23:57,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +11: [2023-03-16 21:23:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+12: [2023-03-16 21:23:57,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +12: [2023-03-16 21:23:57,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. 
+13: [2023-03-16 21:23:57,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:57,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_25-model_00-model_states.pt. +13: [2023-03-16 21:23:57,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:57,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 21:23:57,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:57,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:57,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:57,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 7: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:57,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:57,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 0: [2023-03-16 21:23:57,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 0: [2023-03-16 21:23:57,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 21:23:57,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+10: [2023-03-16 21:23:57,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+11: [2023-03-16 21:23:57,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:57,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+11: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +11: [2023-03-16 21:23:57,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+10: [2023-03-16 21:23:57,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 21:23:57,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+13: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:57,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +10: [2023-03-16 21:23:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+10: [2023-03-16 21:23:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +10: [2023-03-16 21:23:57,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:57,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 6: [2023-03-16 21:23:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 21:23:57,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:57,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:57,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:57,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +13: [2023-03-16 21:23:57,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+14: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+15: [2023-03-16 21:23:57,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +12: [2023-03-16 21:23:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +13: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+12: [2023-03-16 21:23:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +14: [2023-03-16 21:23:57,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +12: [2023-03-16 21:23:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... 
+15: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... +15: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 2: [2023-03-16 21:23:57,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 5: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 6: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 9: [2023-03-16 21:23:57,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:57,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:57,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+11: [2023-03-16 21:23:57,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +14: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:57,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +11: [2023-03-16 21:23:57,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 21:23:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:57,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:57,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+13: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 1: [2023-03-16 21:23:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:57,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:57,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 3: [2023-03-16 21:23:57,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:57,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:57,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 8: [2023-03-16 21:23:57,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:57,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. +15: [2023-03-16 21:23:57,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_26-model_00-model_states.pt. + 4: [2023-03-16 21:23:57,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+15: [2023-03-16 21:23:57,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:57,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:57,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:58,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:58,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:58,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:58,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 21:23:58,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+10: [2023-03-16 21:23:58,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+12: [2023-03-16 21:23:58,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+12: [2023-03-16 21:23:58,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:58,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 21:23:58,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+14: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+13: [2023-03-16 21:23:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:58,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:58,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+10: [2023-03-16 21:23:58,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +12: [2023-03-16 21:23:58,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:58,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:58,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +10: [2023-03-16 21:23:58,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:58,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:58,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +10: [2023-03-16 21:23:58,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:58,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+15: [2023-03-16 21:23:58,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +12: [2023-03-16 21:23:58,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 21:23:58,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +13: [2023-03-16 21:23:58,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:58,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+15: [2023-03-16 21:23:58,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +15: [2023-03-16 21:23:58,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +14: [2023-03-16 21:23:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... +11: [2023-03-16 21:23:58,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:58,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:58,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +13: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+11: [2023-03-16 21:23:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +14: [2023-03-16 21:23:58,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:58,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:58,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:58,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:58,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:58,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +11: [2023-03-16 21:23:58,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. 
+15: [2023-03-16 21:23:58,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 21:23:58,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. +15: [2023-03-16 21:23:58,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+15: [2023-03-16 21:23:58,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_27-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:58,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 7: [2023-03-16 21:23:58,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:58,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:58,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:58,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:58,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:58,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +11: [2023-03-16 21:23:58,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:58,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:58,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:58,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:58,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:58,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+12: [2023-03-16 21:23:58,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+13: [2023-03-16 21:23:58,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+12: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +13: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+14: [2023-03-16 21:23:58,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 2: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:58,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:58,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +14: [2023-03-16 21:23:58,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 2: [2023-03-16 21:23:58,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:58,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:58,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:58,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:58,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 9: [2023-03-16 21:23:58,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +12: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +15: [2023-03-16 21:23:58,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:58,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... +10: [2023-03-16 21:23:58,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+13: [2023-03-16 21:23:58,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+14: [2023-03-16 21:23:58,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:58,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +11: [2023-03-16 21:23:58,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 0: [2023-03-16 21:23:58,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:58,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:58,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:58,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:58,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+10: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+10: [2023-03-16 21:23:58,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 3: [2023-03-16 21:23:58,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:58,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:58,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 21:23:58,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:58,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 8: [2023-03-16 21:23:58,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +13: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:58,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +12: [2023-03-16 21:23:58,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:58,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 6: [2023-03-16 21:23:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:58,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+15: [2023-03-16 21:23:58,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +14: [2023-03-16 21:23:58,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:58,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:58,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +10: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:58,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+15: [2023-03-16 21:23:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. +15: [2023-03-16 21:23:58,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 1: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 5: [2023-03-16 21:23:58,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_28-model_00-model_states.pt. + 4: [2023-03-16 21:23:58,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:58,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:58,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:58,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:58,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:58,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:58,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:59,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+10: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:59,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +12: [2023-03-16 21:23:59,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:59,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:59,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 8: [2023-03-16 21:23:59,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+13: [2023-03-16 21:23:59,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:59,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:59,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+14: [2023-03-16 21:23:59,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +14: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +13: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 0: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+10: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +10: [2023-03-16 21:23:59,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:59,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +10: [2023-03-16 21:23:59,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 21:23:59,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +11: [2023-03-16 21:23:59,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+12: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:59,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:59,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+10: [2023-03-16 21:23:59,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +12: [2023-03-16 21:23:59,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+11: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +11: [2023-03-16 21:23:59,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:59,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +13: [2023-03-16 21:23:59,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 21:23:59,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +14: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+13: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... +15: [2023-03-16 21:23:59,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_29-model_00-model_states.pt. +15: [2023-03-16 21:23:59,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+15: [2023-03-16 21:23:59,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+12: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 21:23:59,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:59,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+13: [2023-03-16 21:23:59,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+13: [2023-03-16 21:23:59,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+12: [2023-03-16 21:23:59,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +12: [2023-03-16 21:23:59,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:23:59,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 7: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +12: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+13: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:23:59,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+12: [2023-03-16 21:23:59,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:23:59,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:23:59,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +10: [2023-03-16 21:23:59,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 0: [2023-03-16 21:23:59,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 5: [2023-03-16 21:23:59,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:23:59,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:59,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 7: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 7: [2023-03-16 21:23:59,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 7: [2023-03-16 21:23:59,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:23:59,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 3: [2023-03-16 21:23:59,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:23:59,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+13: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +13: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +13: [2023-03-16 21:23:59,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:59,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +11: [2023-03-16 21:23:59,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:59,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +14: [2023-03-16 21:23:59,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 1: [2023-03-16 21:23:59,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 9: [2023-03-16 21:23:59,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:59,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 8: [2023-03-16 21:23:59,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+11: [2023-03-16 21:23:59,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 2: [2023-03-16 21:23:59,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:23:59,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 6: [2023-03-16 21:23:59,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +10: [2023-03-16 21:23:59,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 8: [2023-03-16 21:23:59,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:23:59,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 9: [2023-03-16 21:23:59,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:59,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 6: [2023-03-16 21:23:59,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 3: [2023-03-16 21:23:59,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 3: [2023-03-16 21:23:59,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:23:59,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:23:59,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+11: [2023-03-16 21:23:59,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +14: [2023-03-16 21:23:59,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +11: [2023-03-16 21:23:59,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 1: [2023-03-16 21:23:59,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 5: [2023-03-16 21:23:59,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:23:59,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+10: [2023-03-16 21:23:59,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:23:59,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+14: [2023-03-16 21:23:59,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:23:59,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:23:59,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+14: [2023-03-16 21:23:59,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 1: [2023-03-16 21:23:59,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:23:59,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:23:59,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+ 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. 
+15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. + 4: [2023-03-16 21:23:59,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+15: [2023-03-16 21:23:59,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... +15: [2023-03-16 21:23:59,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 4: [2023-03-16 21:23:59,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:23:59,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_30-model_00-model_states.pt. +15: [2023-03-16 21:23:59,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+15: [2023-03-16 21:23:59,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:23:59,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:00,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:00,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:00,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+13: [2023-03-16 21:24:00,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:00,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+10: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:00,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+10: [2023-03-16 21:24:00,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +10: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +12: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +11: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +12: [2023-03-16 21:24:00,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:00,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+13: [2023-03-16 21:24:00,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+14: [2023-03-16 21:24:00,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +13: [2023-03-16 21:24:00,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+14: [2023-03-16 21:24:00,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:00,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+13: [2023-03-16 21:24:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +13: [2023-03-16 21:24:00,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:00,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+11: [2023-03-16 21:24:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:00,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+15: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+10: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+11: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:24:00,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +14: [2023-03-16 21:24:00,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:24:00,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:00,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... +15: [2023-03-16 21:24:00,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+14: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +14: [2023-03-16 21:24:00,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:00,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 21:24:00,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +11: [2023-03-16 21:24:00,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +10: [2023-03-16 21:24:00,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+10: [2023-03-16 21:24:00,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:00,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+15: [2023-03-16 21:24:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:00,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. +15: [2023-03-16 21:24:00,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_31-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:00,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:00,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:00,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+10: [2023-03-16 21:24:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+10: [2023-03-16 21:24:00,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:00,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +12: [2023-03-16 21:24:00,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:00,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:00,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +10: [2023-03-16 21:24:00,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+11: [2023-03-16 21:24:00,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +11: [2023-03-16 21:24:00,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:00,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:00,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+14: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +13: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+14: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +14: [2023-03-16 21:24:00,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +10: [2023-03-16 21:24:00,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+10: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 1: [2023-03-16 21:24:00,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +12: [2023-03-16 21:24:00,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:00,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:00,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:00,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+13: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:00,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:00,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:00,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 21:24:00,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+11: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... +15: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... 
+15: [2023-03-16 21:24:00,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 3: [2023-03-16 21:24:00,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+14: [2023-03-16 21:24:00,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +14: [2023-03-16 21:24:00,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 4: [2023-03-16 21:24:00,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 7: [2023-03-16 21:24:00,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:00,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:00,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +11: [2023-03-16 21:24:00,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:00,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:00,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+11: [2023-03-16 21:24:00,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:00,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:00,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. 
+15: [2023-03-16 21:24:00,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_32-model_00-model_states.pt. +15: [2023-03-16 21:24:00,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:00,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:00,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+11: [2023-03-16 21:24:00,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+12: [2023-03-16 21:24:00,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+12: [2023-03-16 21:24:00,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:00,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:00,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:00,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +10: [2023-03-16 21:24:00,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:00,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 0: [2023-03-16 21:24:00,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:00,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:00,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:00,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:00,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:00,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:00,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:00,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:00,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:00,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:00,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+12: [2023-03-16 21:24:00,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:00,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:01,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:01,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:01,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+11: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+13: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+13: [2023-03-16 21:24:01,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+13: [2023-03-16 21:24:01,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +13: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+15: [2023-03-16 21:24:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:01,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +14: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +11: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +11: [2023-03-16 21:24:01,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +15: [2023-03-16 21:24:01,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt... +12: [2023-03-16 21:24:01,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +12: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:01,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +10: [2023-03-16 21:24:01,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:01,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:01,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:01,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:01,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +14: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +15: [2023-03-16 21:24:01,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_33-model_00-model_states.pt. +13: [2023-03-16 21:24:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:01,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+13: [2023-03-16 21:24:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+14: [2023-03-16 21:24:01,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:01,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+14: [2023-03-16 21:24:01,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+12: [2023-03-16 21:24:01,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+12: [2023-03-16 21:24:01,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+12: [2023-03-16 21:24:01,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +12: [2023-03-16 21:24:01,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:01,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +12: [2023-03-16 21:24:01,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+12: [2023-03-16 21:24:01,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:01,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:01,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:01,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:01,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 4: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +11: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:01,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+10: [2023-03-16 21:24:01,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +14: [2023-03-16 21:24:01,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:01,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+14: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +14: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:01,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:01,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +11: [2023-03-16 21:24:01,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+13: [2023-03-16 21:24:01,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +10: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:01,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +15: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... +13: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +10: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:01,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:01,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+10: [2023-03-16 21:24:01,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +15: [2023-03-16 21:24:01,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:01,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. +13: [2023-03-16 21:24:01,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_34-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:01,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+13: [2023-03-16 21:24:01,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 21:24:01,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 21:24:01,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+12: [2023-03-16 21:24:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:01,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:01,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:01,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +12: [2023-03-16 21:24:01,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +12: [2023-03-16 21:24:01,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:01,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:01,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+14: [2023-03-16 21:24:01,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:01,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:01,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:01,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+14: [2023-03-16 21:24:01,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:01,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +14: [2023-03-16 21:24:01,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +14: [2023-03-16 21:24:01,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:01,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:01,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 0: [2023-03-16 21:24:01,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:01,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:01,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:01,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:01,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:01,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:01,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 8: [2023-03-16 21:24:01,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:01,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 5: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+15: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+10: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +13: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 6: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:01,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:01,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 1: [2023-03-16 21:24:01,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:01,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +15: [2023-03-16 21:24:01,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 4: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 3: [2023-03-16 21:24:01,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+15: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:01,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:01,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+10: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +10: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:01,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:01,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:01,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:01,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:01,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:01,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... +11: [2023-03-16 21:24:01,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... 
+11: [2023-03-16 21:24:01,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 7: [2023-03-16 21:24:01,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt... + 9: [2023-03-16 21:24:01,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:02,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+10: [2023-03-16 21:24:02,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:02,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+15: [2023-03-16 21:24:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:02,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +15: [2023-03-16 21:24:02,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+10: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +10: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+13: [2023-03-16 21:24:02,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +13: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. 
+11: [2023-03-16 21:24:02,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. +11: [2023-03-16 21:24:02,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_35-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:02,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:02,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+11: [2023-03-16 21:24:02,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 0: [2023-03-16 21:24:02,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+12: [2023-03-16 21:24:02,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+14: [2023-03-16 21:24:02,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:02,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+12: [2023-03-16 21:24:02,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +14: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:02,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:02,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +14: [2023-03-16 21:24:02,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +12: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: > overriding learning rate value to 0.0002 + 0: > overriding minimum learning rate value to 2e-05 + 0: > overriding warmup iterations value to 0 + 0: > overriding total number of iterations value to 1 + 0: > overriding decay style value to cosine + 9: [2023-03-16 21:24:02,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+12: [2023-03-16 21:24:02,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+12: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +12: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +12: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+14: [2023-03-16 21:24:02,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +14: [2023-03-16 21:24:02,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 0: [2023-03-16 21:24:02,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:02,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 0: [2023-03-16 21:24:02,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 2: [2023-03-16 21:24:02,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +12: [2023-03-16 21:24:02,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 
+12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... +12: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:02,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... +14: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+11: [2023-03-16 21:24:02,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:02,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 9: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
+ 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... + 0: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... + 9: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... + 9: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:02,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+11: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 9: [2023-03-16 21:24:02,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +10: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 9: [2023-03-16 21:24:02,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... + 9: [2023-03-16 21:24:02,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 2: [2023-03-16 21:24:02,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 9: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 2: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 6: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 6: [2023-03-16 21:24:02,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+13: [2023-03-16 21:24:02,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 5: [2023-03-16 21:24:02,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 5: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 5: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... 
+13: [2023-03-16 21:24:02,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +13: [2023-03-16 21:24:02,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+15: [2023-03-16 21:24:02,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... + 2: [2023-03-16 21:24:02,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +11: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... +15: [2023-03-16 21:24:02,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+10: [2023-03-16 21:24:02,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
+ 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... + 6: [2023-03-16 21:24:02,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:02,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
+ 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... + 5: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 7: [2023-03-16 21:24:02,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+11: [2023-03-16 21:24:02,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. 
+ 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:02,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +10: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:02,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 3: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+10: [2023-03-16 21:24:02,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +13: [2023-03-16 21:24:02,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:02,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+13: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 4: [2023-03-16 21:24:02,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+11: [2023-03-16 21:24:02,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+11: [2023-03-16 21:24:02,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +11: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+11: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. +15: [2023-03-16 21:24:02,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_36-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+11: [2023-03-16 21:24:02,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +11: [2023-03-16 21:24:02,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 3: [2023-03-16 21:24:02,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 3: [2023-03-16 21:24:02,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +10: [2023-03-16 21:24:02,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +10: [2023-03-16 21:24:02,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+ 1: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:02,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 7: [2023-03-16 21:24:02,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 7: [2023-03-16 21:24:02,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... + 7: [2023-03-16 21:24:02,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+13: [2023-03-16 21:24:02,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +13: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +13: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... 
+ 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... + 1: [2023-03-16 21:24:02,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 1: [2023-03-16 21:24:02,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 
+ 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... + 8: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 
+11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... +11: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. 
+15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. +15: [2023-03-16 21:24:02,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt... +15: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/layer_38-model_00-model_states.pt. + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
+ 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... + 4: [2023-03-16 21:24:02,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 
+10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... +10: [2023-03-16 21:24:02,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
+ 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... + 3: [2023-03-16 21:24:02,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... + 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
+ 1: [2023-03-16 21:24:02,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... +13: [2023-03-16 21:24:02,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 
+15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... +15: [2023-03-16 21:24:02,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... + 9: [2023-03-16 21:24:03,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 21:24:03,283] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 75 + 9: [2023-03-16 21:24:03,292] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 75 +12: [2023-03-16 21:24:03,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,319] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 101 +12: [2023-03-16 21:24:03,328] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 101 +14: [2023-03-16 21:24:03,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. +14: [2023-03-16 21:24:03,359] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 117 +14: [2023-03-16 21:24:03,370] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 117 + 9: [2023-03-16 21:24:03,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 74 + 9: [2023-03-16 21:24:03,377] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 74 + 0: [2023-03-16 21:24:03,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 21:24:03,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 6 + 0: [2023-03-16 21:24:03,388] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 6 + 0: [2023-03-16 21:24:03,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. + 0: [2023-03-16 21:24:03,423] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 4 + 5: [2023-03-16 21:24:03,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. + 5: [2023-03-16 21:24:03,427] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 40 + 5: [2023-03-16 21:24:03,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. + 5: [2023-03-16 21:24:03,427] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 45 + 9: [2023-03-16 21:24:03,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,428] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 79 +14: [2023-03-16 21:24:03,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 21:24:03,430] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 119 + 0: [2023-03-16 21:24:03,431] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 4 +12: [2023-03-16 21:24:03,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,435] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 102 + 5: [2023-03-16 21:24:03,437] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 40 + 5: [2023-03-16 21:24:03,437] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 45 +14: [2023-03-16 21:24:03,437] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 119 + 9: [2023-03-16 21:24:03,438] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 79 +14: [2023-03-16 21:24:03,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. +14: [2023-03-16 21:24:03,440] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 118 +14: [2023-03-16 21:24:03,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 21:24:03,441] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 113 +12: [2023-03-16 21:24:03,444] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 102 + 2: [2023-03-16 21:24:03,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. + 2: [2023-03-16 21:24:03,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 20 +14: [2023-03-16 21:24:03,448] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 118 +14: [2023-03-16 21:24:03,450] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 113 + 2: [2023-03-16 21:24:03,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. + 2: [2023-03-16 21:24:03,450] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 21 + 0: [2023-03-16 21:24:03,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. + 0: [2023-03-16 21:24:03,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 5 + 0: [2023-03-16 21:24:03,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 21:24:03,456] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 20 + 0: [2023-03-16 21:24:03,457] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 3 + 2: [2023-03-16 21:24:03,460] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 21 +12: [2023-03-16 21:24:03,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,461] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 99 +12: [2023-03-16 21:24:03,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 96 +12: [2023-03-16 21:24:03,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 100 + 3: [2023-03-16 21:24:03,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 27 +10: [2023-03-16 21:24:03,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 21:24:03,466] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 82 + 0: [2023-03-16 21:24:03,466] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 3 + 0: [2023-03-16 21:24:03,466] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 5 + 0: [2023-03-16 21:24:03,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. + 0: [2023-03-16 21:24:03,468] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 7 + 3: [2023-03-16 21:24:03,471] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 27 +12: [2023-03-16 21:24:03,472] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 100 +14: [2023-03-16 21:24:03,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 21:24:03,473] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 116 +12: [2023-03-16 21:24:03,474] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 99 +12: [2023-03-16 21:24:03,474] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 96 +10: [2023-03-16 21:24:03,475] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 82 + 0: [2023-03-16 21:24:03,476] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 7 + 7: [2023-03-16 21:24:03,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. + 7: [2023-03-16 21:24:03,480] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 56 + 6: [2023-03-16 21:24:03,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. +14: [2023-03-16 21:24:03,482] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 116 + 6: [2023-03-16 21:24:03,482] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 49 + 4: [2023-03-16 21:24:03,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. + 4: [2023-03-16 21:24:03,487] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 39 +14: [2023-03-16 21:24:03,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 21:24:03,488] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 112 + 7: [2023-03-16 21:24:03,489] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 56 +14: [2023-03-16 21:24:03,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. +14: [2023-03-16 21:24:03,490] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 114 + 5: [2023-03-16 21:24:03,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. + 5: [2023-03-16 21:24:03,491] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 43 + 6: [2023-03-16 21:24:03,491] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 49 +14: [2023-03-16 21:24:03,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 
+14: [2023-03-16 21:24:03,495] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 115 + 4: [2023-03-16 21:24:03,496] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 39 +14: [2023-03-16 21:24:03,498] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 112 + 5: [2023-03-16 21:24:03,498] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 43 +14: [2023-03-16 21:24:03,499] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 114 + 9: [2023-03-16 21:24:03,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,502] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 78 +14: [2023-03-16 21:24:03,505] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 115 + 6: [2023-03-16 21:24:03,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,508] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 53 + 6: [2023-03-16 21:24:03,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. + 2: [2023-03-16 21:24:03,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 21:24:03,509] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 51 + 2: [2023-03-16 21:24:03,509] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 18 + 7: [2023-03-16 21:24:03,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. + 5: [2023-03-16 21:24:03,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. + 7: [2023-03-16 21:24:03,510] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 63 + 5: [2023-03-16 21:24:03,510] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 47 + 2: [2023-03-16 21:24:03,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. + 2: [2023-03-16 21:24:03,513] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 22 +12: [2023-03-16 21:24:03,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,514] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 98 + 9: [2023-03-16 21:24:03,514] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 78 + 0: [2023-03-16 21:24:03,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
+ 0: [2023-03-16 21:24:03,515] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 1 +11: [2023-03-16 21:24:03,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,516] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 94 + 6: [2023-03-16 21:24:03,516] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 53 + 2: [2023-03-16 21:24:03,517] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 18 + 7: [2023-03-16 21:24:03,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 63 + 5: [2023-03-16 21:24:03,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 47 + 2: [2023-03-16 21:24:03,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,518] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 51 + 2: [2023-03-16 21:24:03,519] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 23 + 9: [2023-03-16 21:24:03,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,518] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 73 + 6: [2023-03-16 21:24:03,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
+ 6: [2023-03-16 21:24:03,519] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 52 + 7: [2023-03-16 21:24:03,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. + 7: [2023-03-16 21:24:03,520] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 57 + 2: [2023-03-16 21:24:03,523] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 22 +11: [2023-03-16 21:24:03,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,524] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 88 +11: [2023-03-16 21:24:03,525] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 94 + 0: [2023-03-16 21:24:03,525] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 1 + 3: [2023-03-16 21:24:03,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,526] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 30 + 5: [2023-03-16 21:24:03,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 21:24:03,526] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 41 + 2: [2023-03-16 21:24:03,527] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 23 + 9: [2023-03-16 21:24:03,528] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 73 + 7: [2023-03-16 21:24:03,530] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 57 +12: [2023-03-16 21:24:03,530] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 98 + 0: [2023-03-16 21:24:03,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. + 0: [2023-03-16 21:24:03,533] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 2 + 6: [2023-03-16 21:24:03,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 52 + 3: [2023-03-16 21:24:03,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 30 + 9: [2023-03-16 21:24:03,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 21:24:03,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 41 + 9: [2023-03-16 21:24:03,535] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 72 +11: [2023-03-16 21:24:03,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 88 + 9: [2023-03-16 21:24:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,536] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 77 + 8: [2023-03-16 21:24:03,536] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 66 + 6: [2023-03-16 21:24:03,536] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 48 +13: [2023-03-16 21:24:03,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,537] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 104 +11: [2023-03-16 21:24:03,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 
+11: [2023-03-16 21:24:03,537] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 92 + 0: [2023-03-16 21:24:03,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. + 0: [2023-03-16 21:24:03,539] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 0 +12: [2023-03-16 21:24:03,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,541] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 103 + 0: [2023-03-16 21:24:03,542] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 2 + 6: [2023-03-16 21:24:03,544] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 48 + 8: [2023-03-16 21:24:03,545] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 66 + 1: [2023-03-16 21:24:03,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 21:24:03,546] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 72 +13: [2023-03-16 21:24:03,546] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 104 + 1: [2023-03-16 21:24:03,546] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 9 + 9: [2023-03-16 21:24:03,547] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 77 + 0: [2023-03-16 21:24:03,548] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 0 +12: [2023-03-16 21:24:03,550] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 103 +11: [2023-03-16 21:24:03,551] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 92 + 3: [2023-03-16 21:24:03,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,553] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 29 + 0: could not find arguments in the checkpoint ... + 0: checkpoint version 3.0 + 1: [2023-03-16 21:24:03,554] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 9 + 4: [2023-03-16 21:24:03,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. + 9: [2023-03-16 21:24:03,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 
+ 9: [2023-03-16 21:24:03,559] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 76 + 4: [2023-03-16 21:24:03,559] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 37 + 3: [2023-03-16 21:24:03,561] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 29 +11: [2023-03-16 21:24:03,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,564] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 95 + 9: [2023-03-16 21:24:03,566] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 76 + 5: [2023-03-16 21:24:03,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. + 5: [2023-03-16 21:24:03,567] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 46 + 4: [2023-03-16 21:24:03,567] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 37 + 8: [2023-03-16 21:24:03,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,573] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 70 +11: [2023-03-16 21:24:03,573] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 95 + 5: [2023-03-16 21:24:03,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 21:24:03,573] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 42 + 6: [2023-03-16 21:24:03,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,575] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 54 + 5: [2023-03-16 21:24:03,576] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 46 +12: [2023-03-16 21:24:03,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. +12: [2023-03-16 21:24:03,577] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 97 + 8: [2023-03-16 21:24:03,582] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 70 + 2: [2023-03-16 21:24:03,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 21:24:03,582] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 17 + 5: [2023-03-16 21:24:03,583] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 42 + 6: [2023-03-16 21:24:03,584] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 54 +12: [2023-03-16 21:24:03,587] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 97 + 2: [2023-03-16 21:24:03,591] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 17 + 1: [2023-03-16 21:24:03,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. + 1: [2023-03-16 21:24:03,592] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 14 + 2: [2023-03-16 21:24:03,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. + 2: [2023-03-16 21:24:03,592] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 16 +10: [2023-03-16 21:24:03,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. +10: [2023-03-16 21:24:03,598] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 84 +10: [2023-03-16 21:24:03,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 21:24:03,600] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 14 +10: [2023-03-16 21:24:03,601] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 86 + 7: [2023-03-16 21:24:03,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. + 7: [2023-03-16 21:24:03,601] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 59 + 2: [2023-03-16 21:24:03,602] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 16 + 3: [2023-03-16 21:24:03,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,604] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 26 +10: [2023-03-16 21:24:03,606] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 84 + 6: [2023-03-16 21:24:03,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,606] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 55 + 3: [2023-03-16 21:24:03,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 21:24:03,608] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 25 + 4: [2023-03-16 21:24:03,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. + 4: [2023-03-16 21:24:03,610] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 34 + 7: [2023-03-16 21:24:03,610] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 59 +10: [2023-03-16 21:24:03,613] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 86 + 3: [2023-03-16 21:24:03,614] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 26 + 6: [2023-03-16 21:24:03,615] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 55 + 1: [2023-03-16 21:24:03,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. +10: [2023-03-16 21:24:03,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. +10: [2023-03-16 21:24:03,616] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 81 + 1: [2023-03-16 21:24:03,616] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 12 + 4: [2023-03-16 21:24:03,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
+ 3: [2023-03-16 21:24:03,617] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 25 + 4: [2023-03-16 21:24:03,617] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 36 + 4: [2023-03-16 21:24:03,618] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 34 +11: [2023-03-16 21:24:03,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,620] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 91 + 8: [2023-03-16 21:24:03,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,623] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 71 + 1: [2023-03-16 21:24:03,624] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 12 +10: [2023-03-16 21:24:03,625] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 81 + 8: [2023-03-16 21:24:03,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,625] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 69 + 4: [2023-03-16 21:24:03,626] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 36 + 7: [2023-03-16 21:24:03,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 21:24:03,629] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 60 +11: [2023-03-16 21:24:03,629] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 91 +13: [2023-03-16 21:24:03,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,632] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 71 +13: [2023-03-16 21:24:03,632] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 107 +13: [2023-03-16 21:24:03,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,633] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 105 + 8: [2023-03-16 21:24:03,634] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 69 +10: [2023-03-16 21:24:03,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. +10: [2023-03-16 21:24:03,634] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 80 + 7: [2023-03-16 21:24:03,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 21:24:03,637] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 58 +13: [2023-03-16 21:24:03,641] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 105 +10: [2023-03-16 21:24:03,643] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 80 +13: [2023-03-16 21:24:03,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,644] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 106 + 3: [2023-03-16 21:24:03,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,645] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 31 + 3: [2023-03-16 21:24:03,645] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 24 + 7: [2023-03-16 21:24:03,645] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 60 +13: [2023-03-16 21:24:03,646] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 107 + 7: [2023-03-16 21:24:03,647] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 58 + 4: [2023-03-16 21:24:03,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 21:24:03,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. + 4: [2023-03-16 21:24:03,648] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 33 + 7: [2023-03-16 21:24:03,648] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 61 + 8: [2023-03-16 21:24:03,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,651] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 68 +13: [2023-03-16 21:24:03,652] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 106 + 3: [2023-03-16 21:24:03,655] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 24 + 3: [2023-03-16 21:24:03,655] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 31 +11: [2023-03-16 21:24:03,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,657] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 93 + 7: [2023-03-16 21:24:03,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
+ 7: [2023-03-16 21:24:03,657] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 62 + 7: [2023-03-16 21:24:03,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 61 + 1: [2023-03-16 21:24:03,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,660] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 68 + 1: [2023-03-16 21:24:03,661] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 11 +15: [2023-03-16 21:24:03,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,662] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 122 + 4: [2023-03-16 21:24:03,663] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 33 +11: [2023-03-16 21:24:03,665] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 93 +15: [2023-03-16 21:24:03,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,666] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 124 + 7: [2023-03-16 21:24:03,667] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 62 +10: [2023-03-16 21:24:03,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 21:24:03,669] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 83 +15: [2023-03-16 21:24:03,671] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 122 +15: [2023-03-16 21:24:03,675] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 124 +13: [2023-03-16 21:24:03,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,675] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 111 +10: [2023-03-16 21:24:03,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. +10: [2023-03-16 21:24:03,677] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 87 + 1: [2023-03-16 21:24:03,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 11 +10: [2023-03-16 21:24:03,678] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 83 +13: [2023-03-16 21:24:03,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,682] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 110 + 1: [2023-03-16 21:24:03,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
+13: [2023-03-16 21:24:03,683] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 111 + 1: [2023-03-16 21:24:03,683] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 15 +10: [2023-03-16 21:24:03,686] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 87 +11: [2023-03-16 21:24:03,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,689] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 90 + 8: [2023-03-16 21:24:03,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. +11: [2023-03-16 21:24:03,690] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 89 + 8: [2023-03-16 21:24:03,690] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 65 +15: [2023-03-16 21:24:03,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,690] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 120 + 8: [2023-03-16 21:24:03,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
+ 8: [2023-03-16 21:24:03,691] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 67 + 1: [2023-03-16 21:24:03,692] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 15 +13: [2023-03-16 21:24:03,692] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 110 +15: [2023-03-16 21:24:03,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,699] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 123 +15: [2023-03-16 21:24:03,699] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 120 +11: [2023-03-16 21:24:03,699] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 90 + 8: [2023-03-16 21:24:03,699] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 65 +11: [2023-03-16 21:24:03,702] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 89 + 8: [2023-03-16 21:24:03,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. + 8: [2023-03-16 21:24:03,703] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 67 + 8: [2023-03-16 21:24:03,704] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 64 + 4: [2023-03-16 21:24:03,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
+ 4: [2023-03-16 21:24:03,705] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 35 +15: [2023-03-16 21:24:03,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,706] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 125 +13: [2023-03-16 21:24:03,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,707] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 109 +15: [2023-03-16 21:24:03,712] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 123 + 8: [2023-03-16 21:24:03,712] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 64 + 4: [2023-03-16 21:24:03,716] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 35 +13: [2023-03-16 21:24:03,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 109 +15: [2023-03-16 21:24:03,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 125 + 1: [2023-03-16 21:24:03,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. + 1: [2023-03-16 21:24:03,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
+ 1: [2023-03-16 21:24:03,717] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 10 + 1: [2023-03-16 21:24:03,717] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 8 + 1: [2023-03-16 21:24:03,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. + 1: [2023-03-16 21:24:03,719] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 13 + 1: [2023-03-16 21:24:03,726] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 10 + 1: [2023-03-16 21:24:03,728] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 8 + 4: [2023-03-16 21:24:03,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. + 4: [2023-03-16 21:24:03,731] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 38 +15: [2023-03-16 21:24:03,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,732] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 126 + 1: [2023-03-16 21:24:03,734] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 13 +10: [2023-03-16 21:24:03,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 
+10: [2023-03-16 21:24:03,738] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 85 + 4: [2023-03-16 21:24:03,741] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 38 +15: [2023-03-16 21:24:03,741] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 126 +13: [2023-03-16 21:24:03,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. +13: [2023-03-16 21:24:03,743] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 108 + 4: [2023-03-16 21:24:03,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. + 4: [2023-03-16 21:24:03,747] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 32 +10: [2023-03-16 21:24:03,748] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 85 +13: [2023-03-16 21:24:03,753] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 108 + 4: [2023-03-16 21:24:03,757] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 32 + 2: [2023-03-16 21:24:03,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
+ 2: [2023-03-16 21:24:03,777] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 19 + 2: [2023-03-16 21:24:03,788] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 19 +15: [2023-03-16 21:24:03,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,789] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 127 +15: [2023-03-16 21:24:03,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. +15: [2023-03-16 21:24:03,789] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 121 + 6: [2023-03-16 21:24:03,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. + 6: [2023-03-16 21:24:03,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 50 +15: [2023-03-16 21:24:03,798] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 127 +15: [2023-03-16 21:24:03,802] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 121 + 6: [2023-03-16 21:24:03,805] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 50 + 5: [2023-03-16 21:24:03,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
+ 5: [2023-03-16 21:24:03,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 44 + 5: [2023-03-16 21:24:03,875] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 44 + 3: [2023-03-16 21:24:03,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. + 3: [2023-03-16 21:24:03,899] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 128 ZeRO state_dicts for rank 28 + 3: [2023-03-16 21:24:03,912] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 128 zero partition checkpoints for rank 28 + 0: successfully loaded checkpoint from checkpoints_2b8100m100m at iteration 0 +15: time (ms) | load-checkpoint: 17826.37 + 0: estimated model parameters: 2.80902656 + 0: estimated model parameters without embeddings: 2.67500544 + 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 21:24:04 + 0: > building train, validation, and test datasets ... + 0: > datasets target sizes (minimum size): + 0: train: 1 + 0: validation: 51200 + 0: test: 51200 + 0: > building train, validation, and test datasets for GPT ... + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... 
+ 0: > finished creating indexed dataset in 0.027813 seconds + 0: number of documents: 208931 + 0: > dataset split: + 0: train: + 0: document indices in [0, 208931) total of 208931 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_1ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.074 seconds + 0: total number of samples: 48805 + 0: total number of epochs: 1 + 0: > building dataset index ... + 0: reading sizes... + 0: reading pointers... + 0: reading document index... + 0: creating numpy buffer of mmap... + 0: creating memory view of numpy buffer... + 0: > finished creating indexed dataset in 0.062354 seconds + 0: number of documents: 364608 + 0: > dataset split: + 0: validation: + 0: document indices in [0, 364608) total of 364608 documents + 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_51200ns_2048sl_1234s_doc_idx.npy + 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_51200ns_2048sl_1234s_sample_idx.npy + 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy + 0: loaded indexed file in 0.075 seconds + 0: total number of samples: 84978 + 0: total number of epochs: 1 + 0: > finished creating GPT datasets ... + 0: [after dataloaders are built] datetime: 2023-03-16 21:24:19 + 0: done with setup ... + 0: training ... 
+15: time (ms) | model-and-optimizer-setup: 41133.06 | train/valid/test-data-iterators-setup: 13588.50 + 0: [after training is done] datetime: 2023-03-16 21:24:19 +15: ----------------------------------------------------------------------------------------------------------------- +15: validation loss at the end of training for val data | lm loss value: 7.518863E+00 | lm loss PPL: 1.842472E+03 | +15: ----------------------------------------------------------------------------------------------------------------- +END 3325710: Thu 16 Mar 2023 09:25:50 PM EET diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..665806e703c0ee525a630b4109af7deefe65f212 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9152ad3a76f36a230e0493cafa6b9d34e39da793e2cfdd42db85700775309313 +size 263350743 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..32d92ca83c0eac4bc2e1160cdafe434082af611d --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed62d895a5973722492ef0d642762527868f206302fbb01ee101220f145a56e5 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ab7f45bca96dc941b1de48559939da70faa742a --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:75a23826669aba39a955e170fd34f89e6c39f34525721e945c7b78fb117e164c +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b81101dd514127c6f79b8e5c4a64be77123cf2f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95626f3efe82ccff3ccb8154e7875a1aab069e57569d4926a3382efafcef5155 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e936e50022cdccc8824f38fb14ef4e8f4ca94905 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cf7b8f25d1e1899475bfb9a36b9dc3f2e1ba4309af8e0628f7b9ca40731e59e +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c03c0ad2d4f4b87a3085101f1874301f3a5c5178 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17a10e0ebfa4d018e3111ca900c2dd6792fe8fab1b2e73294077377c7c81676c +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fb3ca85a0c7b79ef84c78fbd32f89122b2810c8 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4bb1cf54e31807ba1571eedd886660d40e299c051947dc81f33840f2ce08e791 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8df3ac304c7325da4fac5fff49709fd08026c32b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ebade230eec017cf268c8cfc0f9e275de43637719040d0c8b3625e39196fce2 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f9009c389142c3f0a2aad3010e1b39dedc08cc8 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69c6aecfc015df67ae666ecb578e05f2aab15a89bca764f1c51e7cf9656c0423 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d7e9c1fcb7f09d7b151a506525fac776378a675 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cdb3fb419c44abd0745a18c1c6388d08f9ab3ce1068e6e7107cb2a6130db5cb +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9951d3a4d95613a498b029d070321d8b197930e3 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:976d98c92a28ec94e0073c4612c64fe51821f0b57139608af7c6119d296551e6 +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff04221241ddac972fe03747c6e18475f7771b9b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d094d812be777803b8ad40b0f203e5f7124928e689c4af93e133e4947fa05d1 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1241c1f6e9bf9e36c7ed5917d58cb8ac60d309c7 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da7c45732f5b89321ae289cd70d8bc7eee21141c371fa99b4629add9ad4d4649 +size 263351021 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..07ab9b46c7057509d44f222cb171b1b8febe3176 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7eac41a4b64dc1d0cea5797eb3db734924863b3da46afcb4f5b7eaccdfc20c13 +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..035e61cb068193a439ce904a7a41db77b5480344 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:284bb233337e1b61810d5df7603135466905917c450c23eefec9fc257f370f89 +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c3b3980da0c5ceac67cee54e8135abfb0bcf07e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1514216f02181a938928b6601578846a7ad6cd789421d1e850a5cb129ae91c3 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..16118a47b2a85b9fce13134d6c2a678af7018ade --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb62014bf206eb1065ce28d754839bb694fe67a19f311d45a7d9bb9371e2545c +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ce2d467f085f9422fcea41576eebcc9121af9e6b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db04f4831c878f35304dcfc72d467c1f9fccb302797596cf2481c1724de20577 +size 
263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0df5db53bca51af5e06423ff61ba5e640b09a9f6 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ad90f5e309b621038cc9d7022b601169b75e3422266484a31e7fefc12582e39 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..379824cc91d3e88bc6ae15b0a9bf394355c83748 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82e722dae72daa04b4ffe3124b2f4cb54bf68d03373d2f404cfeb77fa1d07782 +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..61ac9c5c8e821ff28632c7e7acafd327964a6920 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dea5c24061a6d2b7a4cf5489edf098442300820663ee17ff8f82b51653f889eb +size 263351021 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cad5043210d60a86069ff12a2966275633237b54 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:84064d30fd26a37ada6c48e3a83d264083858de7217660fb303c16fbce43231d +size 263350765 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ffcb19d942ca644498113e610c5fca9170e51d7b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a960af1615928643e5cfb0c29ee9b2824476beca2e8f728010ea2e0b8495cd63 +size 263351074 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bcb266026562f4eaeac886990eeedf7874ed6080 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ed02944d43269ab23e7e2f53c8cd88427420c1c0abb9afcbc8870663c0d3d12 +size 263350893 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..df3611905a4291996aee5c966a135e5e31cee0b6 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de195dd97ba8f9e030ad704a58463d45bdfec5359d112c5cc4e1fe2b8bdfb830 +size 263350957 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9f639e8ed164d4374a4291440cd201b072936544 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9c1b1d039336b0b92de56d080d496fe09c33f76090695e21963933fc1abf47b4 +size 263350829 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..98b6c11f75cad839e13394cefe440e1e0664c956 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:525def2f7beb9f0c2e9bbdf01ae9d16a93ac326bfdad45e1d24a3e4b779cd0f8 +size 263350957 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b66cae0571cf82ac497eb4d19dd36d21d9040413 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6cd66e5e1558fedac3a305ac5b0805847e0338187d5093a18e6e1be0315dcfed +size 263350765 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a63d0cc4ae1e3c0e3ca41f12a1ef9180b679fb5f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6424a9cec2fbb1f602136f878669545ac52e1dd51e514a0fe1833b57240b5f1 +size 263351085 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1fe62b87b59dcf1adf7d3b4d9065441e2dbaae8b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:459d18336cf7c4ce3810e6c3d19c5668b8369835760fe587a28e832369b58bd5 +size 263350637 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa8dfca59a0a0a26ff2bc7730936bf502abf4d76 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72b43497a27f786dd42363aa31ca1fd9f5941e599748544e3ce7288f0ab93a68 +size 263350765 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a98d6ee39d0b3bcdf929a39a3339bc5bfc088543 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:089ea84f112bd6fbe3f38d911cc264da26c492798375aebe38ff1e5f1750d202 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f1dabe65cd04ef3341434b78aea976c1bd42bff5 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e65ff997f5df6cdad9e556bd07acd39dd262fd3e7f8e9d64dfe45b21146eea6d +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb04fc2f3866575ac502d704d20d3346303f4c70 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:df2de9cb6666ffa31b91c43f598882b96edf7d91d7dda57f3eab6d58f25fe9b2 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b402fd50a71fb299f12fde49679caee7e9cfc35 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11b53a6efcf842cc09ae7e61dbccadbe16ec4ff3733563220998ad3e068750d5 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d67bd8e0819c13885c393f4be16f3e78b537d701 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb96336a5656d997f16a1422d8f4b786ed8c8e584b970bb007a1aab5fd52ee1b +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..92292a69563932d61c52af17a1f258b8b61918ef --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23c860298986e3537da558faba277ecf91da01fe2c0b6334b5bda7a4ac172763 +size 263350818 diff 
--git a/2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..71a98fe565347844e34e2e615bf1bc2aa825a449 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d92dd24229248df3d71f5def50699ed777e873b9e653445653099896979865e +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b1852a5f8489980b5f70ca5829304808623154d0 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7329077f82d4b687ef828655ba5f5def3bb756a24e18991a80997585c24c27ba +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..225de6c60fe3044f1388dd1409c77829b86e3ae3 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b54250c1424717a87ed3c8e687bada7dec4de1581c4682d5f7efef91918cdf51 +size 263350807 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0113b8916a3a0e9f753a605c51400f237ad4938c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:f94dec7a6693bb7ce36aa2c18b445f2cddc560cd9a4e2af98cc0be6a602edf8a +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..426e01d1a0c777c60337ea405a8d8081c533ba3a --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f4cbf910d33168e1c604a8f393b248ee5658268a15eb55e0192a92ac32826ec +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d2be051fb5259005310f042fa7748f501424bd4d --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:277b7124333bbffa96bcb4037535eb56009c119696a93bd0219a3fade54c2241 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e73b635d5ea56859309059bb043a5a13e98b6ef9 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9d639c1a5a4b9b563d063cfacf16920dec8b90492a3c6c0a57ed41f26b40f7c +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..66c5d4ea3d08d70114a654084aab6fa81e4a5cf9 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c01bc6eb0cb1fbba360096b09f8d7b608f608d6720e55c13351c9316728b9727 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3dd5634e0bed3ac57294af337133146776d4467b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bda77cc9f14e76b36c7021a2bc523c8f6edcb09aab1729123105174ee9b00dcf +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3c33e2ddb3ff95f3c7b1543745bf1b7d67b74561 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef5516ebdfe9b9db2b113a8fcbde40733caa286484b407c1d723fe527c5b2d7d +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7edef31fec4f50e6c3f896ebb1909b56ed640713 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7582f8a3401e8d8c131bfdaba6a613d5eb0f110ccadf4350a9ecd6191efc8b9c +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..97a3a05be44c7540200e0d2758824124a1272c6f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7b3799c3de82ef0664494413d977bf3019731ec4e3b60797659299b306a000a +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..34e987e3c5cbba6ad2716e588db5e4660622e469 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3fda9e6bcfc8f28b824ac771de8beeda071e87e3f497560ca859934ed29a981b +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..56bb208437338fce8cdf46ea8126552adb124c1c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2aeba46e8bf912ad3fea586b625814bb6ab5f3161fc97150e10497334ea4607 +size 263350871 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3a803b52311540f861bc53f76c459fe4c64bb25 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e7e2f40c6b34bf5325a1d10802394f9b31665c695a0ae3f30f2421742cc72d58 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..278d0ea06d5579829101d5fca4297475a3827ad4 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f017b1d89a7a1b1465fa8d006cb578e7db1994ba57c445541af64c17fd6ddd64 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..84e85a58494fb745066ee4cdd787e64fa2861f14 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6952ffd2b23a751780494238e32f473b75c85b76be674b9b6f96916c3d2d5d54 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2d3f5508bdd098428b7159038f2928845f9ac10c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a15a763b61900be01f048e5b796412613d55f2f92fbf5a231e53b5414108e741 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5880e38927a4502877b94db4f45f89d818b2be38 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e423ce6c50063ec4889962e7c7bbc0680dfc779e5757bc8199f38a6962743c72 +size 263350882 diff 
--git a/2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc19f374d65534c94bdd93c8bf622704bede7ec4 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6851ba8d879d145b0f879e75d3cc19ad8164c5168a72d97c9fa95032907f6d3a +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b34391ea179b669a55517c18eed13986b1bae3eb --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6edb6835597c58502cebb9c5439569ccc1a621d9a94b955375932581782c08ed +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b9462dfc6ed68547c5c0bfb0ae2c102410f949e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:34b54a9d0d15595e77756b0a0ffc9d14815e0fda4a5dfa3f905102e0fec7d31c +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9d3e00ce1a8c02e6b89d5b287334df45219b6e19 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:8f66240a856f49daa81f36ec7370aa987630e4a6509c809c07391a32e8847358 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..24675f44972164329b7fc1bbe0609b4290adf412 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1591089559cf3fce4d54b1582b1584a7b5f420acae0fe04acdbd800b804dc941 +size 263351010 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e44f1ed82ff4a49dabb77ae607615a5966619bc0 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d18700745f59f4fcd04aea815f83f37f176296ddc8dee07ad94112bcad6e3656 +size 263350807 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..67f63f2466a67dbf0ca945be7996e7c9410fd8e0 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d811370146c03019ba1317363d1078db20600f7ea592c26e2966230f588a206b +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e32b1e467fbb7501f45ac3a0bc17ae558e8f4a9 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7b171c54e92f849008ffa0898d23a3d45d9ef0abdfaafb0b6ea759bedd734db +size 263351010 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c039d096c5ae90a5eadcfb6ef61813fa6816d64 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02ca777a14024d0a38aac4482d2b01273217999a3b1fa91d5a4134ed6345f459 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..55a75ed98723c1f890dcbd69d38c337be8bdeac0 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9cbd48019422f392aa63d582ef79b93eec26d9bb925921024f3881807b4adb0 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..628650117c9c6fd3863ec47bfe4aa47abed18882 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99920a21ad9c14ba64d5bfbd4d2e7f0ea615fc64a27162e89fce4605e32230ad +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..f82426222367b18bf83d5fb629bbb5b4bc5fd87d --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6b670d41ea789ff631605ead946d17c33b8d0db9577ac94ee5956aded833e99 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..53070e6b902df922a6ace28f521ccb33c2e0d1a9 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b98fe74a832760ce709edbded412d705d99b41ae37aaa2f5a00253751b35b16 +size 263351010 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f312416333053e3fe53702e98178082ab1530235 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47a45cea60b460f2b918fabae7094a8eaba0dab6822bbb9406decc607fe0d3a0 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9acf6b935c51de4d3186a4be611b3faecca0688e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35d8c62fc2cc94b67613b0ecb3b38893f4c98db2a0e9297e9d84871733fec67c +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..8274279abc2705cc587a96cd84af5199aca19769 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b51c72d36c1ecac37cddf4852c559502d6d838ff4f7f2bb2a3d9b5422874f03f +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..edac8b42e8520e171c62e164168c96be0a197996 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cd6e845498f62c1769aff1ef46b0a141081a7c9ea895ae2a46c4d14129ac121 +size 263350935 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fdfaa7e1705f171fd106a4ad67ab65f90b64314f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:071ae48d60d9b658266c12f0700e1c9c449d64144054e38a2067101c832d40c8 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e8bf0d53e868847f4946134e68f2839c27dabc67 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed947fe8e9f4cdd4c3a2300ef27026d54c8c62b88869fafad20fab48dd5c4f1b +size 263350818 diff 
--git a/2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee5d18bcadeb00f889b40ff19b46fd42df8b871c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3b75f4028d3648d9117aaab6b7586a2a71e8041f4762510f11785b7cc565783 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..708133c2466084afc44812c1572c9d21937f8f15 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:76b9bc4bca18629d8f2fcce202bde68879614b3791caf5cc1e5607287a458a74 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..63b906b3176f08c6e595aa1e3d435d4767ffe206 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0f72cbb55e5f4ff5f83b669d1423eeaeffa63f1be03506242b893bc6cb16deb +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb689f072401de648df60a6b7bd2c6b3d548ea76 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:62566e12cd6fa05a52641e9f179d3f988a0170df8dd790fb76758e7f044929b4 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5d45195d745b9aeb527939dbb473cc8942721303 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e7d76c1c05b2eeff774d048e0f5123cf058c3b640f33ac33820781e09575cb06 +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..480ff3de7350adda4b2f000dcc962afa8a4ee064 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b95b213d130f0c81865aff97a6c81786d5d7ae5ca05c73e11ef63efe611d450 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf2bbed0676ab5197e35903fdfc588900de437f1 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b02d621bf4057eec89514e2824d71d96f51a0504cc377944ebca32da9f7ac15 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f9bd529010536a528cdde4851ffcd02f0215f261 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6328d3cd753ded35cd56fe2ff0ccfd1e9c7c5fe74dbc272454520ff3477e6499 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..253947d62e027aa013abe376ede8b4bf78b73aad --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e74416d223361b2113cded7f15aebd61ca06bba707f72b79319b4a7909b66b55 +size 263350871 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c67ab37ef200f88e1a567245f9c61dd45c84ff31 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5f5ead229eee22a4a0e5bde66e335c27b3d21df830b07acdc1ccc05f16dedd1 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0598de7dc006c7b55b41dbb7a0586cdc741dd790 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:715ae08d1dd69bf6019655d8c7910801883eb728407081baca3e0f97200f7f14 +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..a6af9b0474d41de5d81f19575b0fe0db086ac2be --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7f22e87daa66c51c193a03447a2a713ec9f4a6b680390ddea81e906a9011927 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a8050bdedcd20fe981056aecf4c90151a39807a --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e0bca1b100e01f2c39bf9cbb70d452d7696c7b030e8200ec2b59935e802d6c59 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3662a3d6991c1b0a3b2b334201389f43aca2950c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5fddbda503f4e18096608e1c13d1690b47f4d1a1ff4d7ceff131bac6d5da034 +size 263350690 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ef7c6b17001bedfca23c5ec288a7542bb519dce --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:208a710f39052dac315bbd3df80512e26f590cdbadda0a6d4cca935bf7f37646 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..e866cc82f54eb35e1a7a8cf643f36a51196a7045 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8e712932020a792449b7383131636b6c3c1c2cccad789733d90665bd45586d0 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c398aa22335501c84334634dab6a8851ddaf6667 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:397c5fe3dff8751525c9de52aeafbf17e769426593174d78bad155e73ad52a4c +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7eb174347c80ed870f1f1abf9a6d57541c18ce9 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b96baa90dbfec1bbf2113fdd9103cbb3997b521d725f21d135ec2b4d7b207faa +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ad40d0b49883143d33be5e0c08ef23fae905065e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcc95b1c5e58666a355862bb15bb58827f0a384e062245693c16207f3c69b242 +size 263351010 diff 
--git a/2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c52848e5d8b4b42675141d7cfb23d6ddc01625f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cd02adde7be99292efd2d3f51b4c44ac575d857930cd54245a202015bef7e35 +size 263350743 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6004f437dc27954b984046721f5377ff4044fb4c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3b20f7e7fd9b19ca698f32480b88a5edc89e9cc338f21f0ace1909dedfd7892 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9da829739c9a5e2905c6c420cc775dc4d993d89c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3d07930a35be8db0ed366fed15580638c76ba17b5365afd6004d8186b105d67 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..56cdb2d87802b4c98efd5670e8e7610b7de34486 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:69d3268c24fbe3320b721025647b43da3d71e23c5c0b3677e9aad4e1aec52b38 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..872546389ccf9f0afe25253c37cd95d573c60b35 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b7bb23b848d7bc5c40113a3a68fca06a55219ba8d6f34a791af551d4866e7c8 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..046ce02ea94b0889bea155fbf1908993119942d9 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4da934921cf6c23989ea82ad8f7362a23fe7f5f98d1dc034a2291a381b878f4 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b81bd77f25884249aa60c19957a98b5ae723999c --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2246f47acfc8fcf681233d3cd2c9c4f4703b5ce0d611baabfda1aa2ec9fa3e9 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..274a9b99364b4e769dc75e2d201d17a1ba8b1ff1 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:585d236a9bb60fa5c3eb88392f221305cf5cd6af4186764c687339cce6f53fd9 +size 263351074 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..637ba7f4d6009c88415e6fb384c0a902e5498ea6 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cc3a9327fd9c18bb3daa85b9322d76305c4a2374d0da45025007c23135bfe36 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..45634e92de92c8b6d5ce292f598b3353c9689c4e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5cad7e37b5ecc34897785186a5ae18f13db30bedb89e9b45814f609ad70dee7e +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb36de2639b9c5e340f805178d6e7adfacbc97c7 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed00e521f5f65a4cc7f3b3a9bd81d02ec719c59cce160c55ca12f842ef54581d +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..0151b2f3693a88ba4b92956253eab4bd4a077341 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11546ebd728522876b7b71947a86bcdcd9b8c39f95269d16fb1681e5269a785d +size 263350871 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f3d895d56fa5a2a30a98c5d7bb30ab4879773099 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:219a997972a0216332118cb28e8b79e71ca381b58f41612e798b80da74f96254 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..624513989b9243a9925b797e372dbfdef2844f30 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3adfe9b2eb508d105b65ad4be4cbbda823af51da8178347cc962e5a71363dd23 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2b5764adb3b4214ba7c1a00361363fca4ca72fbd --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8329655a973975de2ed6af125d2b5f8e1c61c8e7b01c334eaae874287896f77 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c32582b50262bac54398635fa3f916466bff1728 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9dbff19ecd04d96a97adc8301bef46a20a588dc806f8c9e8def482a852094778 +size 263351010 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..f312b19422680d3c8cfb73bf70b544bc6a666830 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4385558a99c0473c6100b525eca7e3676ea305d1fd8da2e13b537f9c7a4840a +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..712ca7ce1f0e0f1c56ef0ce0a40f511f67ae2092 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:841f3020ba4b90b0d1b7de1f6c1642cf36740862bd9221a72e42104a6637a73e +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b899b7c0aa516c1c9ce4d227e269cd9cd86e1333 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b2d7123e5754e47d496cc73eb92560fc71aa605f6afdf498ac4e08d6c879d2d9 +size 263350946 diff 
--git a/2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3bb4bef1e312576d3fc1c27e245247895d1ff5fb --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f1fda21ac963989232a419be4226920c65625ffb446aade69065d98ed957e7a +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..26d20392842ef02b56621db9eb5e9cc0ec294c0f --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2897c897b89571679c387a9756044f1078c3f9a545c826cf59be956b1643d0f +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..7999d79fa22e21b80b1c5e99385646bc68b1631e --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:303dd5f22afdcf3fd97407ebc98a0893e9503fb4cf46f5d6b9c2c0ab400e5ea1 +size 263350754 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..30fd6b9e2991e071b217a09c7ba2200b5440b563 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:6cceecfe2173910240f46247554d9242b98913f97cd8defe09876bdb8789d13f +size 263350807 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bb399fb50c7b959dea2c8165a1bb4ce1baedb8fd --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0f09ffe5873678accd334d2438388ec935088248d04f111f18ce7cdde30f90a +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..483c02dd267000e8020257d9a8997050daaf75f6 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:155233a344feed95773091ff1b9e033d389fb3dd2a59d96504c811e3f2c52194 +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a42829ce57ddaa24e543d670be42c6c5394ef37 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16e801cc08b4e7ddd9cdad39d9f25de40abb690f60bd07efab93dccbdfdcf373 +size 263350818 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..527af44d6358da4e4100b7a12cd40b1092a96ed9 --- /dev/null +++ 
b/2b8100m100m/global_step95/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e40313520a2101727105978137c72bb76fb56f4a81146bac537fabc8506c8b1 +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c30eda9f989667229387e5b9f64fd3d21df59be8 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3bd86197531ded0b9c23ad9d7c616e803f459a69f66b4ce7c170212ba81f6fe +size 263350690 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..bfbcb5fa217dbf52c6b25a90e49f5bc62df9ecc3 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:722bdc6be5a2f211192e5250dc593d06f6171b37c4cb843e8ea15a7b8b83e64f +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c8ecd2dba3c14923585b4a83cd071a6c9c407e8a --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8a9710e6689db458112c6e0dfbb645005a7fd26d540a2ee232a52e3ae9ee5e0b +size 263350690 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..1ee91a9777bebea76c775a5ad00427b22c55209a --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e624b3ac82c5d1961ad9507ab401b55c4065dcc0339e0fdcac46c30d11df6e4 +size 263350946 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d6c3953117a1e2738b4149e820d48f163465e105 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f973d1c164a5ad342f07c4bd77d67a765b6f3b97df52bc6b7b450f030badeef4 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..1911911621592d6d2b1edd72cac80d1fdcf2ee13 --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c079abc697b2c216f6755f0b94f246399825894473b7c48e1a4c42ed1d9c9e4 +size 263350882 diff --git a/2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fd0fe0816df3ccd7be8f0883e4088662f78639b --- /dev/null +++ b/2b8100m100m/global_step95/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1077f70669f5361030d2f30487d91959118401f2051956326ea1bf9187778743 +size 263350935 diff --git a/2b8100m100m/global_step95/layer_01-model_00-model_states.pt 
b/2b8100m100m/global_step95/layer_01-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..85b30483533a3be74c6e50e0c8890c2fcdc36498 --- /dev/null +++ b/2b8100m100m/global_step95/layer_01-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5104387f443f9345428a350c04f60454745a2193110844670034aa6bdd5fe084 +size 268043523 diff --git a/2b8100m100m/global_step95/layer_03-model_00-model_states.pt b/2b8100m100m/global_step95/layer_03-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..318a8811eaf1b664d298cc6348caafeac7e2a50f --- /dev/null +++ b/2b8100m100m/global_step95/layer_03-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2e01838575c1ce197a43a04d09ef3b6114e37a3f4ecff7063e229d43f116774 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_04-model_00-model_states.pt b/2b8100m100m/global_step95/layer_04-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37f8e596f3bfa90528e48aaafc2fa45569828ade --- /dev/null +++ b/2b8100m100m/global_step95/layer_04-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67c03b3cec53eb74e285dbbac890853df23734a7797265adb4ca6a6c04b4d600 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_05-model_00-model_states.pt b/2b8100m100m/global_step95/layer_05-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3446492af13caeb8705809d7f5323a0e449cbccf --- /dev/null +++ b/2b8100m100m/global_step95/layer_05-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:467aca9fd38381bbb060c094b4f8f8e9ed1e60cc451ddca288e85582ab08d276 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_06-model_00-model_states.pt b/2b8100m100m/global_step95/layer_06-model_00-model_states.pt new file mode 100644 
index 0000000000000000000000000000000000000000..df62f612c68ca7934a04825ac19573ee5750091a --- /dev/null +++ b/2b8100m100m/global_step95/layer_06-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69f0ecfe656917e5a1e49cb71e7e2b550b755a236b2321b10dca9929c3f7f668 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_07-model_00-model_states.pt b/2b8100m100m/global_step95/layer_07-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..9a687955ff1d1687cca341aa60261a93e06bbd9e --- /dev/null +++ b/2b8100m100m/global_step95/layer_07-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:83d188eb29a252c7d4218c15870ff080b2f3067dd01f07afd4a01b4207eeaef9 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_08-model_00-model_states.pt b/2b8100m100m/global_step95/layer_08-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..80ed5116abda0e59db79ef1c738f4ef7267c79aa --- /dev/null +++ b/2b8100m100m/global_step95/layer_08-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10ef1b256e999c791d33953b3f5e4117d0af7f9c1cd0d4cc5e32752ef0223d48 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_09-model_00-model_states.pt b/2b8100m100m/global_step95/layer_09-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..0da7dcbdf3652be99b8a062c8cfb74dddd6bb601 --- /dev/null +++ b/2b8100m100m/global_step95/layer_09-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a868f0150d738405a0f905c8ee071291ffa285cb6dc8dcb6520b1363e82c7165 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_10-model_00-model_states.pt b/2b8100m100m/global_step95/layer_10-model_00-model_states.pt new file mode 100644 index 
0000000000000000000000000000000000000000..efaf58df0fba49be39bc4fa19c6faeeac582bdcc --- /dev/null +++ b/2b8100m100m/global_step95/layer_10-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bace3505b67be3bb0f5e0dfb6c2a03c2179aaf076de26fb03618f0cf715c0864 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_11-model_00-model_states.pt b/2b8100m100m/global_step95/layer_11-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d8671fc6b906eb20eec6487da74365d02c749fad --- /dev/null +++ b/2b8100m100m/global_step95/layer_11-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:686697fa53475d9b2e9ffb4959c4f0fb6f153cb6c0c811dd150d458c1c71b910 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_12-model_00-model_states.pt b/2b8100m100m/global_step95/layer_12-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..798759f33c4f5a63d34066880afe23fb1d6dc5de --- /dev/null +++ b/2b8100m100m/global_step95/layer_12-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f65d43b708b7bf775346a884d2de86fb832b3a8b7325f87deac37cbed8ce902f +size 157357315 diff --git a/2b8100m100m/global_step95/layer_13-model_00-model_states.pt b/2b8100m100m/global_step95/layer_13-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..01e3706d5aeb52527ca182254e3d9dca364d9464 --- /dev/null +++ b/2b8100m100m/global_step95/layer_13-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2fb478ce1be15c2207ccc12f74d0b5da41679c770b1a0a1dea5892d440a30cdd +size 157357315 diff --git a/2b8100m100m/global_step95/layer_14-model_00-model_states.pt b/2b8100m100m/global_step95/layer_14-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..eaec365bcb0dee14d44ad6ca6e3d7b95e4be924b --- 
/dev/null +++ b/2b8100m100m/global_step95/layer_14-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1750daed939442843d3b0995347883c07e334740bd9f14ed228c02affadeae1d +size 157357315 diff --git a/2b8100m100m/global_step95/layer_15-model_00-model_states.pt b/2b8100m100m/global_step95/layer_15-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..96bedb2c8e176ee6708ed26114e1adaa0d6a1f55 --- /dev/null +++ b/2b8100m100m/global_step95/layer_15-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:650a149d76dbc9e33773a04f57ee3c9c9b10f8edc528230f5a8590732d10a46c +size 157357315 diff --git a/2b8100m100m/global_step95/layer_16-model_00-model_states.pt b/2b8100m100m/global_step95/layer_16-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..cfef85868fd9e022e6cf4fbb513664018b899e32 --- /dev/null +++ b/2b8100m100m/global_step95/layer_16-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6fecc31d7621ac16575e4b9992cb5095888cd20113639ebae329b81d865ccd25 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_17-model_00-model_states.pt b/2b8100m100m/global_step95/layer_17-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3944a4adff6de4a7d8b0749fd04fb92966b6107c --- /dev/null +++ b/2b8100m100m/global_step95/layer_17-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5ba81407498067ca323d095ad285cf065bd3b043042adce3bfbeb5846940337 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_18-model_00-model_states.pt b/2b8100m100m/global_step95/layer_18-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d672f5aa06ae7932c176690c735757e75851fef1 --- /dev/null +++ b/2b8100m100m/global_step95/layer_18-model_00-model_states.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a12f2ee14e4718c2ab956fdeb2dee8116a40a66940de3787f2a3cd34a578ace +size 157357315 diff --git a/2b8100m100m/global_step95/layer_19-model_00-model_states.pt b/2b8100m100m/global_step95/layer_19-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..3adc5ab0bfb2866891b106dada35ca18fb219219 --- /dev/null +++ b/2b8100m100m/global_step95/layer_19-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7499db541f0e33db5d641d27c06763997d567f424f4ac2026ec456242982650e +size 157357315 diff --git a/2b8100m100m/global_step95/layer_20-model_00-model_states.pt b/2b8100m100m/global_step95/layer_20-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..d129531825d8df45b61c7a42bf289afd3f395e57 --- /dev/null +++ b/2b8100m100m/global_step95/layer_20-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b342fbd3b028d189c0bfc43ae0ecd59a9938631fd2bc77a298a63b0692535d68 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_21-model_00-model_states.pt b/2b8100m100m/global_step95/layer_21-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..860f1aed4e6ce38ffe04fcbcdc27333ca69ce8cf --- /dev/null +++ b/2b8100m100m/global_step95/layer_21-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c421ebb39985e3147448e4c989c56b45bd2fe5200d66201cce17652349d0e548 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_22-model_00-model_states.pt b/2b8100m100m/global_step95/layer_22-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ec6663897ebf1ce49caa66dd39f5445fdb7c3faf --- /dev/null +++ b/2b8100m100m/global_step95/layer_22-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:060a1678912ccde618948b89fffc0b3279c1b36c9fe844083f3403405471f2d5 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_23-model_00-model_states.pt b/2b8100m100m/global_step95/layer_23-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..37987b3b7016e9cab1c1ec32a08a9b550415d3b4 --- /dev/null +++ b/2b8100m100m/global_step95/layer_23-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0155816be25620d2ae46ea3605258ef85794c2fc29ec1c91b84badb3b706f2da +size 157357315 diff --git a/2b8100m100m/global_step95/layer_24-model_00-model_states.pt b/2b8100m100m/global_step95/layer_24-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a26f4ae37f469e7b5bfb7500264624726090f3b5 --- /dev/null +++ b/2b8100m100m/global_step95/layer_24-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5abb74065af748f7a5a792b77be7cdcc7e3e163018a7f24dbdbcc6cff4383055 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_25-model_00-model_states.pt b/2b8100m100m/global_step95/layer_25-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..c5660da6f01b06cbd7fc989b039b0db598372f94 --- /dev/null +++ b/2b8100m100m/global_step95/layer_25-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6419458cfe2fe6a4857b0f422714b59a5fbec9c2c4a36e15148e42097b8e719 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_26-model_00-model_states.pt b/2b8100m100m/global_step95/layer_26-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..58d5608a015ab8c338a1041a74d03d8ba0bf15ba --- /dev/null +++ b/2b8100m100m/global_step95/layer_26-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3d91b9d58400309bc3c8cdc3b5cac833573d35fec0882d4b37ae16158455528 +size 157357315 
diff --git a/2b8100m100m/global_step95/layer_27-model_00-model_states.pt b/2b8100m100m/global_step95/layer_27-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..ffcd73eb7e89cbb5e1fafb8ba1de211a619aac05 --- /dev/null +++ b/2b8100m100m/global_step95/layer_27-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc05abb807093fa397739fe7b4fce636394fb4b69bb965085aa37ff70de499ca +size 157357315 diff --git a/2b8100m100m/global_step95/layer_28-model_00-model_states.pt b/2b8100m100m/global_step95/layer_28-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c3a9f526dffc184ecb673407900b7132223b747 --- /dev/null +++ b/2b8100m100m/global_step95/layer_28-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9c26eaaf240f7581697f6a7f648bc0fe77fef75dd26f016765e89396583a85d +size 157357315 diff --git a/2b8100m100m/global_step95/layer_29-model_00-model_states.pt b/2b8100m100m/global_step95/layer_29-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..90e565b446e0323fccfe01507c35f3cd51c365a7 --- /dev/null +++ b/2b8100m100m/global_step95/layer_29-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca2a18d379bcef5edd49800dc1865f61fae5a2d5c8f39dc839217ec40064f472 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_30-model_00-model_states.pt b/2b8100m100m/global_step95/layer_30-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..2eaada5bb6d6f51cc47d6a383af55bb777453f4e --- /dev/null +++ b/2b8100m100m/global_step95/layer_30-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa67acefc0ab19a9db9fc54ed5b4735893d0a652769c42bcdabd19e8cbd931d9 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_31-model_00-model_states.pt 
b/2b8100m100m/global_step95/layer_31-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..645fafab5618454626e8d09c046044cdbbafbd19 --- /dev/null +++ b/2b8100m100m/global_step95/layer_31-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eba702ec7088a9e93984384b23dc8ee6b6345e305c2282baeed8f71b0768dfc9 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_32-model_00-model_states.pt b/2b8100m100m/global_step95/layer_32-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..a372de406b2b27a08d2366a197591ddea58dbf19 --- /dev/null +++ b/2b8100m100m/global_step95/layer_32-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d6ecf8d574ca0044697831db35d9282a9ec3612ffe1e3af6203a5de71ae0142c +size 157357315 diff --git a/2b8100m100m/global_step95/layer_33-model_00-model_states.pt b/2b8100m100m/global_step95/layer_33-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..5aefac2a4612a3165ab5f0309f04542b7f3439c9 --- /dev/null +++ b/2b8100m100m/global_step95/layer_33-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8524dbacb968c4eee7619b46ef0f8af65a45750a64da4059fadc02a248de35e +size 157357315 diff --git a/2b8100m100m/global_step95/layer_34-model_00-model_states.pt b/2b8100m100m/global_step95/layer_34-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..70471d31867101b5c9e2b2de9fe3e81b9bea6dfd --- /dev/null +++ b/2b8100m100m/global_step95/layer_34-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd3cb2ff0cb0d59c66ec691e2ccb9730f200a730c1f804446561217423eefbbf +size 157357315 diff --git a/2b8100m100m/global_step95/layer_35-model_00-model_states.pt b/2b8100m100m/global_step95/layer_35-model_00-model_states.pt new file mode 100644 
index 0000000000000000000000000000000000000000..b478d024106770a9377867c28f6eb08542bb6aba --- /dev/null +++ b/2b8100m100m/global_step95/layer_35-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a12bd3ff6258d71dd7d5355a88885efbd83e3faec595336f66ad91b7c4f9f6f2 +size 157357315 diff --git a/2b8100m100m/global_step95/layer_36-model_00-model_states.pt b/2b8100m100m/global_step95/layer_36-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..b955bd3ff4c8203d5bf380ba525194545664f097 --- /dev/null +++ b/2b8100m100m/global_step95/layer_36-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6aa592c43e70a5fb434a7ffb27a6488e6a366c9a27d61f60652737cb2206772b +size 157357315 diff --git a/2b8100m100m/global_step95/layer_38-model_00-model_states.pt b/2b8100m100m/global_step95/layer_38-model_00-model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..976aae1aa1591d167bf846d171e7c1cbb24ee495 --- /dev/null +++ b/2b8100m100m/global_step95/layer_38-model_00-model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb3984190130954c413bc0607a2559e278ed89993acb80442e8bbff50a0e4f8c +size 11459 diff --git a/2b8100m100m/global_step95/mp_rank_00_model_states.pt b/2b8100m100m/global_step95/mp_rank_00_model_states.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ed2b5c05f0cf7c06fd87e9cf3779da7f52c94b0 --- /dev/null +++ b/2b8100m100m/global_step95/mp_rank_00_model_states.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9e6acc7dcd5ecca5ffe84d38a9f053f4f4b0cbe93751aa19be8c313ee0bd13f +size 49907 diff --git a/2b8100m100m/sbatch_2b8100m100m.sh b/2b8100m100m/sbatch_2b8100m100m.sh new file mode 100644 index 0000000000000000000000000000000000000000..b6deba8b426604a28695191a8b17760cd03b7876 --- /dev/null +++ b/2b8100m100m/sbatch_2b8100m100m.sh @@ -0,0 +1,168 @@ 
+#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=16 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 2-0:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=2b8100m100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT +TENSORBOARD_PATH=tensorboard_$VARIANT +mkdir -p $CHECKPOINT_PATH +mkdir -p $TENSORBOARD_PATH + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" +#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=2 +GRADIENT_ACCUMULATION_STEPS=2 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_2980M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} 
+NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 100 000 000 +# -> Samples: 48828.125 +TRAIN_SAMPLES=48_828 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 488 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --loss-scale 12 \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + --checkpoint-activations \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1000 \ + --eval-iters 1 \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + 
--tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/2b8100m100m/sbatch_2b8100m100mval.sh b/2b8100m100m/sbatch_2b8100m100mval.sh new file mode 100644 index 0000000000000000000000000000000000000000..2928b5a44c6ae660c1b21b9d10772d8b5d129681 --- /dev/null +++ b/2b8100m100m/sbatch_2b8100m100mval.sh @@ -0,0 +1,168 @@ +#!/bin/bash +#SBATCH --exclude=nid007571,nid007112,nid006774,nid007502,nid007506,nid007507,nid005145,nid006692,nid007218,nid007123,nid006124,nid006123,nid007496,nid007237,nid006852,nid007206,nid006947,nid007212,nid006977,nid007222,nid005444,nid007219,nid007493,nid007221,nid005300,nid005619,nid006118,nid005203,nid006113,nid006481,nid007077,nid005208,nid005207,nid005879,nid005901 +#SBATCH --nodes=16 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=32 +#SBATCH --mem=256G +#SBATCH -p standard-g +#SBATCH -t 48:00:00 +#SBATCH --gpus-per-node=mi250:8 +#SBATCH --exclusive=user +#SBATCH --hint=nomultithread +#SBATCH --account=project_462000119 +#SBATCH -o logs/%j.out +#SBATCH -e logs/%j.err + +VARIANT=2b8100m100mval +VARIANT_CKPT=2b8100m100m + +# if run without sbatch, invoke here +if [ -z $SLURM_JOB_ID ]; then + mkdir -p logs + sbatch "$0" + exit +fi + +set -euo pipefail + +# symlink logs/latest.out and logs/latest.err +ln -f -s $SLURM_JOB_ID.out logs/latest.out +ln -f -s $SLURM_JOB_ID.err logs/latest.err + +KILL_SWITCH_PATH=kill-switch-$VARIANT +CHECKPOINT_PATH=checkpoints_$VARIANT_CKPT +TENSORBOARD_PATH=tensorboard_$VARIANT + +# Data +VOCAB_FILE="gpt2/vocab.json" +MERGE_FILE="gpt2/merges.txt" 
+#DATA_PATH="/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document" +TRAIN_DATA_PATH=train100m.txt +# "train: 1.0 0:1 /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_4B8_text_document" +VALID_DATA_PATH=val.txt +# "validation: 1.0 0:1 /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document" + +PP_SIZE=1 +TP_SIZE=1 + +MICRO_BATCH_SIZE=2 +GRADIENT_ACCUMULATION_STEPS=2 +WORLD_SIZE=$((SLURM_GPUS_ON_NODE*SLURM_JOB_NUM_NODES)) +GLOBAL_BATCH_SIZE=$((MICRO_BATCH_SIZE*WORLD_SIZE*GRADIENT_ACCUMULATION_STEPS)) + +# Model parameters +source model_params.sh +MODEL_PARAM=("${PARAM_2980M[@]}") +NHIDDEN=${MODEL_PARAM[0]} +FFN_HIDDEN_SIZE=${MODEL_PARAM[1]} +KV_SIZE=${MODEL_PARAM[2]} +NHEADS=${MODEL_PARAM[3]} +NLAYERS=${MODEL_PARAM[4]} +SEQ_LEN=2048 + +echo "Model parameters: d_model $NHIDDEN ffw_size $FFN_HIDDEN_SIZE kv_size $KV_SIZE n_heads $NHEADS n_layers $NLAYERS" + +SAVE_INTERVAL=10000 + +# Tokens: 4750000000 +# -> Samples: 2319336 +TRAIN_SAMPLES=1 + +OPTIMIZER_ARGS=" \ + --optimizer adam \ + --adam-beta1 0.9 \ + --adam-beta2 0.999 \ + --adam-eps 1e-8 \ + --lr 2e-4 \ + --min-lr 2e-5 \ + --lr-decay-style cosine \ + --lr-decay-samples $TRAIN_SAMPLES \ + --lr-warmup-samples 0 \ + --clip-grad 1.0 \ + --weight-decay 1e-1 \ + --no-load-optim \ + --reset-progress \ + --override-lr-scheduler \ + " + +GPT_ARGS=" \ + --num-layers $NLAYERS \ + --hidden-size $NHIDDEN \ + --num-attention-heads $NHEADS \ + --kv-channels $KV_SIZE \ + --ffn-hidden-size $FFN_HIDDEN_SIZE \ + --seq-length $SEQ_LEN \ + --max-position-embeddings $SEQ_LEN \ + --micro-batch-size $MICRO_BATCH_SIZE \ + --global-batch-size $GLOBAL_BATCH_SIZE \ + --train-samples $TRAIN_SAMPLES \ + --vocab-file $VOCAB_FILE \ + --merge-file $MERGE_FILE \ + --clip-grad 1.0 \ + --kill-switch-path $KILL_SWITCH_PATH \ + --bf16 \ + $OPTIMIZER_ARGS \ + " + +OUTPUT_ARGS=" \ + --log-interval 10 \ + --save-interval $SAVE_INTERVAL \ + --eval-interval 1 \ + --eval-iters 100 \ + 
--eval-only true \ + --tensorboard-dir $TENSORBOARD_PATH \ + --tensorboard-queue-size 5 \ + --log-timers-to-tensorboard \ + --log-batch-size-to-tensorboard \ + --log-validation-ppl-to-tensorboard \ + " + +ZERO_STAGE=0 + +mkdir -p ds_configs +DS_CONFIG_PATH="ds_configs/$SLURM_JOB_ID.json" + +cat <<EOF > $DS_CONFIG_PATH +{ + "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE, + "train_batch_size": $GLOBAL_BATCH_SIZE, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": $ZERO_STAGE + }, + "bf16": { + "enabled": true + }, + "steps_per_print": 2000, + "wall_clock_breakdown": false +} +EOF + +DEEPSPEED_ARGS=" \ + --deepspeed \ + --deepspeed_config $DS_CONFIG_PATH \ + --zero-stage $ZERO_STAGE \ + " + +CMD=" \ + Megatron-DeepSpeed/pretrain_gpt.py \ + --tensor-model-parallel-size $TP_SIZE \ + --pipeline-model-parallel-size $PP_SIZE \ + $GPT_ARGS \ + $OUTPUT_ARGS \ + --save $CHECKPOINT_PATH \ + --load $CHECKPOINT_PATH \ + --train-weighted-split-paths-path $TRAIN_DATA_PATH \ + --valid-weighted-split-paths-path $VALID_DATA_PATH \ + --data-impl mmap \ + $DEEPSPEED_ARGS \ + " + +echo $CMD + +echo "START $SLURM_JOBID: $(date)" + +# bash launch_srun.sh $CMD +srun --label launch.sh $CMD + +echo "END $SLURM_JOBID: $(date)" diff --git a/2b8100m100m/tensorboard_2b8100m100m/events.out.tfevents.1678973187.nid005170.79080.0 b/2b8100m100m/tensorboard_2b8100m100m/events.out.tfevents.1678973187.nid005170.79080.0 new file mode 100644 index 
0000000000000000000000000000000000000000..a99fc79ce752fb1cfb831a3c31ebff05d1433a85 --- /dev/null +++ b/2b8100m100m/tensorboard_2b8100m100m/events.out.tfevents.1678992349.nid005299.6176.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a3e87a9ac2ac18d915e7a5292f40ca59e94f7dcbdfbddbcb3d5ec0b0cac14489 +size 187500 diff --git a/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678985560.nid007374.88520.0 b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678985560.nid007374.88520.0 new file mode 100644 index 0000000000000000000000000000000000000000..56befd4c6f2cb36a14b77a3a89b18d5eb2262ed3 --- /dev/null +++ b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678985560.nid007374.88520.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d051cbfecd3e731d17f079e33bce2f5d7fa72a0cade1371c0ade155d027a9f2f +size 40 diff --git a/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678987313.nid005299.119563.0 b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678987313.nid005299.119563.0 new file mode 100644 index 0000000000000000000000000000000000000000..1605bbd03558b2d32da43c6ff9ac98a7ba86adeb --- /dev/null +++ b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678987313.nid005299.119563.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfef1cbfa3fccbf7f66b905ea7c163055661567095fa3160218a4779da5ff602 +size 40 diff --git a/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678989361.nid005299.126973.0 b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678989361.nid005299.126973.0 new file mode 100644 index 0000000000000000000000000000000000000000..c07fa24801b8af20379df6f18a8a496cc81cc02d --- /dev/null +++ b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678989361.nid005299.126973.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a50e500ac78a945148caf84406a1417986180b69a89e1100027b315a128ea8ef 
+size 40 diff --git a/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678994567.nid005299.21258.0 b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678994567.nid005299.21258.0 new file mode 100644 index 0000000000000000000000000000000000000000..fdb643afb9a994056133f9aa0e17ef055a876839 --- /dev/null +++ b/2b8100m100m/tensorboard_2b8100m100mval/events.out.tfevents.1678994567.nid005299.21258.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:797ab6c15c380313bda7a46cff3a041a3c30618ed993e31c5348a8626af20dbe +size 980 diff --git a/44m1b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt similarity index 
100% rename from 44m1b5100m/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt rename to 
44m1b5100m/global_step/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt diff --git 
a/44m1b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 
b/44m1b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt similarity 
index 100% rename from 44m1b5100m/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt rename 
to 44m1b5100m/global_step/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt diff --git 
a/44m1b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 
b/44m1b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt similarity 
index 100% rename from 44m1b5100m/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt rename to 
44m1b5100m/global_step/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt b/44m1b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt similarity index 100% rename from 44m1b5100m/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt rename to 44m1b5100m/global_step/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt diff --git a/44m1b5100m/layer_01-model_00-model_states.pt b/44m1b5100m/global_step/layer_01-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_01-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_01-model_00-model_states.pt diff --git a/44m1b5100m/layer_03-model_00-model_states.pt b/44m1b5100m/global_step/layer_03-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_03-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_03-model_00-model_states.pt diff --git a/44m1b5100m/layer_04-model_00-model_states.pt b/44m1b5100m/global_step/layer_04-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_04-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_04-model_00-model_states.pt diff --git a/44m1b5100m/layer_05-model_00-model_states.pt b/44m1b5100m/global_step/layer_05-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_05-model_00-model_states.pt 
rename to 44m1b5100m/global_step/layer_05-model_00-model_states.pt diff --git a/44m1b5100m/layer_06-model_00-model_states.pt b/44m1b5100m/global_step/layer_06-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_06-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_06-model_00-model_states.pt diff --git a/44m1b5100m/layer_07-model_00-model_states.pt b/44m1b5100m/global_step/layer_07-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_07-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_07-model_00-model_states.pt diff --git a/44m1b5100m/layer_08-model_00-model_states.pt b/44m1b5100m/global_step/layer_08-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_08-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_08-model_00-model_states.pt diff --git a/44m1b5100m/layer_09-model_00-model_states.pt b/44m1b5100m/global_step/layer_09-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_09-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_09-model_00-model_states.pt diff --git a/44m1b5100m/layer_10-model_00-model_states.pt b/44m1b5100m/global_step/layer_10-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_10-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_10-model_00-model_states.pt diff --git a/44m1b5100m/layer_12-model_00-model_states.pt b/44m1b5100m/global_step/layer_12-model_00-model_states.pt similarity index 100% rename from 44m1b5100m/layer_12-model_00-model_states.pt rename to 44m1b5100m/global_step/layer_12-model_00-model_states.pt diff --git a/44m1b5100m/mp_rank_00_model_states.pt b/44m1b5100m/global_step/mp_rank_00_model_states.pt similarity index 100% rename from 44m1b5100m/mp_rank_00_model_states.pt rename to 44m1b5100m/global_step/mp_rank_00_model_states.pt